A Gym environment for Bennett Foddy's game QWOP.
Give it a try and see why it's such a good candidate for Reinforcement Learning :)
You should also check out this video for a demo.
- A call to `.step()` advances exactly N game frames (configurable)
- Option to disable WebGL rendering for improved performance
- Is fully deterministic *
- State extraction for a slim observation of 60 bytes
- Real-time visualization of various game stats (optional)
- Additional in-game controls for easier debugging

* given the state includes the steps since the last hard reset, see ♻️ Resetting
- Install Python 3.10 or higher
- Install a Chromium-based web browser (Google Chrome, Brave, Chromium, etc.)
- Download chromedriver 116.0 or higher
- Install the `qwop-gym` package and patch `QWOP.min.js` from your terminal:
pip install qwop-gym
# Fetch & patch QWOP source code
curl -sL https://www.foddy.net/QWOP.min.js | qwop-gym patch
Create an instance in your code:
import gymnasium as gym
import qwop_gym

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
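Once created, the environment follows the standard Gymnasium API. A minimal interaction loop (with random actions, purely for illustration; the browser/driver paths are placeholders) might look like this:

import gymnasium as gym
import qwop_gym

# Placeholder paths - point these to your browser and chromedriver binaries
env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random action, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)

env.close()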
The `qwop-gym` executable is a handy command-line tool which makes it easy to play, record and replay episodes, train agents and more.
Firstly, perform the initial setup:
qwop-gym bootstrap
Play the game (use Q, W, O, P keys):
qwop-gym play
Explore the other available commands:
$ qwop-gym -h
usage: qwop-gym [options] <action>
options:
-h, --help show this help message and exit
-c FILE config file, defaults to config/<action>.yml
action:
play play QWOP, optionally recording actions
replay replay recorded game actions
train_bc train using Behavioral Cloning (BC)
train_gail train using Generative Adversarial Imitation Learning (GAIL)
train_airl train using Adversarial Inverse Reinforcement Learning (AIRL)
train_ppo train using Proximal Policy Optimization (PPO)
train_dqn train using Deep Q Network (DQN)
train_qrdqn train using Quantile Regression DQN (QRDQN)
spectate watch a trained model play QWOP, optionally recording actions
benchmark evaluate the actions/s achievable with this env
bootstrap perform initial setup
patch apply patch to original QWOP.min.js code
help print this help message
examples:
qwop-gym play
qwop-gym -c config/record.yml play
For example, to train a PPO agent, edit `config/ppo.yml` and run:
qwop-gym train_ppo
Warning
Although no rendering occurs during training, the browser window must remain open, as the game is actually running at very high speed behind the scenes.
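If you prefer to drive training directly from Python instead of the bundled configs, a rough sketch using Stable-Baselines3 (assuming it is installed; the hyperparameters and paths below are illustrative, not the values from `config/ppo.yml`) could look like this:

import gymnasium as gym
import qwop_gym
from stable_baselines3 import PPO

# Placeholder browser/driver paths - adjust to your setup
env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")

# Illustrative hyperparameters only - see config/ppo.yml for the real ones
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="data/")
model.learn(total_timesteps=100_000)
model.save("data/ppo_qwop")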
Visualize tensorboard graphs:
tensorboard --logdir data/
Configure `model_file` in `config/spectate.yml` and watch your trained agent play the game:
qwop-gym spectate
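Alternatively, a trained model can be loaded and evaluated directly in Python. The sketch below assumes a Stable-Baselines3 PPO model saved as in the earlier training sketch; names and paths are illustrative:

import gymnasium as gym
import qwop_gym
from stable_baselines3 import PPO

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
model = PPO.load("data/ppo_qwop")  # path from the training sketch above

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)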
Note
Imitation learning is powered by the `imitation` library, which depends on the deprecated `gym` library, making it incompatible with QwopEnv. This will be resolved once `imitation` introduces support for `gymnasium`. As a workaround, you can check out the `qwop-gym` project locally and use the `gym-compat` branch instead.
# In this branch, QwopEnv works with the deprecated `gym` library
git checkout gym-compat
# Note that python-3.10 is required, see notes in requirements.txt
pip install .
# Patch the game again as this branch works with different paths
curl -sL https://www.foddy.net/QWOP.min.js | python -m src.game.patcher
For imitation learning, first record some of your own games:
python qwop-gym.py play -c config/record.yml
Train an imitator via Behavioral Cloning:
python qwop-gym.py train_bc
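Under the hood, Behavioral Cloning fits a policy to the recorded state-action pairs via supervised learning. For orientation only, a rough sketch of the `imitation` library's BC API is shown below; `env` and `transitions` are placeholders, since loading your recorded games and converting them to the library's `Transitions` format is project-specific and omitted here:

import numpy as np
from imitation.algorithms import bc

# `env` is the QWOP env and `transitions` holds the recorded demonstrations,
# already converted to the imitation library's Transitions format (omitted here)
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),
)
bc_trainer.train(n_epochs=10)
policy = bc_trainer.policy  # the trained imitator policy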
If you are a fan of W&B, you can use the provided configs in `config/wandb/` and create your own sweeps.

`wandb` is a rather bulky dependency and is not installed by default. Install it with `pip install wandb` before proceeding with the below examples.
# create a new W&B sweep
wandb sweep config/wandb/qrdqn.yml
# start a new W&B agent
wandb agent <username>/qwop/<sweep>
You can check out my public W&B QWOP project here. There you can find pre-trained model artifacts (zip files) of some well-performing agents, as well as see how they compare to each other. This YouTube video showcases some of them.
Info about the Gym env can be found here
Details about the QWOP game can be found here
- https://github.com/Wesleyliao/QWOP-RL
- https://github.com/drakesvoboda/RL-QWOP
- https://github.com/juanto121/qwop-ai
- https://github.com/ShawnHymel/qwop-ai
In comparison, qwop-gym offers several key features:
- the env is performant - perfect for on-policy algorithms as observations can be collected at great speeds (more than 2000 observations/sec on an Apple M2 CPU - orders of magnitude faster than the other QWOP RL envs).
- the env is deterministic - there are no race conditions and randomness can be removed if desired. Replaying recorded actions produces the same result (see the sketch after this list).
- the env has a simple reward model which, compared to other QWOP envs, is less biased, e.g. there is no special logic for things like knee bending, low torso height, vertical movement, etc.
- the env allows all possible key combinations (15), other QWOP envs usually allow only the "useful" 8 key combinations.
- great results (fast, human-like running) achieved by RL agents trained entirely through self-play, without pre-recorded expert demonstrations
- qwop-gym already contains scripts for training with 8 different algorithms and adding more to the list is simple - this makes it suitable for exploring and/or benchmarking a variety of RL algorithms.
- qwop-gym uses reliable open-source implementations of RL algorithms in contrast to many other projects using "roll-your-own" implementations.
- QWOP's original JS source code is barely modified: 99% of all extra functionality is designed as a plugin, bundled separately, and only a "diff" of QWOP.min.js is published here (in respect of Bennett Foddy's kind request to refrain from publishing the QWOP source code, as it is not open-source).
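As a quick, purely illustrative way to sanity-check the determinism claim from the list above, you could replay a fixed action sequence in two fresh environments and compare the observation traces (browser/driver paths are placeholders):

import gymnasium as gym
import numpy as np
import qwop_gym

def rollout(actions):
    # Placeholder browser/driver paths - adjust to your setup
    env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
    obs, _ = env.reset()
    trace = [obs]
    for a in actions:
        obs, reward, terminated, truncated, _ = env.step(a)
        trace.append(obs)
        if terminated or truncated:
            break
    env.close()
    return np.array(trace)

actions = [0, 1, 2, 3] * 5  # arbitrary fixed action sequence
assert np.array_equal(rollout(actions), rollout(actions))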
The list below highlights some areas in which the project could use some improvements:
- the OS may put some pretty rough restrictions on the web browser's rendering as soon as it's put in the background (on OS X at least). Ideally, the browser should run in headless mode, but I couldn't find a headless browser that supports WebGL.
- `gym` has been deprecated since October 2022, but the `imitation` library still does not officially support `gymnasium`. As soon as that is addressed, it will no longer be necessary to use the special `gym-compat` branch here for imitation learning.
- `wandb` uses a monkey-patch for collecting tensorboard logs which does not work well with GAIL/AIRL/BC (and possibly other algos from `imitation`). As a result, graphs in wandb have weird names. This is mostly an issue with the `wandb` and/or `imitation` libraries, however there could be a way to work around it here.
- the firefox browser and geckodriver are not supported as an alternative browser/driver pair, but adding support for them should be fairly easy.
Here is a simple guide to follow if you want to contribute to this project:
- Find an existing issue to work on, or submit a new issue which you're also going to fix. Make sure to mention that you're working on a fix for the issue you picked.
- Branch out from the latest `main`.
- Make sure you have formatted your code with the black formatter.
- Commit and push your changes to your branch.
- Submit a PR.