This repository provides a clean, working implementation of the PPO algorithm using JAX and Haiku. To see it in action, simply click the Colab link above!
Interested readers can have a look at our report, which goes deeper into the details.
- `random_agent`: a random agent.
- `vanilla_ppo`: our implementation of PPO.
- Separate value and policy networks.
- The standard deviation of the policy can be predicted by the policy network or fixed to a given value. A `softplus` activation keeps the std always positive.
- Orthogonal initialization of the weights and constant initialization of the biases.
- Activation functions are `tanh` (a minimal Haiku sketch of the policy network follows this list).
- Linear annealing of the learning rates. Different learning rates for the policy and value networks.
- Learning with minibatches. Advantages are normalized at the minibatch level.
- Using Generalized Advantage Estimation (GAE).
- Clipped ratio
- Minimum between ratio × GAE and clipped ratio × GAE (a sketch of GAE and this clipped loss also follows the list)
- Clipped gradient norm
- Normalization and clipping of the observations
- Normalization and clipping of the rewards
- Action normalization: the agent predicts actions in $[-1, 1]$, and a wrapper scales them back to the environment's action range (a sketch of such a wrapper follows the list as well).
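To make the network bullets concrete, here is a minimal Haiku sketch of such a Gaussian policy head. It is illustrative only: the function name, hidden sizes, and initializer scales are assumptions, not the exact code of `vanilla_ppo`.

```python
import haiku as hk
import jax
import jax.numpy as jnp


def policy_network(observation: jnp.ndarray, action_dim: int):
    """Gaussian policy head: tanh MLP, orthogonal weights, constant biases, softplus std."""
    w_init = hk.initializers.Orthogonal(scale=jnp.sqrt(2.0))  # assumed scale
    b_init = hk.initializers.Constant(0.0)

    torso = hk.Sequential([
        hk.Linear(64, w_init=w_init, b_init=b_init), jnp.tanh,
        hk.Linear(64, w_init=w_init, b_init=b_init), jnp.tanh,
    ])
    hidden = torso(observation)

    mean = hk.Linear(action_dim, w_init=w_init, b_init=b_init)(hidden)
    # The std is predicted by the network; softplus keeps it always positive.
    std = jax.nn.softplus(hk.Linear(action_dim, w_init=w_init, b_init=b_init)(hidden))
    return mean, std


# Transform into a pure (init, apply) pair, the usual Haiku workflow.
policy = hk.without_apply_rng(hk.transform(lambda obs: policy_network(obs, action_dim=6)))
```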
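Similarly, here is a hedged sketch of GAE and of the clipped surrogate objective (minimum between ratio × GAE and clipped ratio × GAE). The function names and the γ, λ, and ε defaults are placeholder choices, not necessarily the values used in the notebook.

```python
import jax
import jax.numpy as jnp


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative)."""
    values = jnp.append(values, last_value)  # bootstrap value for the last step

    def step(gae, t):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        return gae, gae

    # Scan backwards through the rollout, then flip the result back into time order.
    _, advantages = jax.lax.scan(step, jnp.zeros(()), jnp.arange(len(rewards))[::-1])
    return advantages[::-1]


def clipped_policy_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """PPO objective: minimum between ratio * GAE and clipped ratio * GAE."""
    # Advantages are normalized at the minibatch level.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    ratio = jnp.exp(new_log_probs - old_log_probs)
    clipped_ratio = jnp.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return -jnp.mean(jnp.minimum(ratio * advantages, clipped_ratio * advantages))
```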
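Finally, a minimal sketch of an action-rescaling wrapper built on the standard Gym `ActionWrapper` API. The class name is hypothetical; Gym's built-in `gym.wrappers.RescaleAction` provides the same behaviour, and the repo's wrapper may differ in detail.

```python
import gym
import numpy as np


class RescaleActionWrapper(gym.ActionWrapper):
    """The agent acts in [-1, 1]; actions are rescaled to the environment's range."""

    def __init__(self, env):
        super().__init__(env)
        self.low = env.action_space.low
        self.high = env.action_space.high
        # The agent only ever sees a [-1, 1] action space.
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=env.action_space.shape, dtype=np.float32
        )

    def action(self, action):
        # Linearly map [-1, 1] back to [low, high] before stepping the real env.
        action = np.clip(action, -1.0, 1.0)
        return self.low + 0.5 * (action + 1.0) * (self.high - self.low)
```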
The training loop is implemented in the `ppo` notebook, which contains instances of the agents tuned for each of the environments. We log the training metrics (losses, actions, rewards, etc.) to a TensorBoard file; you can monitor it separately or within the notebook. After training is completed, a video of the agent is generated.
First, you need to clone the repository. For that, you can use the following command:

```bash
git clone git@github.com:emasquil/ppo.git
```
Then we recommend using a virtual environment, which can be created as follows:

```bash
python3 -m venv env
source env/bin/activate
```
Finally, to install the package, simply run:

```bash
pip install -e .
```
If you are planning on developing the package, you will need to add `[dev]` at the end. This gives:

```bash
pip install -e .[dev]
```
This package uses MuJoCo environments; please install MuJoCo by following these instructions.
Note that you might need to install the following system packages:

```bash
sudo apt-get install -y xvfb ffmpeg freeglut3-dev libosmesa6-dev patchelf libglew-dev
```
After all the installations, you should be ready to run the notebook locally.
In the `results` directory you can find plots, logs, and videos of the agents after being trained on the environments mentioned above.
Before any pull request, please make sure to format your code using the following:

```bash
black -l 120 ./
```
- vwxyzjn/cleanrl
- openai/baselines
- DLR-RM/stable-baselines3
- openai/spinningup
- Costa Huang's blogpost
- deepmind/acme