minoring/PPO: Implementation of Proximal Policy Optimization (PPO)
Proximal Policy Optimization Algorithms

Proximal Policy Optimization (PPO) is a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. (Schulman et al. 2017)
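As a concrete illustration of the "surrogate" objective, the clipped variant from the paper can be sketched in PyTorch. This is a minimal sketch, not this repository's actual code; the function name and tensor shapes are illustrative.

```python
import torch

def clipped_surrogate_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, L^CLIP (Schulman et al. 2017).

    All arguments are 1-D tensors over sampled timesteps; the names are
    illustrative, not taken from this repository.
    """
    # Probability ratio r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t)
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) bound; negate so that gradient
    # descent on the loss performs gradient ascent on the objective.
    return -torch.min(unclipped, clipped).mean()
```

Clipping removes the incentive for the new policy to move the probability ratio outside `[1 - eps, 1 + eps]`, which is what makes repeated minibatch updates on the same batch safe.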

Training

See parse_utils.py for the available flags, and config_ppo.yaml and config_game.yaml for the hyperparameters.

Run python train.py --env <Gym env> --hyperparams <hypr> --seed <random_seed> --surrogate-objective <surrogate_objective>

e.g. python train.py --env HalfCheetah-v2 --hyperparams mujoco --seed 1 --log-interval 1 --surrogate-objective clipping

Testing

Run python test.py --env <Gym env> --trained-model <path/to/trained/model> --record-video

e.g. python test.py --env HalfCheetah-v2 --trained-model HalfCheetah-v2.pt --record-video

or, to skip saving a video of the last episode: python test.py --env HalfCheetah-v2 --trained-model HalfCheetah-v2.pt

Results

Comparison of Surrogate Objectives

We compare the surrogate objectives by plotting learning curves from 7 environments with 3 random seeds each.
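The paper's main alternative to clipping is a KL-penalized surrogate, so a comparison of surrogate objectives presumably includes a variant along these lines. A minimal sketch, assuming PyTorch; the names are illustrative, and the KL term is a simple sample-based estimate rather than the exact divergence:

```python
import torch

def kl_penalty_surrogate_loss(log_probs, old_log_probs, advantages, beta=1.0):
    """KL-penalized surrogate from Schulman et al. (2017).

    In the adaptive version, `beta` is adjusted between updates based on
    the measured KL; that adaptation loop is omitted here.
    """
    # Probability ratio of the new policy to the old one
    ratio = torch.exp(log_probs - old_log_probs)
    # Sample-based estimate of KL(pi_old || pi_theta)
    approx_kl = (old_log_probs - log_probs).mean()
    # Negate the surrogate so minimizing the loss maximizes the objective
    return -(ratio * advantages).mean() + beta * approx_kl
```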

Surrogate Objectives

Learning Curve

Effect of the Entropy

After training PPO (clipping) for 1M timesteps, we compute the average return over 10 evaluative episodes in each of 7 environments.
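The evaluation procedure can be sketched as follows. This is a hedged sketch assuming the classic Gym reset/step interface, not this repository's test.py; the function and argument names are illustrative.

```python
import numpy as np

def evaluate(env, policy, episodes=10):
    """Run `episodes` evaluative episodes and report mean/std of return.

    `env` follows the classic Gym API (reset() -> obs,
    step(action) -> (obs, reward, done, info)); `policy` maps an
    observation to an action. Both names are illustrative.
    """
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward  # undiscounted episode return
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))
```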

Env Average Return (std)
HalfCheetah-v2 1316.80 (80.25)
Hopper-v2 1915.84 (544.53)
InvertedDoublePendulum-v2 1276.67 (68.95)
InvertedPendulum-v2 782.9 (343.65)
Reacher-v2 -12.45 (2.56)
Swimmer-v2 -38.20 (3.81)
Walker2d-v2 1401.74 (476.82)

Videos

HalfCheetah-v2 Hopper-v2 InvertedPendulum-v2
Reacher-v2 Swimmer-v2 Walker2d-v2

References

Paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017)

Docs

OpenAI Spinning Up
