> Proximal Policy Optimization (PPO) is a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. (Schulman et al. 2017)
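For context, the `clipping` option referred to below optimizes the clipped surrogate objective from the paper. A minimal sketch of that loss (assuming PyTorch tensors for the log-probabilities and advantage estimates, and the paper's default epsilon of 0.2; this is illustrative, not this repository's actual training code):

```python
import torch

def clipped_surrogate_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Illustrative PPO clipped surrogate objective (Schulman et al. 2017).

    All arguments are assumed to be 1-D tensors gathered from rollouts;
    clip_eps=0.2 is the paper's default, not necessarily this repo's setting.
    """
    ratio = torch.exp(log_probs - old_log_probs)  # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (lower) bound, then negate so it can be minimized.
    return -torch.min(unclipped, clipped).mean()
```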
See `parse_utils.py` for the available command-line flags, and `config_ppo.yaml` / `config_game.yaml` for the hyperparameters.
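The exact contents of the YAML files are not reproduced here; as a rough sketch of how a named preset (e.g. the `mujoco` set passed via `--hyperparams`) might be loaded, with made-up keys purely for illustration:

```python
import yaml

def load_hyperparams(path="config_ppo.yaml", preset="mujoco"):
    """Load one named hyperparameter preset from a YAML config.

    The key names in the comment below are hypothetical examples,
    not the actual contents of config_ppo.yaml.
    """
    with open(path) as f:
        config = yaml.safe_load(f)
    return config[preset]  # e.g. {"clip_eps": 0.2, "lr": 3e-4, "gamma": 0.99}
```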
Run `python train.py --env <Gym env> --hyperparams <hyperparams> --seed <random_seed> --surrogate-objective <surrogate_objective>`

e.g. `python train.py --env HalfCheetah-v2 --hyperparams mujoco --seed 1 --log-interval 1 --surrogate-objective clipping`
Run `python test.py --env <Gym env> --trained-model <path/to/trained/model> --record-video`

e.g. `python test.py --env HalfCheetah-v2 --trained-model HalfCheetah-v2.pt --record-video`

or, to skip recording a video of the last episode, `python test.py --env HalfCheetah-v2 --trained-model HalfCheetah-v2.pt`
We compare the surrogate objectives by plotting learning curves on 7 environments, using 3 random seeds each.

After training PPO (clipping) for 1M timesteps, we report the average return over 10 evaluation episodes for each of the 7 environments.
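The evaluation loop is roughly the following (a sketch using the classic Gym reset/step API that the -v2 MuJoCo environments use; the `policy.act(obs)` interface is a placeholder, not this repository's actual API):

```python
import gym
import numpy as np

def evaluate(policy, env_id="HalfCheetah-v2", episodes=10, seed=0):
    """Average undiscounted return over a fixed number of evaluation episodes.

    `policy` is assumed to expose an `act(obs)` method returning an action;
    this interface is illustrative, not the repository's actual one.
    """
    env = gym.make(env_id)
    env.seed(seed)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy.act(obs))
            total += reward
        returns.append(total)
    return np.mean(returns), np.std(returns)
```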
Env | Average Return (std) |
---|---|
HalfCheetah-v2 | 1316.80 (80.25) |
Hopper-v2 | 1915.84 (544.53) |
InvertedDoublePendulum-v2 | 1276.67 (68.95) |
InvertedPendulum-v2 | 782.9 (343.65) |
Reacher-v2 | -12.45 (2.56) |
Swimmer-v2 | -38.20 (3.81) |
Walker2d-v2 | 1401.74 (476.82) |
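The learning curves below were obtained by aggregating episode returns over the 3 seeds. A rough sketch of that aggregation and plotting, assuming each run saved its evaluation returns to a per-seed `.npy` file (the `logs/` path and file naming are assumptions, not this repository's actual logging format):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curve(env_id, seeds=(1, 2, 3)):
    """Plot mean +/- std of evaluation returns across seeds.

    Assumes each run saved its returns to `logs/{env_id}_seed{seed}.npy`;
    the path and file layout are illustrative only.
    """
    curves = np.stack([np.load(f"logs/{env_id}_seed{s}.npy") for s in seeds])
    mean, std = curves.mean(axis=0), curves.std(axis=0)
    steps = np.arange(len(mean))
    plt.plot(steps, mean, label=env_id)
    plt.fill_between(steps, mean - std, mean + std, alpha=0.3)
    plt.xlabel("Evaluation checkpoint")
    plt.ylabel("Average return")
    plt.legend()
    plt.savefig(f"{env_id}_learning_curve.png")
```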
Learning curves for HalfCheetah-v2, Hopper-v2, InvertedPendulum-v2, Reacher-v2, Swimmer-v2, and Walker2d-v2 (plot images omitted).
- Proximal Policy Optimization Algorithms, Schulman et al. 2017
- High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al. 2016