
Hindsight Experience Replay (HER) - Reloaded #273

Merged: 51 commits from HER-2 into master on Jun 4, 2019
Conversation

@araffin (Collaborator) commented Apr 15, 2019

Refactored HER, starting from scratch. All previous code was removed.

Coverage bumps from 76% to 82% =)!

+ removed unused dependencies
+ updated doc examples
+ added some additional exploration hacks for DDPG/SAC
+ updated custom policy doc

TODO:

  • bit-flipping test env
  • DQN support
  • SAC support
  • DDPG support
  • VecEnv support
  • Clean up
  • Documentation
  • 4 different goal selection strategies
  • performance check on highway-envs (see DDPG + HER - ParkingEnv-v0 Farama-Foundation/HighwayEnv#15)
  • performance check on robotics envs (Mujoco)
  • support Discrete obs
  • find and fix bug with VecEnv
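The "4 different goal selection strategies" in the checklist refer to the strategies from the HER paper (final, episode, future, random). Below is a minimal, self-contained sketch of the "future" strategy only; the function name, transition format, and sparse-reward rule are illustrative assumptions, not this PR's actual code:

```python
import random

def sample_her_transitions(episode, k=4):
    """Relabel each transition's goal with achieved goals observed later
    in the same episode (the "future" strategy from the HER paper).

    `episode` is a list of dicts with keys 'obs', 'action', 'achieved_goal',
    'next_obs' (a hypothetical transition format).
    Returns k relabeled copies per original transition.
    """
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(k):
            # pick a random timestep at or after t in the same episode
            future_t = random.randint(t, len(episode) - 1)
            new_goal = episode[future_t]['achieved_goal']
            new_transition = dict(transition)
            new_transition['desired_goal'] = new_goal
            # sparse reward: 0 if the achieved goal matches the relabeled
            # goal, -1 otherwise (the goal-env convention)
            achieved = transition['achieved_goal']
            new_transition['reward'] = 0.0 if achieved == new_goal else -1.0
            relabeled.append(new_transition)
    return relabeled
```

The other three strategies differ only in how `future_t` is drawn (last step of the episode, any step of the episode, or any step in the whole buffer).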

The repo for Mujoco experiments: https://github.com/araffin/her-experiments
now using rl baselines zoo (HER-support branch): https://github.com/araffin/rl-baselines-zoo/tree/HER-support

Mujoco Robotics envs

I will be confident about the implementation once we manage to solve FetchPush. One current drawback is that we do not support multiprocessing, which helps exploration.

From the HER paper (OpenAI): 1 epoch = 1900 episodes = 1900 x 50 = 95,000 timesteps

  • FetchReach-v1: ok for DDPG and SAC (20k steps)
  • FetchPush-v1: ok for DDPG (8 workers, 2e6 steps), ok for SAC with early stopping (> 2e6 steps)

Closes #198
Closes #350

@araffin araffin mentioned this pull request Apr 24, 2019
+ improve comments
+ add properties to ReplayBuffer
@araffin araffin marked this pull request as ready for review April 28, 2019 14:56
@araffin (Collaborator, Author) commented Apr 28, 2019

@hill-a @erniejunior @AdamGleave @ccolas The branch is now ready for review; I'm just waiting for @ccolas's results next week to ensure that it can solve the Mujoco envs.

Known caveat: VecEnvWrapper is currently not supported, but VecEnv is (I have not come up with an elegant fix yet).

@araffin (Collaborator, Author) commented May 23, 2019

@hill-a @ccolas I just managed to make HER + SAC work on FetchReach; there seems to be a bug with VecEnv. Until I find it, I'll deactivate VecEnv for SAC.
I'll push that soon to the rl zoo on the HER-Support branch.

@araffin (Collaborator, Author) commented May 25, 2019

Good news: I made HER + DDPG work on my laptop (I had to tune the hyperparams so it does not blow up my RAM).
The main trick was to have several workers (4; I think it should work even better with more).
The only trick from OpenAI I did not include is the L2 penalty on the action.

So now, once I find and fix the bug with the VecEnvs, we can merge this branch!

@araffin (Collaborator, Author) commented May 28, 2019

@hill-a @ccolas The performance checks were successful on HER + DDPG (thanks to @keshaviyengar) and HER + SAC (with a few tricks to make it work with one worker).
Only the VecEnv bug remains...

@araffin araffin added this to the v2.6.0 milestone May 30, 2019
@araffin (Collaborator, Author) commented Jun 1, 2019

I found (and fixed) the bug! I was expecting keys in the wrong order.

@araffin (Collaborator, Author) commented Jun 1, 2019

Unfortunately, it looks like I introduced another bug...

@araffin (Collaborator, Author) commented Jun 1, 2019

The bug may come from gym...
If you check observation_space.spaces.keys(), you get ['achieved_goal', 'desired_goal', 'observation'],
but if you check env.reset().keys(), you get ['observation', 'achieved_goal', 'desired_goal']...
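A robust way to sidestep this ordering mismatch is to never iterate over the dict's own keys when flattening a goal-env observation, and instead use a fixed key order. A minimal sketch, where `KEY_ORDER` and `convert_dict_to_obs` are illustrative names (not necessarily the PR's final fix):

```python
import numpy as np

# Fixed ordering for goal-env observation dicts, so the flattened vector is
# identical whether the dict came from `observation_space.spaces` or from
# `env.reset()`, whatever order those happen to use.
KEY_ORDER = ['observation', 'achieved_goal', 'desired_goal']

def convert_dict_to_obs(obs_dict):
    """Concatenate a goal-env observation dict into one flat array,
    always in KEY_ORDER regardless of the dict's iteration order."""
    return np.concatenate([np.asarray(obs_dict[key]).ravel()
                           for key in KEY_ORDER])
```

With this, two dicts holding the same values in different key orders flatten to the same vector.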

@araffin (Collaborator, Author) commented Jun 2, 2019

@hill-a @erniejunior @AdamGleave @ccolas the PR is ready for final review.
Performance has been checked, the VecEnv bug is fixed, and support for Discrete obs has been added.

@hill-a (Owner) left a comment
Only a slight change:

maybe add replay_wrapper=None to OffPolicyRLModel.learn?
https://github.com/hill-a/stable-baselines/blob/HER-2/stable_baselines/common/base_class.py#L705
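The idea behind the suggestion is to let the base class's `learn` accept an optional callable that wraps the replay buffer, so HER can intercept stored transitions for goal relabeling. A toy sketch with simplified stand-in classes (the real `OffPolicyRLModel.learn` has many more parameters):

```python
class ReplayBuffer:
    """Minimal stand-in for the real replay buffer."""
    def __init__(self):
        self.storage = []

    def add(self, transition):
        self.storage.append(transition)

class HindsightWrapper:
    """Toy stand-in for HER's replay wrapper: forwards every stored
    transition to the underlying buffer (a real HER wrapper would also
    add relabeled copies)."""
    def __init__(self, replay_buffer):
        self.replay_buffer = replay_buffer

    def add(self, transition):
        self.replay_buffer.add(transition)

class OffPolicyRLModel:
    def learn(self, total_timesteps, replay_wrapper=None):
        buffer = ReplayBuffer()
        if replay_wrapper is not None:
            # let the caller (e.g. HER) intercept every stored transition
            buffer = replay_wrapper(buffer)
        for step in range(total_timesteps):
            buffer.add({'step': step})
        return buffer
```

Defaulting `replay_wrapper` to `None` keeps the existing call sites unchanged: only HER needs to pass a wrapper.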

@araffin (Collaborator, Author) commented Jun 3, 2019

maybe add replay_wrapper=None to OffPolicyRLModel.learn?

Will do that ;)

@hill-a hill-a self-requested a review June 4, 2019 16:03
@hill-a (Owner) left a comment
Awesome work, love that high coverage percent! ^^
LGTM.

@araffin araffin merged commit fc9853c into master Jun 4, 2019
@araffin araffin deleted the HER-2 branch June 19, 2019 13:13
Successfully merging this pull request may close these issues:

  • Trouble with custom policy
  • [feature request] Implement goal-parameterized algorithms (HER)

4 participants