-
Notifications
You must be signed in to change notification settings - Fork 725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hindsight Experience Replay (HER) - Reloaded #273
Conversation
+ begin support for VecEnv
+ improve comments + add properties to ReplayBuffer
@hill-a @erniejunior @AdamGleave @ccolas The branch is now ready for review, I'm just waiting @ccolas results next week to ensure that it can solve Mujoco envs. Known caveat: VecEnvWrapper are currently not supported but VecEnv are (I did not come up with an elegant fix yet) |
Good news, I made HER + DDPG worked on my laptop (I had to tune the hyperparams so it does not blow up my RAM). So now, when I'll find and fix the bug with the VecEnvs, we can merge this branch! |
@hill-a @ccolas The performance check were successful on HER + DDPG (thanks to @keshaviyengar ), and HER + SAC (with a few trick to make it work with one worker). |
I found (and fixed) the bug! I was expecting keys in the wrong order. |
unfortunately, it looks like I introduced another bug... |
The bug may comes from gym... |
@hill-a @erniejunior @AdamGleave @ccolas the PR is ready for final review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only slight change:
maybe add replay_wrapper=None
to OffPolicyRLModel.learn
?
https://github.com/hill-a/stable-baselines/blob/HER-2/stable_baselines/common/base_class.py#L705
Will do that ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awsome work, love that high coverage percent! ^^
LGTM.
Refactored HER, starting from scratch. All previous code was removed.
Coverage bumps from 76% to 82% =)!
+ removed unused dependencies
+ update doc examples
+ add some additional exploration hacks for DDPG/SAC
+ update custom policy doc
TODO:
The repo for Mujoco experiments:
https://github.com/araffin/her-experimentsnow using rl baselines zoo (HER-support branch): https://github.com/araffin/rl-baselines-zoo/tree/HER-support
Mujoco Robotics envs
I would be confident about the implementation when we'll manage to solve FetchPush. One current drawback is that we do not support multiprocessing, which helps exploration.
From the paper of HER (OpenAI): 1 epoch = 1900 episodes = 1900 x 50 = 95 000 timesteps
Closes #198
Closes #350