
Hindsight Experience Replay (HER) - Reloaded #273

Merged: 51 commits from HER-2 into master on Jun 4, 2019
Conversation

@araffin (Collaborator) commented Apr 15, 2019

Refactored HER, starting from scratch. All previous code was removed.

Coverage bumps from 76% to 82% =)!

+ removed unused dependencies
+ updated doc examples
+ added some additional exploration hacks for DDPG/SAC
+ updated custom policy doc

TODO:

  • bit-flipping test env
  • DQN support
  • SAC support
  • DDPG support
  • VecEnv support
  • Clean up
  • Documentation
  • 4 different goal selection strategies
  • performance check on highway-envs (see DDPG + HER - ParkingEnv-v0 Farama-Foundation/HighwayEnv#15)
  • performance check on robotics envs (Mujoco)
  • support Discrete obs
  • find and fix bug with VecEnv
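The "4 different goal selection strategies" in the checklist refer to the strategies from the HER paper (final, episode, future, random). Below is a minimal, self-contained sketch of the "future" strategy only; the function name, transition format, and sparse-reward rule are illustrative assumptions, not this PR's actual code:

```python
import random

def sample_her_transitions(episode, k=4):
    """Relabel each transition's goal with achieved goals observed later
    in the same episode (the "future" strategy from the HER paper).

    `episode` is a list of dicts with keys 'obs', 'action', 'achieved_goal',
    'next_obs' (a hypothetical transition format).
    Returns k relabeled copies per original transition.
    """
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(k):
            # pick a random timestep at or after t in the same episode
            future_t = random.randint(t, len(episode) - 1)
            new_goal = episode[future_t]['achieved_goal']
            new_transition = dict(transition)
            new_transition['desired_goal'] = new_goal
            # sparse reward: 0 if the achieved goal matches the relabeled
            # goal, -1 otherwise (the goal-env convention)
            achieved = transition['achieved_goal']
            new_transition['reward'] = 0.0 if achieved == new_goal else -1.0
            relabeled.append(new_transition)
    return relabeled
```

The other three strategies differ only in how `future_t` is drawn (last step of the episode, any step of the episode, or any step in the whole buffer).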

The repo for Mujoco experiments: https://github.com/araffin/her-experiments
now using rl baselines zoo (HER-support branch): https://github.com/araffin/rl-baselines-zoo/tree/HER-support

Mujoco Robotics envs

I will be confident about the implementation once we manage to solve FetchPush. One current drawback is that we do not support multiprocessing, which helps exploration.

From the HER paper (OpenAI): 1 epoch = 1900 episodes = 1900 x 50 = 95,000 timesteps

  • FetchReach-v1: ok for DDPG and SAC (20k steps)
  • FetchPush-v1: ok for DDPG (8 workers, 2e6 steps), ok for SAC with early stopping (> 2e6 steps)

Closes #198
Closes #350

@araffin araffin mentioned this pull request Apr 24, 2019
+ improve comments
+ add properties to ReplayBuffer
@araffin araffin marked this pull request as ready for review April 28, 2019 14:56
@araffin (Collaborator, Author) commented Apr 28, 2019

@hill-a @erniejunior @AdamGleave @ccolas The branch is now ready for review; I'm just waiting for @ccolas's results next week to ensure that it can solve the Mujoco envs.

Known caveat: VecEnvWrapper is currently not supported, but VecEnv is (I have not come up with an elegant fix yet).

@araffin (Collaborator, Author) commented May 23, 2019

@hill-a @ccolas I just managed to make HER + SAC work on FetchReach; there seems to be a bug with VecEnv. Until I find it, I'll deactivate VecEnv for SAC.
I'll push that soon to the rl zoo on the HER-Support branch.

@araffin (Collaborator, Author) commented May 25, 2019

Good news: I made HER + DDPG work on my laptop (I had to tune the hyperparams so it does not blow up my RAM).
The main trick was to have several workers (4; I think it should work even better with more).
The only trick from OpenAI I did not include is the L2 penalty on the action.

So now, once I find and fix the bug with the VecEnvs, we can merge this branch!

@araffin (Collaborator, Author) commented May 28, 2019

@hill-a @ccolas The performance checks were successful on HER + DDPG (thanks to @keshaviyengar) and HER + SAC (with a few tricks to make it work with one worker).
Only the VecEnv bug remains...

@araffin araffin added this to the v2.6.0 milestone May 30, 2019
@araffin (Collaborator, Author) commented Jun 1, 2019

I found (and fixed) the bug! I was expecting keys in the wrong order.

@araffin (Collaborator, Author) commented Jun 1, 2019

Unfortunately, it looks like I introduced another bug...

@araffin (Collaborator, Author) commented Jun 1, 2019

The bug may come from gym...
If you check observation_space.spaces.keys(), you get ['achieved_goal', 'desired_goal', 'observation'],
but if you check env.reset().keys(), you get ['observation', 'achieved_goal', 'desired_goal']...
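A robust way to sidestep this ordering mismatch is to never iterate over the dict's own keys when flattening a goal-env observation, and instead use a fixed key order. A minimal sketch, where `KEY_ORDER` and `convert_dict_to_obs` are illustrative names (not necessarily the PR's final fix):

```python
import numpy as np

# Fixed ordering for goal-env observation dicts, so the flattened vector is
# identical whether the dict came from `observation_space.spaces` or from
# `env.reset()`, whatever order those happen to use.
KEY_ORDER = ['observation', 'achieved_goal', 'desired_goal']

def convert_dict_to_obs(obs_dict):
    """Concatenate a goal-env observation dict into one flat array,
    always in KEY_ORDER regardless of the dict's iteration order."""
    return np.concatenate([np.asarray(obs_dict[key]).ravel()
                           for key in KEY_ORDER])
```

With this, two dicts holding the same values in different key orders flatten to the same vector.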

@araffin (Collaborator, Author) commented Jun 2, 2019

@hill-a @erniejunior @AdamGleave @ccolas the PR is ready for final review.
Performance has been checked, the VecEnv bug is fixed, and support for Discrete obs has been added.

@hill-a (Owner) left a comment
Only a slight change:

maybe add replay_wrapper=None to OffPolicyRLModel.learn?
https://github.com/hill-a/stable-baselines/blob/HER-2/stable_baselines/common/base_class.py#L705
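The idea behind the suggestion is to let the base class's `learn` accept an optional callable that wraps the replay buffer, so HER can intercept stored transitions for goal relabeling. A toy sketch with simplified stand-in classes (the real `OffPolicyRLModel.learn` has many more parameters):

```python
class ReplayBuffer:
    """Minimal stand-in for the real replay buffer."""
    def __init__(self):
        self.storage = []

    def add(self, transition):
        self.storage.append(transition)

class HindsightWrapper:
    """Toy stand-in for HER's replay wrapper: forwards every stored
    transition to the underlying buffer (a real HER wrapper would also
    add relabeled copies)."""
    def __init__(self, replay_buffer):
        self.replay_buffer = replay_buffer

    def add(self, transition):
        self.replay_buffer.add(transition)

class OffPolicyRLModel:
    def learn(self, total_timesteps, replay_wrapper=None):
        buffer = ReplayBuffer()
        if replay_wrapper is not None:
            # let the caller (e.g. HER) intercept every stored transition
            buffer = replay_wrapper(buffer)
        for step in range(total_timesteps):
            buffer.add({'step': step})
        return buffer
```

Defaulting `replay_wrapper` to `None` keeps the existing call sites unchanged: only HER needs to pass a wrapper.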

@araffin (Collaborator, Author) commented Jun 3, 2019

maybe add replay_wrapper=None to OffPolicyRLModel.learn?

Will do that ;)

@hill-a hill-a self-requested a review June 4, 2019 16:03
@hill-a (Owner) left a comment
Awesome work, love that high coverage percent! ^^
LGTM.

@araffin araffin merged commit fc9853c into master Jun 4, 2019
@araffin araffin deleted the HER-2 branch June 19, 2019 13:13
Successfully merging this pull request may close these issues:

  • Trouble with custom policy
  • [feature request] Implement goal-parameterized algorithms (HER)

4 participants