Features/openai hacks #35
base: master
Conversation
…ork on pusher and pick and place (probably slider too, but I didn't test it):
1. Online observation normalization and clipping
2. An additional loss term penalizing large preactivations in the policy network
3. Clipping Q-values in the Q network to stay within semantic return bounds

A minimal sketch of items 1 and 2 appears right after this list.
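For context, here is a minimal, hedged sketch of items 1 and 2 (item 3 is discussed in the review below). All names and hyperparameters are illustrative, not the PR's actual code:

```python
import torch

class RunningNormalizer:
    """Illustrative online observation normalizer (item 1): running mean/std
    are updated from incoming batches, and normalized values are clipped."""
    def __init__(self, size, clip_range=5.0, eps=1e-8):
        self.mean = torch.zeros(size)
        self.var = torch.ones(size)
        self.count = eps
        self.clip_range = clip_range
        self.eps = eps

    def update(self, batch):
        # batch: (N, size). Chan et al. parallel update of running mean/variance.
        n = batch.shape[0]
        batch_mean = batch.mean(dim=0)
        batch_var = batch.var(dim=0, unbiased=False)
        delta = batch_mean - self.mean
        total = self.count + n
        m_a = self.var * self.count
        m_b = batch_var * n
        self.mean = self.mean + delta * n / total
        self.var = (m_a + m_b + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x):
        z = (x - self.mean) / torch.sqrt(self.var + self.eps)
        return torch.clamp(z, -self.clip_range, self.clip_range)

def policy_loss(q_values, preactivations, preactivation_weight=1.0):
    """Illustrative policy loss with a preactivation penalty (item 2): the
    quadratic term discourages the pre-tanh outputs from saturating."""
    return -q_values.mean() + preactivation_weight * (preactivations ** 2).mean()
```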
LGTM! Would you mind making a few changes before I merge it in?
```diff
@@ -4,7 +4,7 @@
 import numpy as np


 class GaussianAndEpislonStrategy(RawExplorationStrategy, Serializable):
```
Thanks for this fix!
rlkit/torch/networks.py (outdated)
```python
        self.composite_normalizer = composite_normalizer

    def forward(self, obs, **kwargs):
        if self.composite_normalizer:
```
This check seems a bit redundant given the assert statement in `__init__`.
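To illustrate the point, a minimal sketch (the class name and normalizer API here are assumptions, not rlkit's actual code): if `__init__` asserts that the normalizer exists, the truthiness check in `forward` can never be false.

```python
import torch.nn as nn

class NormalizedQFunction(nn.Module):  # hypothetical class name
    def __init__(self, composite_normalizer):
        super().__init__()
        assert composite_normalizer is not None
        self.composite_normalizer = composite_normalizer

    def forward(self, obs, **kwargs):
        # No `if self.composite_normalizer:` guard needed here -- the
        # assert in __init__ already guarantees it is set.
        obs = self.composite_normalizer.normalize(obs)  # assumed method name
        ...
```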
rlkit/torch/networks.py (outdated)
```python
    def __init__(
            self,
            *args,
            composite_normalizer: CompositeNormalizer = None,
```
Seems like we can just make this a required argument rather than a kwarg.
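For concreteness, a hypothetical before/after of the signature being discussed (the surrounding parameters are elided in the diff, so only the kwarg itself is shown; the placeholder stands in for rlkit's `CompositeNormalizer`, whose import path is omitted here):

```python
CompositeNormalizer = object  # placeholder for rlkit's CompositeNormalizer

# Before: optional kwarg with a None default (needs the runtime assert).
def __init__(self, *args, composite_normalizer: CompositeNormalizer = None):
    ...

# After: required keyword-only argument, as suggested.
def __init__(self, *args, composite_normalizer: CompositeNormalizer):
    ...
```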
rlkit/torch/td3/td3.py (outdated)
```python
        if self.clip_q:
            target_q_values = torch.clamp(
                target_q_values,
                -1 / (1 - self.discount),
```
Can you make this a parameter rather than hard-coding it? It could be something like:

```python
if max_q_value is None:
    max_q_value = -1 / (1 - self.discount)  # for HER sparse rewards.
```

in `__init__`.
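A minimal sketch of the suggested parameterization (class and attribute names are assumptions, not the PR's final code). With sparse HER rewards in {-1, 0}, the discounted return is bounded within [-1/(1-γ), 0], which motivates the defaults:

```python
import torch

class TD3QClipSketch:  # hypothetical fragment, not rlkit's TD3 class
    def __init__(self, discount=0.99, min_q_value=None, max_q_value=None):
        self.discount = discount
        if min_q_value is None:
            min_q_value = -1.0 / (1.0 - discount)  # lower return bound for HER sparse rewards
        if max_q_value is None:
            max_q_value = 0.0  # sparse rewards are never positive
        self.min_q_value = min_q_value
        self.max_q_value = max_q_value

    def clip_targets(self, target_q_values):
        return torch.clamp(target_q_values, self.min_q_value, self.max_q_value)
```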
…to features/openai-hacks
I changed it to clip in the networks instead; this should be easier.
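One possible reading of "clip in the networks instead", as a hedged sketch (the `mlp` attribute and bound names are assumptions): the Q-network clamps its own output in `forward`, so the TD3 training step no longer needs a `clip_q` branch.

```python
def forward(self, obs, actions):
    # Hypothetical Q-network forward pass that clamps its own output,
    # so callers no longer need an external clip_q flag.
    q = self.mlp(torch.cat([obs, actions], dim=1))
    return torch.clamp(q, self.min_q_value, self.max_q_value)
```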
Sorry for the delay...
These are the changes needed for successful pushing and pick-and-place in state space, plus the typo fix in the Gaussian-epsilon strategy.
I tried to clean up the commit history with `git rebase -i ec231e7`, but it's behaving strangely: after I run that command, it pulls in a much longer commit history, including some commits from the main branch.