DDPG + HER - ParkingEnv-v0 #15
Hi Antonin, here is a sample output, listing the default hyperparameters and training stats:
Ok, thanks, this should be enough ;). And how many workers were you using? EDIT: it was one apparently (I just saw that in the logs).
Hi @eleurent, thanks for the hyperparameters. I got much better results now, even with SAC (training still in progress but looking much better than before, with a training success rate around 20% after 3e5 steps, which corresponds to a mean training episode reward of -5.8). EDIT: I updated the network architecture and it converges much faster now (training success rate of 13% in 5e4 steps).
Update: I managed to reproduce your results using the following script (on my dev branch):

```python
import time

import gym
import highway_env
import numpy as np

from stable_baselines import HER, SAC, DDPG
from stable_baselines.ddpg import NormalActionNoise

env = gym.make("highway-parking-v0")

n_actions = env.action_space.shape[0]
noise_std = 0.2
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=noise_std * np.ones(n_actions))

n_sampled_goal = 4

# SAC hyperparams:
model = HER('MlpPolicy', env, SAC, n_sampled_goal=n_sampled_goal,
            goal_selection_strategy='future',
            verbose=1, buffer_size=int(1e6),
            learning_rate=1e-3,
            gamma=0.95, batch_size=256,
            policy_kwargs=dict(layers=[256, 256, 256]))

# DDPG hyperparams:
# NOTE: it works even without action noise
# model = HER('MlpPolicy', env, DDPG, n_sampled_goal=n_sampled_goal,
#             goal_selection_strategy='future',
#             verbose=1, buffer_size=int(1e6),
#             actor_lr=1e-3, critic_lr=1e-3, action_noise=action_noise,
#             gamma=0.95, batch_size=256,
#             policy_kwargs=dict(layers=[256, 256, 256]))

model.learn(int(2e5))
model.save('sac_her_{}'.format(int(time.time())))
```

Closing this issue then, thanks for the help.
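A minimal sketch for loading and evaluating the saved agent, assuming the same stable-baselines HER API and environment registration as above (the saved filename here is hypothetical):

```python
import gym
import highway_env  # noqa: F401 (registers highway-parking-v0)
from stable_baselines import HER

# HER needs the environment to rebuild its goal-observation wrapper on load.
env = gym.make("highway-parking-v0")
model = HER.load('sac_her_parking', env=env)  # hypothetical filename

# Roll out a few episodes to inspect the learned parking behaviour.
for _ in range(5):
    obs, done = env.reset(), False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        env.render()
```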
Hello,
I'm currently checking performance on ParkingEnv of a new HER implementation for stable-baselines (see hill-a/stable-baselines#273) and I was wondering what hyperparameters you used for that environment?
In particular, how many steps did you train for, and what were the DDPG and HER hyperparameters (and which implementations)?
I'm also interested in knowing the best mean reward achieved in your experiment ;)
Currently, after 1e6 steps, with default hyperparams, normal noise of std 0.15, and the 'future' goal selection strategy with k=4, I got a mean reward around -9.
The learned policy looks ok but not as good as your result.
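A rough sketch of the setup described above (default hyperparameters, Gaussian action noise with std 0.15, 'future' goal selection with k=4), assuming the stable-baselines HER API from hill-a/stable-baselines#273 and the highway-parking-v0 env id used in the comments above:

```python
import gym
import highway_env  # noqa: F401 (registers the parking env)
import numpy as np
from stable_baselines import HER, DDPG
from stable_baselines.ddpg import NormalActionNoise

env = gym.make("highway-parking-v0")

# Gaussian action noise with std 0.15, as described above.
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.15 * np.ones(n_actions))

# DDPG + HER with the 'future' goal selection strategy and k=4 sampled goals,
# otherwise default hyperparameters.
model = HER('MlpPolicy', env, DDPG, n_sampled_goal=4,
            goal_selection_strategy='future',
            action_noise=action_noise, verbose=1)
model.learn(int(1e6))
```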
PS: It seems that you are using a deprecated feature of gym, but I can open another issue for that.
The warning: