Run BREMEN on D4RL #5
@IcarusWizard

```python
import gym
import d4rl  # importing d4rl registers its datasets/environments with gym

# Method of RolloutSampler: converts a D4RL offline dataset into BREMEN Path objects.
def generate_d4rl_data(self, dataset_name='hopper-medium-v0', n_train=int(1e6), horizon=1000):
    print(dataset_name)
    dataset = d4rl.qlearning_dataset(gym.make(dataset_name).env)
    s1 = dataset['observations']
    s2 = dataset['next_observations']
    a1 = dataset['actions']
    r = dataset['rewards']
    data_size = max(s1.shape[0], s2.shape[0], a1.shape[0], r.shape[0])
    n_train = min(n_train, data_size)
    paths = []
    # Slice the flat transition arrays into fixed-horizon paths.
    for i in range(int(n_train / horizon)):
        path = Path()
        if i * horizon % 10000 == 0:
            print(i * horizon)
        for j in range(i * horizon, (i + 1) * horizon):
            obs = s1[j].tolist()
            action = a1[j].tolist()
            next_obs = s2[j].tolist()
            reward = r[j].tolist()
            path.add_timestep(obs, action, next_obs, reward)
        paths.append(path)
    return paths
```
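For reference, `d4rl.qlearning_dataset` returns a dict of aligned NumPy arrays, which is what the fixed-horizon slicing above relies on. A quick way to inspect the shapes (this inspection snippet is illustrative and not part of the BREMEN code):

```python
import gym
import d4rl

dataset = d4rl.qlearning_dataset(gym.make('hopper-medium-v0').env)
for key in ('observations', 'actions', 'next_observations', 'rewards', 'terminals'):
    print(key, dataset[key].shape)
```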
Then replace a part of the code in `get_data_from_offline_batch(params, env, normalization_scope=None, model='dynamics', split_ratio=0.9)` as follows:

```python
train_collection = DataCollection(
    batch_size=params[model]['batch_size'],
    max_size=params['max_train_data'],
    shuffle=True)
val_collection = DataCollection(
    batch_size=params[model]['batch_size'],
    max_size=params['max_val_data'],
    shuffle=False)
rollout_sampler = RolloutSampler(env)
# Original offline-data loader, replaced by the D4RL loader below:
# rl_paths = rollout_sampler.generate_offline_data(
#     data_file=params['data_file'],
#     n_train=params["n_train"]
# )
rl_paths = rollout_sampler.generate_d4rl_data(
    dataset_name=params['data_file'],
    n_train=params["n_train"]
)
path_collection = PathCollection()
obs_dim = env.observation_space.shape[0]
normalization = add_path_data_to_collection_and_update_normalization(
    rl_paths, path_collection,
    train_collection, val_collection,
    normalization=None,
    split_ratio=split_ratio,
    obs_dim=obs_dim,
    normalization_scope=normalization_scope)
return train_collection, val_collection, normalization, path_collection, rollout_sampler
```

You also need to add the

Because D4RL is an additional experiment, the source code is quite dirty. I hope this part of the code would help you.
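For anyone wiring this up, the `params` entries the snippet reads might look like the sketch below. Only `data_file`, `n_train`, `max_train_data`, `max_val_data`, and the per-model `batch_size` appear in the code above; the concrete values and surrounding structure are assumptions, not BREMEN's shipped configuration:

```python
# Hypothetical params fragment; the key names come from the snippet above,
# the values are placeholders.
params = {
    'data_file': 'hopper-medium-v0',  # D4RL dataset name passed to gym.make
    'n_train': int(1e6),              # number of offline transitions to use
    'max_train_data': int(1e6),
    'max_val_data': int(1e5),
    'dynamics': {'batch_size': 500},  # model='dynamics' indexes this entry
}
```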
Hi, @frt03. Thanks for your help. I have got it to work. There are additional changes to be made: D4RL requires the latest version of
I have an additional question with respect to the performance. I have run the code on

Moreover, I notice that the test is performed at each iteration with only 3000 steps, which may not be enough to evaluate the performance on hopper and walker2d.
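To make that comparison less noisy, one option is to evaluate over several full-length episodes rather than a fixed 3000-step budget. A minimal sketch, assuming a standard pre-0.26 gym-style loop; `policy_fn` is a hypothetical obs-to-action callable, not BREMEN's actual API:

```python
import gym
import numpy as np

def evaluate(policy_fn, env_name='Hopper-v2', n_episodes=10, max_steps=1000):
    # Average undiscounted return over several full-length episodes.
    env = gym.make(env_name)
    returns = []
    for _ in range(n_episodes):
        obs, done, total, steps = env.reset(), False, 0.0, 0
        while not done and steps < max_steps:
            obs, reward, done, _ = env.step(policy_fn(obs))
            total += reward
            steps += 1
        returns.append(total)
    return float(np.mean(returns))
```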
Hi. Thanks for sharing the code. I am interested in offline reinforcement learning. In Appendix D of the paper, you show the performance of BREMEN on D4RL, but the launch script is not in the codebase. Do you have a plan to share a script for launching the D4RL experiments?