This is a PyTorch implementation of the paper "Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning" (LAPO), evaluated on the D4RL benchmark.
- python=3.7.11
- torch=1.10.0
- D4RL (Datasets for Deep Data-Driven Reinforcement Learning)
Maze2d: maze2d-umaze/medium/large-v1
$ python main_d4rl.py --env_name maze2d-umaze-v1 --kl_beta 0.3 --plot
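The `--kl_beta` flag presumably scales the KL regularizer of LAPO's latent-variable (CVAE) policy. Below is a minimal stdlib sketch of the closed-form KL between a diagonal Gaussian posterior and a standard-normal prior, weighted beta-VAE style; the function names and loss shape are illustrative, not the repo's actual code:

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian,
    summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def elbo_loss(recon_loss, mu, log_var, kl_beta=0.3):
    """Reconstruction term plus a beta-weighted KL term; kl_beta mirrors the
    --kl_beta command-line flag (assumed semantics)."""
    return recon_loss + kl_beta * kl_to_standard_normal(mu, log_var)
```

With `mu = 0` and `log_var = 0` the posterior matches the prior and the KL term vanishes, so the loss reduces to the reconstruction term alone.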
Antmaze: antmaze-umaze/medium/large-diverse-v1
$ python main_d4rl.py --env_name antmaze-umaze-diverse-v1 --doubleq_min 0.7 --plot
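The `--doubleq_min` flag likely controls how the twin critics are combined: instead of the hard min of clipped double Q-learning, the value estimate can interpolate between the min and max of the two Q-values. A hedged stdlib sketch of that combination rule (the repo's exact formula may differ):

```python
def combine_double_q(q1, q2, doubleq_min=0.7):
    """Soft clipped double-Q: weight the pessimistic min of the twin critics
    against the optimistic max. doubleq_min = 1.0 recovers the standard
    hard min used in TD3-style methods."""
    return doubleq_min * min(q1, q2) + (1.0 - doubleq_min) * max(q1, q2)
```

Smaller `doubleq_min` values make the value estimate less pessimistic, which can help on sparse-reward tasks such as Antmaze.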
Mujoco locomotion: hopper/walker2d/halfcheetah-random/medium/expert-v2
$ python main_d4rl.py --env_name hopper-random-v2
Kitchen: kitchen-complete/partial/mixed-v0
$ python main_d4rl.py --env_name kitchen-complete-v0
You will get the following results using --seed 123 (red), 456 (green), and 789 (blue):
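To reproduce the multi-seed curves, the three seeds can be swept with a small stdlib launcher. In this sketch, `build_command` only assembles the argv list (flag names are taken from the commands above; `--seed` is assumed from the text), and `sweep` is the hypothetical helper that actually launches training:

```python
import subprocess
import sys

SEEDS = [123, 456, 789]  # red, green, blue in the reference plots

def build_command(env_name, seed, extra_flags=()):
    """Assemble the argv list for one training run of main_d4rl.py."""
    cmd = [sys.executable, "main_d4rl.py",
           "--env_name", env_name,
           "--seed", str(seed),
           "--plot"]
    cmd.extend(extra_flags)
    return cmd

def sweep(env_name, extra_flags=()):
    """Run one training process per seed, sequentially."""
    for seed in SEEDS:
        subprocess.run(build_command(env_name, seed, extra_flags), check=True)

# Example: sweep("maze2d-umaze-v1", ("--kl_beta", "0.3"))
```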
If you find this code useful, please cite our paper:
@article{chen2022latent,
title={Latent-Variable Advantage-Weighted Policy Optimization for Offline RL},
author={Chen, Xi and Ghadirzadeh, Ali and Yu, Tianhe and Gao, Yuan and Wang, Jianhao and Li, Wenzhe and Liang, Bin and Finn, Chelsea and Zhang, Chongjie},
journal={arXiv preprint arXiv:2203.08949},
year={2022}
}
- If you have any questions, please contact me at pcchenxi@gmail.com.