This repo only serves as a link to Tianshou's benchmark of MuJoCo environments. The latest benchmark is maintained under thu-ml/tianshou. See the full benchmark here.
Keywords: deep reinforcement learning, PyTorch, MuJoCo, benchmark, performance, Tianshou, baseline
We benchmarked Tianshou's algorithm implementations in 9 out of 13 environments from the MuJoCo Gym task suite.
For each supported algorithm and each supported MuJoCo environment, we provide:
- Default hyperparameters used for the benchmark and scripts to reproduce it (a minimal training sketch follows this list);
- A comparison of performance (or code-level details) with other open-source implementations and classic papers;
- Graphs and raw data that can be used for research purposes;
- Detailed logs obtained during training;
- Pretrained agents;
- Some hints on how to tune each algorithm.
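As a concrete illustration, below is a minimal training sketch in the spirit of the official reproduction scripts (which live under examples/mujoco in thu-ml/tianshou). It is written against the Tianshou 0.4.x API; class and argument names have changed across versions, and the hyperparameters shown are common SAC defaults rather than the tuned benchmark values, so treat this as a sketch, not the official script.

```python
import gym
import torch
from tianshou.data import Collector, ReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import SACPolicy
from tianshou.trainer import offpolicy_trainer
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

task = "Hopper-v3"  # any of the benchmarked MuJoCo tasks
env = gym.make(task)
state_shape = env.observation_space.shape
action_shape = env.action_space.shape
max_action = env.action_space.high[0]

train_envs = DummyVectorEnv([lambda: gym.make(task)])
test_envs = DummyVectorEnv([lambda: gym.make(task)])

# Gaussian policy network and twin Q critics, as in the SAC paper.
net_a = Net(state_shape, hidden_sizes=[256, 256])
actor = ActorProb(net_a, action_shape, max_action=max_action, unbounded=True)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

def make_critic():
    net_c = Net(state_shape, action_shape, hidden_sizes=[256, 256], concat=True)
    return Critic(net_c)

critic1 = make_critic()
critic1_optim = torch.optim.Adam(critic1.parameters(), lr=1e-3)
critic2 = make_critic()
critic2_optim = torch.optim.Adam(critic2.parameters(), lr=1e-3)

policy = SACPolicy(
    actor, actor_optim, critic1, critic1_optim, critic2, critic2_optim,
    tau=0.005, gamma=0.99, alpha=0.2,  # common SAC defaults, not the tuned values
)

train_collector = Collector(policy, train_envs, ReplayBuffer(size=1_000_000))
test_collector = Collector(policy, test_envs)

result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=100, step_per_epoch=10_000, step_per_collect=1,
    episode_per_test=10, batch_size=256, update_per_step=1,
)
print(result)
```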
Supported algorithms are listed below:
- Deep Deterministic Policy Gradient (DDPG), commit id
- Twin Delayed DDPG (TD3), commit id
- Soft Actor-Critic (SAC), commit id
- REINFORCE algorithm, commit id
- Natural Policy Gradient (NPG), commit id
- Advantage Actor-Critic (A2C), commit id
- Proximal Policy Optimization (PPO), commit id
- Trust Region Policy Optimization (TRPO), commit id
- Actor-Critic using Kronecker-Factored Trust Region (ACKTR), commit id
Environment | Tianshou | Spinning Up (PyTorch) | SAC paper |
---|---|---|---|
Ant | 5850.2±475.7 | ~3980 | ~3720 |
HalfCheetah | 12138.8±1049.3 | ~11520 | ~10400 |
Hopper | 3542.2±51.5 | ~3150 | ~3370 |
Walker2d | 5007.0±251.5 | ~4250 | ~3740 |
Swimmer | 44.4±0.5 | ~41.7 | N/A |
Humanoid | 5488.5±81.2 | N/A | ~5200 |
Reacher | -2.6±0.2 | N/A | N/A |
InvertedPendulum | 1000.0±0.0 | N/A | N/A |
InvertedDoublePendulum | 9359.5±0.4 | N/A | N/A |
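The ± entries above follow the "mean±std" convention across training runs with different random seeds. As a toy sketch of how such an entry can be computed from the released raw data, the snippet below uses hypothetical per-seed returns (placeholders, not benchmark numbers):

```python
import numpy as np

# Hypothetical per-seed returns for a single environment; in practice
# these would be read from the raw data released with the benchmark.
returns = np.array([5850.2, 6102.3, 5403.8, 6210.0, 5680.9, 5764.1])

# Reproduce the table's "mean±std" format (population std assumed here).
print(f"{returns.mean():.1f}±{returns.std():.1f}")
```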
- Spinning Up Benchmark
- OpenAI Baselines Benchmark
- TODO and related discussions: 1, 2