This example reproduces the Soft Actor-Critic (SAC) deep reinforcement learning algorithm with PARL and reaches the same level of performance as the paper on the MuJoCo benchmarks.
Paper: SAC is introduced in "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor".
Please see the MuJoCo documentation to learn more about the MuJoCo environments.
- Each experiment was run three times with different seeds
- python3.5+
- parl>=2.0.0
- paddlepaddle>=2.0.0
- gym==0.9.1
- mujoco-py==0.5.7
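
The pins above can be checked before training. The following is a minimal sketch, not part of this example's code: the helper names (`PINS`, `version_tuple`, `satisfies`, `find_mismatches`) are hypothetical, the version comparison is deliberately simplified to the first three numeric components, and the lookup at the bottom assumes Python 3.8+ for `importlib.metadata`.

```python
import re

# Pins copied from the dependency list above: (distribution name, operator, version).
PINS = [
    ("parl", ">=", "2.0.0"),
    ("paddlepaddle", ">=", "2.0.0"),
    ("gym", "==", "0.9.1"),
    ("mujoco-py", "==", "0.5.7"),
]


def version_tuple(version):
    """Turn '2.0.0' into (2, 0, 0), keeping at most the first three numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def satisfies(installed, op, required):
    """Compare two version strings under '>=' or '=='."""
    have, want = version_tuple(installed), version_tuple(required)
    return have >= want if op == ">=" else have == want


def find_mismatches(installed_versions):
    """Return (name, pin, installed) for every pin that is missing or unmet.

    installed_versions maps distribution name -> installed version string.
    """
    problems = []
    for name, op, required in PINS:
        installed = installed_versions.get(name)
        if installed is None or not satisfies(installed, op, required):
            problems.append((name, op + required, installed))
    return problems


if __name__ == "__main__":
    from importlib import metadata  # assumption: Python 3.8 or newer

    found = {}
    for name, _, _ in PINS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # left out of the mapping, so it is reported as missing
    for name, pin, installed in find_mismatches(found):
        print("{}: need {}, found {}".format(name, pin, installed))
```

Running the script prints one line per unmet pin and nothing when every pin is satisfied.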
# To train on HalfCheetah-v1 (default), Hopper-v1, Walker2d-v1, or Ant-v1
# --alpha defaults to 0.2
python train.py --env [ENV_NAME]
# To reproduce the performance of Humanoid-v1
python train.py --env Humanoid-v1 --alpha 0.05
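
The `--alpha` flag sets the entropy temperature of SAC, which weights the policy's entropy bonus against the reward. As a rough illustration of where it enters, here is a minimal sketch of a soft Q-learning target in the twin-critic form used by many SAC implementations. It is not taken from this example's `train.py`: the function name `soft_q_target`, its arguments, and the discount `gamma=0.99` are assumptions made for illustration only.

```python
def soft_q_target(reward, q1_next, q2_next, log_prob_next,
                  alpha=0.2, gamma=0.99, done=False):
    """Soft Bellman target for one transition (illustrative sketch).

    q1_next, q2_next: the two critics' estimates for the next state-action pair
    log_prob_next:    log pi(a'|s') of the action sampled at the next state
    alpha:            entropy temperature (0.2 by default, 0.05 for Humanoid-v1)
    gamma:            discount factor (assumed value, not from the source)
    """
    # Take the smaller critic estimate, then add the entropy bonus.
    # Since -log pi is the sample entropy, a larger alpha rewards
    # more random actions more strongly.
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    # No bootstrapping from the next state once the episode has ended.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value


# Same transition under the two temperatures used in the commands above.
default_target = soft_q_target(1.0, 2.0, 3.0, -1.0, alpha=0.2)
humanoid_target = soft_q_target(1.0, 2.0, 3.0, -1.0, alpha=0.05)
```

With a lower `alpha`, the entropy term contributes less to the target, so the learned policy leans more toward exploiting reward than toward exploring.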