This repository hosts the code used for the paper Skew-Explore. It contains the implementation of the Skew-Explore algorithm, three goal-proposing environments in MuJoCo, and RL algorithms including PPO, SAC, HER, and RND. The PointMaze environment is modified from rllab, the RL algorithms are modified from stable-baselines, and RND is inspired by random-network-distillation.
For this experiment, we use Python 3.6. Here's a guide to installing Python 3.6 on Ubuntu 16.04.
Python dependencies can be managed with conda. Once you have cloned the repository, install the dependencies defined in the Pipfile or 'environment.yml':
conda update conda
conda env create -f environment.yml
The "point_maze" environment requires rllab and MuJoCo version 131:
- First clone rllab from HERE.
- Then download MuJoCo version 131 HERE.
- cd to the rllab repo and run "pip install ." to install it.
- Run "./scripts/" to set up the MuJoCo environment.
- The "vendor" folder in rllab should contain two sub-folders "mujoco" and "mujoco_models".
- You may need to copy the "vendor" folder to "./anaconda3/envs/skew_explore/lib/python3.6/site-packages" if it cannot be found.
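Before copying the "vendor" folder around, it can help to confirm that it has the layout described above. The following is a minimal sketch, not part of the repository: the function name `missing_vendor_subfolders` and the example path `rllab/vendor` are assumptions made for illustration, and only the two sub-folder names come from the steps above.

```python
from pathlib import Path

# Sub-folders that the steps above say the "vendor" folder should contain.
REQUIRED_SUBFOLDERS = ("mujoco", "mujoco_models")


def missing_vendor_subfolders(vendor_dir):
    """Return the required sub-folder names that are absent from vendor_dir."""
    vendor = Path(vendor_dir)
    return [name for name in REQUIRED_SUBFOLDERS if not (vendor / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever rllab was cloned.
    missing = missing_vendor_subfolders("rllab/vendor")
    print("missing:", missing if missing else "none")
```

An empty result only means the two directories exist. It does not verify that their contents are the right MuJoCo version.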
Other dependencies:
pip install cached_property mako theano
The "yumi" and "yumi_door_button" environments are implemented using MuJoCo version 150. Please follow this link to install MuJoCo 150.
The code should be run with the version of stable-baselines included as a submodule:
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
cd stable-baselines
pip install .
pip install gym
pip install tensorflow
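After the installation commands above, a quick way to see whether the packages are visible to the active environment is to look them up by import name. This is a minimal sketch rather than part of the repository: the helper `missing_packages` is a name chosen for illustration, and it assumes the packages are importable as `stable_baselines`, `gym`, and `tensorflow`.

```python
import importlib.util

# Import names of the packages installed above
# (the stable-baselines distribution is imported as stable_baselines).
PACKAGES = ("stable_baselines", "gym", "tensorflow")


def missing_packages(names=PACKAGES):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing if missing else "none")
```

The lookup only locates each package without importing it, so it will not catch a package that is installed but fails at import time (for example, because of a missing system library).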
To reproduce the results of the exploration experiments, run the following commands:
- for the point_maze environment
python --plot_coverage --alg her_sac --env maze --save_path maze_hersac_distrib --reward_type density --goal_bandwidth 2 --trajectory_bandwidth 0.5 --use_index --render
- for the yumi_door environment
python --plot_coverage --alg her_sac --env yumi --save_path door_hersac_distrib --reward_type density --goal_bandwidth 1.5 --trajectory_bandwidth 0.05 --use_index --use_auto_scale --render
To reproduce the results of the sparse reward experiments, run the following commands:
- for the point_maze environment
python --plot_coverage --alg her_sac --env maze --save_path maze_hersac_distrib_sparse --reward_type density --goal_bandwidth 2 --trajectory_bandwidth 0.5 --use_index --use_extrinsic_reward --render
- for the yumi_door environment
python --plot_coverage --alg her_sac --env yumi --save_path door_hersac_distrib_sparse --reward_type density --goal_bandwidth 1.5 --trajectory_bandwidth 0.05 --use_index --use_auto_scale --use_extrinsic_reward --render
- for the yumi_door_button environment
python --plot_coverage --alg her_sac --env yumi_door_button --save_path doorbutton_hersac_distrib_sparse --reward_type density --goal_bandwidth 1.5 --trajectory_bandwidth 0.2 --use_index --use_auto_scale --use_extrinsic_reward --history_buffer_size 5000000 --render