Options of Interest

This repo contains code accompanying the paper Options of Interest: Temporal Abstraction with Interest Functions (AAAI 2020). It includes the interest-option-critic (IOC) code needed to run all the experiments described in the paper.

  • You can find demonstrative videos of the trained agents on our project webpage.
  • All proofs, pseudo-code and reproducibility checklist details are available in the appendix on our project webpage.
  • For experiment details, please refer to the full paper provided on the webpage.

Contents:

Tabular Experiments (Four-Rooms)

Dependencies

To install the dependencies for the tabular experiments, run the following commands:

conda create -n interest python=3.6
conda activate interest
pip install seaborn
pip install matplotlib

Usage

To run the IOC code, use:

python interestoptioncritic_tabular_fr.py --baseline --discount=0.99 --epsilon=0.01 --noptions=4 --lr_critic=0.5 --lr_intra=0.25 --lr_term=0.25 --lr_interestfn=0.15 --nruns=10 --nsteps=2000 --nepisodes=500 --seed=7200

To run the baseline OC code, use:

python optioncritic_tabular_fr.py --baseline --discount=0.99 --epsilon=0.01 --noptions=4 --lr_critic=0.5 --lr_intra=0.25 --lr_term=0.25 --nruns=10 --nsteps=2000 --nepisodes=500 --seed=7200
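
The two commands above differ only in the script name and the extra --lr_interestfn flag. As a minimal sketch, assuming you want to keep the remaining hyperparameters identical for both runs, the argument lists can be built from one shared dictionary. The helper build_command and the name SHARED_ARGS are hypothetical and not part of this repo; only the flags and values are taken from the commands above.

```python
# Hyperparameters shared by both tabular scripts, copied from the commands above.
SHARED_ARGS = {
    "discount": 0.99,
    "epsilon": 0.01,
    "noptions": 4,
    "lr_critic": 0.5,
    "lr_intra": 0.25,
    "lr_term": 0.25,
    "nruns": 10,
    "nsteps": 2000,
    "nepisodes": 500,
    "seed": 7200,
}


def build_command(script, extra_args=None):
    """Return the argument list for one tabular run (hypothetical helper)."""
    args = dict(SHARED_ARGS)
    args.update(extra_args or {})
    cmd = ["python", script, "--baseline"]
    cmd += ["--{}={}".format(name, value) for name, value in args.items()]
    return cmd


# IOC takes one extra learning rate for the interest function.
ioc_cmd = build_command("interestoptioncritic_tabular_fr.py", {"lr_interestfn": 0.15})
# The OC baseline uses the shared hyperparameters only.
oc_cmd = build_command("optioncritic_tabular_fr.py")

if __name__ == "__main__":
    # Print the commands so they can be pasted into a shell or a job script.
    print(" ".join(ioc_cmd))
    print(" ".join(oc_cmd))
```

Each list can also be passed to subprocess.run as-is if you prefer to launch the runs from Python rather than from the shell.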

Performance and Visualizations

To visualize the environment itself, use the notebook: fr_env_plots.ipynb

To plot the performance curves, use the notebook: fr_analysis_performance.ipynb

To visualize the options learned, use the notebook: fr_analysis_heatmaps.ipynb


Control Experiments (TMaze & HalfCheetah)

Dependencies

To install the dependencies for the control experiments, run the following commands:

conda create -n intfc python=3.6
conda activate intfc
pip install tensorflow
pip install -e .  # run from the main directory
pip install gym==0.9.3
pip install mujoco-py==0.5.1
brew install mpich
pip install mpi4py

Usage

To run the code with TMaze experiments, use: python run_mujoco.py --env TMaze --opt 2 --seed 2 --switch

To run the code with HalfCheetah experiments, use: python run_mujoco.py --env HalfCheetahDir-v1 --opt 2 --seed 2 --switch

Running experiments on slurm

To run the code on Compute Canada or any other Slurm cluster, make sure you have installed all dependencies and created a conda environment intf. Then add your account and username to the script launcher_mujoco.sh and run:

chmod +x launcher_mujoco.sh
./launcher_mujoco.sh

To run the baseline option-critic, add the flag --nointfc to the run command in the above script:

k="xvfb-run -n "${port[$count]}" -s \"-screen 0 1024x768x24 -ac +extension GLX +render -noreset\" python run_mujoco.py --env "$envname" --saves --opt 2 --seed ${_seed} --mainlr ${_mainlr} --piolr ${_piolr} --switch --nointfc --wsaves"

Performance and Visualizations

To plot the learning curves, use the script control/baselines/ppoc_int/plot_res.py with appropriate settings.

To load and run a trained agent, use:

python run_mujoco.py --env HalfCheetahDir-v1 --epoch 400 --seed 0

where --epoch is the training epoch at which you want to visualize the learned agent. This assumes that the saved model directory is in the ppoc_int folder.


Visual Navigation Experiments (Miniworld)

Dependencies

To install the dependencies for the miniworld experiments, run the following commands:

conda create -n intfc python=3.6
conda activate intfc
pip install tensorflow
pip install -e .  # run from the first directory of baselines
brew install mpich
pip install mpi4py
pip install matplotlib
# to run the code with miniworld
pip install gym==0.10.5

To install miniworld: follow these installation instructions.

Since the CNN policy code is much slower than the MuJoCo experiments, we recommend running it on a cluster. To run miniworld headless and train on a cluster, follow the instructions here.

Usage

To run the code headless for the oneroom task with transfer, use:

xvfb-run -n 4005 -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python run_miniw.py --env MiniWorld-OneRoom-v0 --seed 5 --opt 2 --saves --mainlr 1e-4 --intlr 9e-5 --switch --wsaves

Running experiments on slurm

To run the code on Compute Canada or any other Slurm cluster, make sure you have installed all dependencies and created a conda environment intf. Then add your account and username to the script launcher_miniworld.sh and run:

chmod +x launcher_miniworld.sh
./launcher_miniworld.sh

Please note that for the miniworld code to run correctly headless, each run must be given an exclusive port. If the port number overlaps across multiple jobs, the jobs will fail. There may be a better way to do this, but this is the one we found easiest to make work. Depending on how many jobs you want to launch (e.g., runs/seeds), set the range of ports accordingly.

To run the baseline option-critic, add the flag --nointfc to the run command in the above script.
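
The port requirement described above can be sketched as follows: every seed gets its own value for the xvfb-run -n option (the X server number, which this README calls the port), counting up from a base value. This is only an illustration under assumptions, not the logic of launcher_miniworld.sh. The helpers assign_ports and miniworld_command are hypothetical, the base value 4005 is simply the one used in the usage command above, and nothing here checks that a port is actually free on the node.

```python
def assign_ports(seeds, base_port=4005):
    """Map each seed to its own port by counting up from base_port (hypothetical helper)."""
    return {seed: base_port + i for i, seed in enumerate(seeds)}


def miniworld_command(seed, port, baseline=False):
    """Build one headless MiniWorld run as an argument list (hypothetical helper)."""
    cmd = [
        "xvfb-run", "-n", str(port),
        "-s", "-screen 0 1024x768x24 -ac +extension GLX +render -noreset",
        "python", "run_miniw.py",
        "--env", "MiniWorld-OneRoom-v0",
        "--seed", str(seed),
        "--opt", "2", "--saves",
        "--mainlr", "1e-4", "--intlr", "9e-5",
        "--switch", "--wsaves",
    ]
    if baseline:
        # Baseline option-critic: same command with the --nointfc flag added.
        cmd.append("--nointfc")
    return cmd


seeds = [0, 1, 2, 3, 4]
ports = assign_ports(seeds)
# Every job must end up with a distinct port, otherwise the jobs collide.
assert len(set(ports.values())) == len(seeds)
commands = [miniworld_command(seed, ports[seed]) for seed in seeds]

if __name__ == "__main__":
    for cmd in commands:
        print(cmd)
```

If IOC and baseline jobs run at the same time on one node, they would need non-overlapping ranges as well, for example by passing a different base_port for the baseline runs.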

Performance and Visualizations

To plot the learning curves, use the script miniworld/baselines/ppoc_int/plot_res.py with appropriate settings.

To visualize the trajectories of trained agents, make the following changes in your local installation of the miniworld environment code (https://github.com/kkhetarpal/gym-miniworld/commits/master). Then load and run the trained agent to visualize its trajectory in a 2-D top view of the 3-D oneroom.

To load and run a trained agent, use:

python run_miniw.py --env MiniWorld-OneRoom-v0 --epoch 480 --seed 0

where --epoch is the training epoch at which you want to visualize the learned agent. This assumes that the saved model directory is in the ppoc_int folder.

Contact

To ask questions or report issues, please open an issue on the issues tracker.

Additional Material

  • The poster presented at the NeurIPS 2019 Deep RL Workshop and Learning Transferable Skills Workshop can be found (here).
  • Preliminary ideas were presented at the AAAI 2019 Student Abstract track and selected as a finalist in the 3MT Thesis Competition (paper link), (poster link).

Citations

  • The fourrooms experiment is built on the Option-Critic, 2017 tabular code.
  • The PPOC, 2017 baselines code serves as the base for our function approximation experiments.
  • To install Mujoco, please visit their website and acquire a free student license.
  • For any issues you face with setting up miniworld, please visit their troubleshooting page.