Skip to content
This repository has been archived by the owner on Aug 28, 2021. It is now read-only.

Latest commit

 

History

History
71 lines (41 loc) · 3 KB

README.md

File metadata and controls

71 lines (41 loc) · 3 KB

This repo contains code for our paper Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

The code base contains multiple branches.

  • The main branch contains experiments for the BabyAI tasks.
  • The mujoco branch contains experiments for the Mujoco tasks.
  • The carracing branch contains experiments for CarRacing task.

Based on code base for the BabyAI project at Mila. https://github.com/mila-iqia/babyai

Follow similar installations as in https://github.com/mila-iqia/babyai.

Requirements:

Installation

Requirements:

  • Python 3.5+
  • OpenAI Gym
  • NumPy
  • PyQT5
  • PyTorch 0.4.1+

Start by manually installing PyTorch. See the PyTorch website for installation instructions specific to your platform.

Then, clone this repository and install the other dependencies with pip3:

git clone https://github.com/facebookresearch/modeling_long_term_future.git
cd modeling_long_term_future
pip3 install --editable .

Create a new conda env using env.yml in the repo

Training teacher

We use the BabyAI Pickup-Unlock game.

First train the teacher (for imitation learning) using PPO with curriculum learning. Start with a room size of 6 and then work our way up to room size of 15.

python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 6
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 8 --model MODEL_ROOM6_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 10 --model MODEL_ROOM8_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 12 --model MODEL_ROOM10_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 15 --model MODEL_ROOM12_PRETRAINED

Generate expert trajectories

Generate expert trajectories from the experts trained using curriculum learning

mnkdir data
python3 -m scripts.gen_samples --episodes 10000 --env BabyAI-UnlockPickup-v0 --model pretrained_model_room_10 --room 10

Training the student to imitate the expert

To run our model

python3 -m scripts.zforcing_main_state_dec --env BabyAI-UnlockPickup-v0 --datafile EXPERT_DATA_TO_LOAD --model MODEL_NAME --eval-episodes 100 --eval-interval 200  --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.0001 --bwd-l2-weight 1. --kld-weight-start 0.2  --aux-weight-end 0.0001  --room 10

To run the baseline

python3 -m scripts.zforcing_main_state_dec --datafile EXPERT_DATA_TO_LOAD --env BabyAI-UnlockPickup-v0 --model MODEL_NAME --eval-episodes 100 --eval-interval 200  --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.000 --aux-weight-end 0.0 --room 10

License

Find license in LICENSE file.