RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Jie Cheng1,2, Gang Xiong1,2, Xingyuan Dai1,2, Qinghai Miao2, Yisheng Lv1,2, Fei-Yue Wang1,2
1 State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA
2 School of Artificial Intelligence, the University of Chinese Academy of Sciences

ICML 2024 Spotlight

[Paper]       [Code]

Requirements

Install MuJoCo 2.1

sudo apt update
sudo apt install -y unzip gcc libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf libegl1 libopengl0
sudo ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
mkdir -p ~/.mujoco
cd ~/.mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -zxvf mujoco210-linux-x86_64.tar.gz
rm -f mujoco210-linux-x86_64.tar.gz

Include the following lines in the ~/.bashrc file:

export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$HOME/.mujoco/mujoco210/bin"
export PATH="$HOME/.mujoco/mujoco210/bin:$PATH"

Then run source ~/.bashrc
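As a quick sanity check of the steps above, the following sketch verifies that the MuJoCo directory exists and that LD_LIBRARY_PATH contains it. The helper name path_contains is hypothetical and not part of this repo; the directory is the one used in the install commands above.

```shell
# Hypothetical helper (not part of this repo): succeed if directory $2
# appears as an entry of the colon-separated list $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Location used by the install commands above.
MUJOCO_BIN="$HOME/.mujoco/mujoco210/bin"

if [ -d "$MUJOCO_BIN" ]; then
  echo "found MuJoCo binaries in $MUJOCO_BIN"
else
  echo "missing $MUJOCO_BIN (was the archive extracted into ~/.mujoco?)"
fi

if path_contains "${LD_LIBRARY_PATH:-}" "$MUJOCO_BIN"; then
  echo "LD_LIBRARY_PATH includes the MuJoCo bin directory"
else
  echo "LD_LIBRARY_PATH does not include $MUJOCO_BIN; re-check ~/.bashrc"
fi
```

Run it in a new shell (or after source ~/.bashrc) so that the updated environment is visible.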

Install dependencies

conda env create -f conda_env.yaml
conda activate rime
pip install -e ".[docs,tests,extra]"
cd custom_dmc2gym
pip install -e .
cd ..
pip install git+https://github.com/rlworkgroup/metaworld.git@04be337a12305e393c0caf0cbf5ec7755c7c8feb
pip install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html

To check that mujoco-py is installed properly, run python -c "import mujoco_py; print(mujoco_py.__version__)". If the import fails, see the FAQ.
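The same idea can be extended to the other packages installed above. This is a minimal sketch, assuming the environment is active and that the packages import as mujoco_py, metaworld, and torch; the check_import helper and the PYTHON variable are hypothetical names introduced here. The first import of mujoco_py may take a while because it compiles its extension.

```shell
# Interpreter to test; override with e.g. PYTHON=python3 (assumption: the
# conda environment provides "python").
PYTHON="${PYTHON:-python}"

# Hypothetical helper: try to import module $1 and report the outcome.
check_import() {
  if "$PYTHON" -c "import $1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "FAILED: $1"
    return 1
  fi
}

# Check each package without aborting on the first failure.
for module in mujoco_py metaworld torch; do
  check_import "$module" || true
done

# Report the installed torch version and whether CUDA is usable.
"$PYTHON" -c "import torch; print(torch.__version__, torch.cuda.is_available())" \
  || echo "torch is not importable"
```

A FAILED line for mujoco_py usually points back to the MuJoCo setup or to the FAQ entry below.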

Get Started

Configs

Set hyperparameters in the all-in-one script run_parallel.sh, including the algorithm name, the hyperparameters of the algorithm and the environment, and the GPU index for each random seed.

Running

For simulated (scripted) teachers:

bash run_parallel.sh

This launches the experiments for multiple random seeds simultaneously.
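For readers unfamiliar with the pattern, running several seeds at once with a per-seed GPU can be sketched as below. This is an illustration only, not the contents of run_parallel.sh: the assignments variable, the run_one function, and the placeholder command are hypothetical, and the sketch uses background processes, which may differ from what the script does.

```shell
# Hypothetical seed-to-GPU mapping, written as "seed:gpu" pairs.
assignments="1:0 2:0 3:1 4:1 5:2"

# Hypothetical launcher: run one job for seed $1 pinned to GPU $2.
# The inner command is a placeholder; substitute the real training command.
run_one() {
  CUDA_VISIBLE_DEVICES="$2" sh -c 'echo "seed=$1 gpu=$CUDA_VISIBLE_DEVICES"' sh "$1"
}

for pair in $assignments; do
  seed="${pair%%:*}"   # text before the colon
  gpu="${pair##*:}"    # text after the colon
  run_one "$seed" "$gpu" &   # one background process per seed
done

wait   # block until every seed has finished
```

Setting CUDA_VISIBLE_DEVICES per process restricts each job to the listed GPU, so several seeds can share a device or be spread across devices. The order of the printed lines is not deterministic because the jobs run concurrently.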

For real human teachers (requires online annotation):

bash run_human_labeller.sh

When training enters the annotation phase, run label_program.ipynb to annotate human preferences. The experimental result of RIME with preferences annotated by non-robotics students (detailed in Section 5.3 of the paper) can be seen in this GIF.

Acknowledgement

This repo benefits from BPref, SURF, RUNE, and MRN. We thank their authors for their wonderful work.

Citation

@InProceedings{cheng2024rime,
  title     = {{RIME}: Robust Preference-based Reinforcement Learning with Noisy Preferences},
  author    = {Cheng, Jie and Xiong, Gang and Dai, Xingyuan and Miao, Qinghai and Lv, Yisheng and Wang, Fei-Yue},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {8229--8247},
  year      = {2024},
  volume    = {235},
  publisher = {PMLR}
}

FAQ

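  1. GLIBCXX_3.4.30 not found. Install a newer GCC runtime into the conda environment and add the environment's lib directory to LD_LIBRARY_PATH (adjust the path if conda is installed elsewhere):
conda install gcc=12.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/miniconda3/envs/rime/lib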
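To confirm whether the environment's libstdc++ actually provides the missing symbol version, a check along these lines may help. It is a sketch under assumptions: has_glibcxx is a hypothetical helper, the library is assumed to be named libstdc++.so.6, and the fallback path assumes a default Miniconda location. CONDA_PREFIX is set by conda activate.

```shell
# Hypothetical helper: succeed if the file $1 contains the version tag $2.
# grep -F matches the tag literally; -q suppresses output.
has_glibcxx() {
  grep -q -F "$2" "$1" 2>/dev/null
}

# Use the active conda environment if there is one, else a default location.
lib="${CONDA_PREFIX:-$HOME/miniconda3/envs/rime}/lib/libstdc++.so.6"

if has_glibcxx "$lib" GLIBCXX_3.4.30; then
  echo "$lib provides GLIBCXX_3.4.30"
else
  echo "$lib is missing or lacks GLIBCXX_3.4.30"
fi
```

If the tag is present but the error persists, the loader is probably picking up a different libstdc++ first, which is what the LD_LIBRARY_PATH export above is meant to address.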