Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment (NeurIPS 2024)


Environment

Create a Python virtual environment, e.g. with Conda, and activate it:

conda create -n Cal-DPO python=3.10 && conda activate Cal-DPO

Then install PyTorch 2.1.2, following the instructions on the PyTorch installation page, and install flash-attn:

python -m pip install flash-attn --no-build-isolation

Training Scripts

Launch training with Accelerate using the DeepSpeed ZeRO-3 configuration:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py zephyr-selm/dpo_config_full.yaml
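For intuition only, here is a minimal scalar sketch of a calibrated DPO-style preference loss. This is an illustration of the general idea (a DPO log-sigmoid term plus a term that anchors the implicit rewards to fixed targets), not the repository's actual implementation; the function name, the `beta` default, and the calibration targets are all assumptions for the sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibrated_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Illustrative calibrated DPO-style loss on scalar sequence log-probs."""
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    r_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    r_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    # Standard DPO term: push the chosen reward above the rejected one.
    dpo_term = -math.log(sigmoid(r_chosen - r_rejected))
    # Placeholder calibration term (assumption): pull the implicit rewards
    # toward fixed targets so their absolute scale stays controlled, instead
    # of only optimizing the relative margin.
    target = 0.5
    calibration = (r_chosen - target) ** 2 + (r_rejected + target) ** 2
    return dpo_term + calibration

# Example: chosen response is more likely under the policy than under the
# reference, rejected response less likely.
loss = calibrated_dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(loss)
```

Swapping the chosen and rejected log-probs increases the loss, since both the margin term and the calibration anchors then point the wrong way.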

Evaluation Benchmarks

Reference

If you find our repo useful, please cite our paper:

@inproceedings{Cal-DPO2024,
  title={Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment},
  author={Xiao, Teng and Yuan, Yige and Zhu, Huaisheng and Li, Mingxiao and Honavar, Vasant G},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
