Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment (NeurIPS 2024)

Environment

First, create a Python virtual environment using e.g. Conda:

conda create -n Cal-DPO python=3.10 && conda activate Cal-DPO

Next, install PyTorch 2.1.2 inside this environment by following the PyTorch Installation Page.

Then install Flash Attention:

python -m pip install flash-attn --no-build-isolation
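
After the installation steps above, a quick sanity check can confirm that the expected packages are visible inside the active environment. The snippet below is a minimal sketch, not part of this repo: the helper name `installed_versions` is hypothetical, and it assumes the distributions are registered under the names `torch` and `flash-attn`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration; it is not shipped with this repo.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if your install reports them differently.
    for name, version in installed_versions(["torch", "flash-attn"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `torch` reports a version other than 2.1.2, or either package shows as not installed, the environment likely differs from the one described above.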

Training Scripts

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py zephyr-selm/dpo_config_full.yaml
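
When scripting several runs, it can help to assemble the launch command above programmatically. The following is a rough sketch under stated assumptions: the helper names `build_launch_command` and `build_launch_env` are hypothetical, and the default paths are simply the ones from the command above. It only builds and prints the command and does not start training.

```python
import os
import shlex


def build_launch_command(
    accelerate_config="accelerate_configs/deepspeed_zero3.yaml",
    script="scripts/run_dpo.py",
    recipe="zephyr-selm/dpo_config_full.yaml",
):
    """Return the argv list for the training command (hypothetical helper)."""
    return ["accelerate", "launch", "--config_file", accelerate_config, script, recipe]


def build_launch_env(log_level="info"):
    """Copy the current environment and set ACCELERATE_LOG_LEVEL."""
    env = dict(os.environ)
    env["ACCELERATE_LOG_LEVEL"] = log_level
    return env


if __name__ == "__main__":
    # Print the command only; pass it to subprocess.run(...) to actually launch.
    print(shlex.join(build_launch_command()))
```

To launch from Python, the two results can be passed to `subprocess.run(build_launch_command(), env=build_launch_env(), check=True)` from the repository root.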

Evaluation Benchmarks

Open LLM Leaderboard
New Open LLM Leaderboard

Reference

If you find this repo useful, please cite our paper:

@inproceedings{Cal-DPO2024,
  title={Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment},
  author={Xiao, Teng and Yuan, Yige and Zhu, Huaisheng and Li, Mingxiao and Honavar, Vasant G},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
