
# ORLM: Training Large Language Models for Optimization Modeling

*(Figure: OR-Instruct)*

This project explores training open-source LLMs for optimization modeling. We identify four critical requirements for the training dataset of OR LLMs and, to meet them, design and implement OR-Instruct, a semi-automated process for creating synthetic data tailored to specific requirements. We also introduce IndustryOR, the first industrial benchmark for testing LLMs on solving real-world OR problems. Applying the data from OR-Instruct to various open-source LLMs at the 7B scale (termed ORLMs) yields a significantly improved capability for optimization modeling. Read our paper here.

## News

| Model | Checkpoint | License | NL4OPT | MAMO EasyLP | MAMO ComplexLP | IndustryOR | Micro Avg | Macro Avg |
|---|---|---|---|---|---|---|---|---|
| ORLM-LLaMA-3-8B | 🤗 HF Link | llama3 | 85.7% | 82.3% | 37.4% | 38.0% | 71.4% | 60.8% |

## Performances

Below is a comparison of performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Values marked with a * are copied directly from the original papers; blanks indicate data that were not reported. The highest result in each column is highlighted in bold.

| Method | NL4OPT | MAMO EasyLP | MAMO ComplexLP | IndustryOR | Micro Avg | Macro Avg |
|---|---|---|---|---|---|---|
| *Methods based on PLMs* | | | | | | |
| tag-BART | 47.9%* | - | - | - | - | - |
| *Methods based on GPT-3.5* | | | | | | |
| Standard | 42.4%* | - | - | - | - | - |
| Reflexion | 50.7%* | - | - | - | - | - |
| Chain-of-Experts | 58.9%* | - | - | - | - | - |
| *Methods based on GPT-4* | | | | | | |
| Standard | 47.3%* | 66.5%* | 14.6%* | 28.0% | 50.2% | 39.1% |
| Reflexion | 53.0%* | - | - | - | - | - |
| Chain-of-Experts | 64.2%* | - | - | - | - | - |
| OptiMUS | 78.8%* | - | - | - | - | - |
| *ORLMs based on open-source LLMs* | | | | | | |
| ORLM-Mistral-7B | 84.4% | 81.4% | 32.0% | 27.0% | 68.8% | 56.2% |
| ORLM-Deepseek-Math-7B-Base | **86.5%** | 82.2% | **37.9%** | 33.0% | 71.2% | 59.9% |
| ORLM-LLaMA-3-8B | 85.7% | **82.3%** | 37.4% | **38.0%** | **71.4%** | **60.8%** |
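As a quick sanity check on the reported averages: the macro average is the unweighted mean of the four benchmark scores, while the micro average weights each benchmark by its number of examples (the per-benchmark counts are not shown here). A minimal sketch for ORLM-LLaMA-3-8B, using the scores from the table:

```python
# Macro average = unweighted mean over the four benchmarks.
# Scores below are ORLM-LLaMA-3-8B's results from the table above.
scores = {
    "NL4OPT": 85.7,
    "MAMO EasyLP": 82.3,
    "MAMO ComplexLP": 37.4,
    "IndustryOR": 38.0,
}
macro_avg = sum(scores.values()) / len(scores)
print(f"Macro Avg: {macro_avg:.2f}%")  # 60.85%, i.e. the 60.8% reported above
```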

## Setup

To get started, clone ORLM and install the required packages:

```bash
git clone https://github.com/Cardinal-Operations/ORLM.git
cd ORLM
pip install -r requirements.txt
```

## Inference

Prompting Template:

```
Below is an operations research question. Build a mathematical model and corresponding python code using `coptpy` that appropriately addresses the question.

# Question:
{Question}

# Response:
```

Replace `{Question}` with any natural-language OR question.
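Filling the template programmatically is straightforward; below is a minimal sketch (the helper name and example question are ours, for illustration only):

```python
# Build an ORLM prompt from the template shown above.
PROMPT_TEMPLATE = (
    "Below is an operations research question. Build a mathematical model and "
    "corresponding python code using `coptpy` that appropriately addresses the "
    "question.\n"
    "\n"
    "# Question:\n"
    "{question}\n"
    "\n"
    "# Response:\n"
)

def build_prompt(question: str) -> str:
    """Substitute a natural-language OR question into the template."""
    return PROMPT_TEMPLATE.format(question=question.strip())

# Hypothetical example question, for illustration only.
prompt = build_prompt("A bakery sells bread and cakes. Maximize daily profit ...")
print(prompt)
```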

To run a sample inference, use this command:

```bash
cd ORLM
python scripts/inference.py --model_name_or_path <path_to_local_orlm_directory> --tensor_parallel_size <num_gpus>
```

## Evaluation

First, we prompt the ORLMs to generate a complete solution containing both a mathematical model and a program (see eval/generate.py). We then extract the program and run it in parallel to obtain the predicted optimal value (see eval/execute.py; currently only the COPT solver is supported). Accuracy is evaluated by comparing the execution results against the ground-truth optimal values.
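The actual evaluation logic lives in eval/generate.py and eval/execute.py; the sketch below only illustrates the idea, under the assumption that completions wrap the program in a fenced `python` code block (the helper names are ours, not the repo's):

```python
import os
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # triple backtick, built programmatically to keep this block fence-safe
PATTERN = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_program(completion: str):
    """Pull the first fenced python block out of a model completion, or None."""
    match = PATTERN.search(completion)
    return match.group(1) if match else None

def run_program(code: str, timeout: float = 30.0) -> str:
    """Execute the extracted program in a fresh interpreter and return stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return result.stdout.strip()
    finally:
        os.remove(path)

# Toy completion standing in for a real model output.
completion = f"We maximize ...\n{FENCE}python\nprint(42.0)\n{FENCE}\n"
predicted = run_program(extract_program(completion))
print(predicted)  # -> 42.0
```

The predicted value would then be compared against the benchmark's ground-truth optimum, typically with a numeric tolerance.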

Note that results may vary slightly with computing resources, since executions run in parallel. Additionally, for hard examples such as those in IndustryOR, where the number of variables can grow significantly, consider applying for a COPT web license; otherwise execution may fail outright.

Here's how to evaluate the ORLM models on various benchmarks:

```bash
# (Optional) If you have trouble accessing the Hugging Face website, you can set an alternative endpoint:
# export HF_ENDPOINT=https://hf-mirror.com
cd ORLM
sh scripts/eval.all.sh <path_to_local_orlm_directory> <number_of_gpus>
```

We also provide detailed completions and execution results for the ORLM-LLaMA-3-8B model on the above benchmarks in the `results` directory.

## Citation

Please cite our paper if you use our model, code, or data.

```bibtex
@article{tang2024orlm,
  title={ORLM: Training Large Language Models for Optimization Modeling},
  author={Tang, Zhengyang and Huang, Chenyu and Zheng, Xin and Hu, Shixi and Wang, Zizhuo and Ge, Dongdong and Wang, Benyou},
  journal={arXiv preprint arXiv:2405.17743},
  year={2024}
}
```