diff --git a/README.md b/README.md
index 8192af7..2906e0e 100644
--- a/README.md
+++ b/README.md
@@ -37,8 +37,28 @@ export GEMINI_API_KEY=
 
 You can find all runnable experiments in the `scripts` directory. Their filename should explicitly tell you their purpose.
-For example, `scripts/run_rm_evals.sh` runs the RewardBench inference pipeline on a select number of models given a dataset:
+
+### Getting rewards from a Reward Model (RM) on a HuggingFace dataset
+
+Here, we use the `rewardbench` command-line interface and pass a HuggingFace dataset.
+For example, if we want to get the reward score of the UltraRM-13b reward model on a preference dataset, we run:
 
 ```sh
-./scripts/run_rm_evals.sh
+rewardbench \
+    --model openbmb/UltraRM-13b \
+    --chat_template openbmb \
+    --dataset $DATASET \
+    --split $SPLIT \
+    --output_dir $OUTDIR \
+    --batch_size 8 \
+    --trust_remote_code \
+    --force_truncation \
+    --save_all
 ```
 
+
+The evaluation parameters can be found in the [allenai/reward-bench](https://github.com/allenai/reward-bench/blob/main/scripts/configs/eval_configs.yaml) repository.
+This runs the reward model on the (prompt, chosen, rejected) triples and gives us the reward score for each instance.
+The results are saved into a JSON file inside the `$OUTDIR` directory.
+Finally, you can find more examples in the `scripts/run_rm_evals.sh` script.
+
+###
\ No newline at end of file
diff --git a/scripts/run_generative.py b/scripts/run_generative.py
index 73ea5b8..561045c 100644
--- a/scripts/run_generative.py
+++ b/scripts/run_generative.py
@@ -21,6 +21,7 @@
 # Examples:
 # python scripts/run_generative.py --dataset_name --model gpt-3.5-turbo
 # python scripts/run_generative.py --dataset_name --model=claude-3-haiku-20240307
+# python scripts/run_generative.py --dataset_name --model=CohereForAI/c4ai-command-r-v01 --num_gpus 2 --force_local
 # note: for none API models, this script uses vllm
 # pip install vllm
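
Side note (not part of the diff above): a minimal sketch of how the results written to `$OUTDIR` by the `rewardbench ... --save_all` run shown in the README hunk might be inspected afterwards. The diff does not specify the output filename or record schema, so this only globs for JSON files under the output directory and prints their top-level structure; treat it as an illustration, not as the tool's documented output format.

```python
# Sketch: peek at whatever JSON result files a rewardbench run left under $OUTDIR.
# Assumption (not stated in the diff): the run writes one or more JSON files there;
# their names and record fields are unknown, so we only report each file's shape.
import json
import os
import sys
from pathlib import Path

outdir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(os.environ.get("OUTDIR", "."))

for path in sorted(outdir.rglob("*.json")):
    with path.open() as f:
        data = json.load(f)
    if isinstance(data, list):
        # A list of dicts would suggest one record per (prompt, chosen, rejected) instance.
        print(f"{path}: list of {len(data)} records")
        if data and isinstance(data[0], dict):
            print("  example keys:", sorted(data[0].keys()))
    elif isinstance(data, dict):
        print(f"{path}: dict with keys {sorted(data.keys())}")
    else:
        print(f"{path}: {type(data).__name__}")
```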