This repository contains code accompanying the paper "A Systematic Evaluation of Decoding-Free Generative Candidate Selection Methods".
For the MCQ datasets, run mcq_decoding.py to execute the full decoding method,
specifying the desired LM and dataset. For example,
python mcq_decoding.py --model_name meta-llama/Meta-Llama-3-8B --dataset commonsense_qa
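For context, full decoding lets the LM generate an answer string and then maps that string back to one of the options. The sketch below illustrates this with a Hugging Face causal LM; the prompt format and answer-matching rule are illustrative assumptions, not necessarily what mcq_decoding.py implements.

```python
# Minimal sketch of full-decoding candidate selection with a Hugging Face
# causal LM. The prompt format and answer-matching rule are illustrative
# assumptions; mcq_decoding.py may implement these details differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def select_by_decoding(question: str, options: list[str]) -> int:
    # Present the options as lettered choices and let the LM decode freely.
    prompt = question + "\n" + "\n".join(
        f"{chr(65 + i)}. {option}" for i, option in enumerate(options)
    ) + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    answer = tokenizer.decode(
        output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # Map the generated string back to an option; fall back to the first one.
    for i, option in enumerate(options):
        if chr(65 + i) in answer[:4] or option.lower() in answer.lower():
            return i
    return 0
```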
To execute the estimation methods, run mcq_estimation.py
and specify the LM and dataset. For example,
python mcq_estimation.py --model_name meta-llama/Meta-Llama-3-8B --dataset commonsense_qa
The scripts download and preprocess the data, perform inference, and compute the corresponding metrics, which are stored in results/<date>/ (a date-stamped subdirectory). In particular, mcq_estimation.py
computes the logits once and reuses them across all estimation methods.
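The decoding-free estimation methods score every candidate from logits obtained without any generation steps. The sketch below shows one illustrative variant (scoring each option by the mean log-probability of its tokens under the single next-token distribution at the end of the prompt); the actual scorers in mcq_estimation.py differ, but they all reuse the same single logits computation.

```python
# Minimal sketch of a decoding-free estimation method: a single forward
# pass over the prompt, whose logits are reused to score every candidate.
# The scoring rule here (mean log-probability of each option's tokens
# under the next-token distribution) is one illustrative variant only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

@torch.no_grad()
def select_by_estimation(question: str, options: list[str]) -> int:
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]        # computed once, no decoding
    log_probs = torch.log_softmax(logits.float(), dim=-1)
    scores = []
    for option in options:
        ids = tokenizer(option, add_special_tokens=False)["input_ids"]
        # Average the log-probabilities of the option's tokens under the
        # single next-token distribution.
        scores.append(log_probs[ids].mean().item())
    return max(range(len(options)), key=scores.__getitem__)
```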
The variable names for these arguments are model_name and dataset, with their supported values as follows.
model_name: {meta-llama/Meta-Llama-3-8B,
meta-llama/Meta-Llama-3-8B-Instruct,
mistralai/Mistral-7B-v0.3,
mistralai/Mistral-7B-Instruct-v0.3,
google/flan-t5-xl}
dataset: {commonsense_qa, mmlu, gpqa, big_bench, arc}
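To reproduce the full grid of runs, the two arguments can be swept directly. The following is a minimal sketch using only the argument names and values listed above.

```python
# Sketch of sweeping every model/dataset pair with mcq_estimation.py,
# using only the argument names and values listed above.
import subprocess

models = [
    "meta-llama/Meta-Llama-3-8B",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "mistralai/Mistral-7B-v0.3",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "google/flan-t5-xl",
]
datasets = ["commonsense_qa", "mmlu", "gpqa", "big_bench", "arc"]

for model in models:
    for dataset in datasets:
        subprocess.run(
            ["python", "mcq_estimation.py",
             "--model_name", model, "--dataset", dataset],
            check=True,
        )
```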
For the clinical experiments, download the test data for the four clinical decision tasks from CliBench.
To execute the estimation methods, run clibench_estimation.py
and specify the LM and target task. For example,
python clibench_estimation.py --model_name meta-llama/Meta-Llama-3-8B --target-task target_diagnoses
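To cover all four clinical decision tasks, the same command can be looped over the --target-task values. Only target_diagnoses is confirmed by the example above; the other three task names in the sketch below are hypothetical placeholders, so consult CliBench and the script for the actual values.

```python
# Sketch of running clibench_estimation.py on all four clinical decision
# tasks. Only target_diagnoses is confirmed above; the other three names
# are hypothetical placeholders -- consult CliBench for the actual values.
import subprocess

target_tasks = [
    "target_diagnoses",      # from the example above
    "target_procedures",     # hypothetical
    "target_laborders",      # hypothetical
    "target_prescriptions",  # hypothetical
]

for task in target_tasks:
    subprocess.run(
        ["python", "clibench_estimation.py",
         "--model_name", "meta-llama/Meta-Llama-3-8B",
         "--target-task", task],
        check=True,
    )
```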