Data repository for our paper "'My Answer is C': First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models".
This repo contains the annotated data we used to train our evaluator, in labeled_model_output, and the model outputs with their mapping results, in outputs.
We have also released our trained classifiers on Hugging Face. Please try them out.
If you find this repository useful or our work is related to your research, please cite our paper:
@inproceedings{wang-etal-2024-answer-c,
title = "{``}My Answer is {C}{''}: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models",
author = {Wang, Xinpeng and
Ma, Bolei and
Hu, Chengzhi and
Weber-Genzel, Leon and
R{\"o}ttger, Paul and
Kreuter, Frauke and
Hovy, Dirk and
Plank, Barbara},
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.441",
doi = "10.18653/v1/2024.findings-acl.441",
pages = "7407--7416",
}