This repository contains the evaluation data, instructions, and demonstrations for the ACL 2024 paper *TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models* (Chu et al., 2023).

TimeBench comprises the following datasets:
- TimeX-NLI (Thukral et al., 2021)
- Date Arithmetic (Tan et al., 2023)
- MCTACO (Zhou et al., 2019)
- TimeDial (Qin et al., 2021)
- DurationQA (Giovanni et al., 2022)
- SituatedGen (Zhang et al., 2023)
- TimeQA (Chen et al., 2021)
- TempReason (Tan et al., 2023)
- MenatQA (Wei et al., 2023)
- TRACIE (Zhou et al., 2021)

The following models are evaluated:

- GPT-4 (OpenAI, 2023)
- GPT-3.5 (OpenAI, 2022)
- LLaMA2 (Touvron et al., 2023)
- Baichuan2 (Yang et al., 2023)
- Vicuna-1.5 (Chiang et al., 2023)
- Mistral (Jiang et al., 2023)
- ChatGLM3 (Zeng et al., 2023)
- FLAN-T5 (Chung et al., 2022)
If you find our work helpful, please cite our paper:

```bibtex
@misc{chu2023timebench,
      title={TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models},
      author={Zheng Chu and Jingchang Chen and Qianglong Chen and Weijiang Yu and Haotian Wang and Ming Liu and Bing Qin},
      year={2023},
      eprint={2311.17667},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.17667}
}
```