This repository contains the evaluation data, instructions, and demonstrations for the ACL 2024 paper *TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models* (Chu et al., 2023).

TimeBench comprises the following datasets:
- TimeX-NLI (Thukral et al., 2021)
- Date Arithmetic (Tan et al., 2023)
- MCTACO (Zhou et al., 2019)
- TimeDial (Qin et al., 2021)
- DurationQA (Giovanni et al., 2022)
- SituatedGen (Zhang et al., 2023)
- TimeQA (Chen et al., 2021)
- TempReason (Tan et al., 2023)
- MenatQA (Wei et al., 2023)
- TRACIE (Zhou et al., 2021)

The following models are evaluated:

- GPT-4 (OpenAI, 2023)
- GPT-3.5 (OpenAI, 2022)
- LLaMA2 (Touvron et al., 2023)
- Baichuan2 (Yang et al., 2023)
- Vicuna-1.5 (Chiang et al., 2023)
- Mistral (Jiang et al., 2023)
- ChatGLM3 (Zeng et al., 2023)
- FLAN-T5 (Chung et al., 2022)
If you find our work helpful, please cite our paper:

```bibtex
@misc{chu2023timebench,
      title={TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models},
      author={Zheng Chu and Jingchang Chen and Qianglong Chen and Weijiang Yu and Haotian Wang and Ming Liu and Bing Qin},
      year={2023},
      eprint={2311.17667},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.17667}
}
```