v0.3.0
Release Note
This release introduces the extended tasks feature, documentation updates, and many patches for improved stability.
It also adds the following new tasks:
- Big Bench Hard: https://huggingface.co/papers/2210.09261
- AGIEval: https://huggingface.co/papers/2304.06364
- TinyBench:
- MT-Bench: https://huggingface.co/papers/2306.05685
- AlGhafa Benchmarking Suite: https://aclanthology.org/2023.arabicnlp-1.21/
MT-Bench marks the introduction of multi-turn prompting as well as the LLM-as-a-judge metric.
New tasks
- Add BBH by @clefourrier in #7, @bilgehanertan in #126
- Add AGIEval by @clefourrier in #121
- Adding TinyBench by @clefourrier in #104
- Adding support for Arabic benchmarks : AlGhafa benchmarking suite by @alielfilali01 in #95
- Add mt-bench by @NathanHB in #75
Features
- Extended Tasks ! by @clefourrier in #101, @lewtun in #108, @NathanHB in #122, #123
- Added support for launching inference endpoint with different model dtypes by @shaltielshmid in #124
Documentation
- Adding LICENSE by @clefourrier in #86, @NathanHB in #89
- Make it clearer in the README that the leaderboard uses the harness by @clefourrier in #94
Small patches
- Update huggingface-hub for compatibility with datasets 2.18 by @clefourrier in #84
- Tidy up dependency groups by @lewtun in #81
- bump git python by @NathanHB in #90
- Sets a max length for the MATH task by @clefourrier in #83
- Fix parallel data processing bug by @clefourrier in #92
- Change the eos condition for GSM8K by @clefourrier in #85
- Fixing rolling loglikelihood management by @clefourrier in #78
- Fixes input length management for generative evals by @clefourrier in #103
- Reorder addition of instruction in chat template by @clefourrier in #111
- Ensure chat models terminate generation with EOS token by @lewtun in #115
- Fix push details to hub by @NathanHB in #98
- Small fixes to InferenceEndpointModel by @shaltielshmid in #112
- Fix import typo autogptq by @clefourrier in #116
- Fixed the loglikelihood method in inference endpoints models by @clefourrier in #119
- Fix TextGenerationResponse import from hfh by @Wauplin in #129
- Do not use deprecated list_files_info by @Wauplin in #133
- Update test workflow name to 'Tests' by @Wauplin in #134
New Contributors
- @shaltielshmid made their first contribution in #112
- @bilgehanertan made their first contribution in #126
- @Wauplin made their first contribution in #129
Full Changelog: v0.2.0...v0.3.0