Release Note

This release focuses on customization and personalization: you can now define custom metrics, not just custom tasks. See the README for the full mechanism.
It also includes new tasks and small fixes to improve stability. We chose to split community tasks from the main library source to make maintenance easier.

Better community task handling

New mechanism for evaluation contributions by @clefourrier in #47
Adding the custom metrics system by @clefourrier in #65

New tasks

Add GPQA by @clefourrier in #42
Adding support for Arabic benchmarks: AceGPT benchmarking suite by @alielfilali01 in #44
IFEval by @clefourrier in #48

Features

Add an automatic system to compute average for tasks with subtasks by @clefourrier in #41
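
As an illustration of what averaging over subtasks means, here is a minimal sketch. It is not the library's implementation: the function name `average_subtasks`, the `task:subtask` naming scheme, and the `_average` result key are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def average_subtasks(results: dict[str, float], sep: str = ":") -> dict[str, float]:
    """Average the scores of subtasks that share a parent task name.

    Hypothetical helper: assumes subtasks are named "<task><sep><subtask>".
    """
    grouped = defaultdict(list)
    for name, score in results.items():
        parent, found, _ = name.partition(sep)
        if found:  # names without the separator are not subtasks
            grouped[parent].append(score)
    # One extra entry per parent task, holding the mean of its subtask scores
    return {f"{parent}{sep}_average": mean(scores) for parent, scores in grouped.items()}


scores = {"mmlu:anatomy": 0.5, "mmlu:astronomy": 0.75, "gpqa": 0.3}
averages = average_subtasks(scores)
```

In this sketch, `gpqa` has no subtasks and therefore gets no average entry, while the two `mmlu` subtasks are reduced to a single mean score.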

Small patches

Typos #27, #28, #29, #30, #34
Better README #26, #37, #55
Patch fix to match with config update/simplification in nanotron by @thomwolf in #35
Bump transformers to 4.38 by @NathanHB in #46
Small fix to be able to use extensions of nanotron configs by @thomwolf in #58
Remove the eos token override in the Default Config Task by @clefourrier in #54
Update leaderboard task set by @lewtun in #60
Fixes wikitext prompts + some patches on tg models by @clefourrier in #64
Fix unset generation size by @clefourrier in #76
Update ruff by @clefourrier in #71
Relax sentencepiece version by @lewtun in #74
Better chat template system by @clefourrier in #38

✨ Community Contributions

@ledrui made their first contribution in #26
@alielfilali01 made their first contribution in #44
@lewtun made their first contribution in #55

Full Changelog: v0.1.1...v0.2.0
