AraDICE task config file #2507

Merged: 21 commits, merged on Dec 24, 2024
lm_eval/tasks/README.md (1 addition, 0 deletions)
@@ -14,6 +14,7 @@
| [arabic_leaderboard_complete](arabic_leaderboard_complete/README.md) | A full version of the tasks in the Open Arabic LLM Leaderboard, focusing on the evaluation of models that reflect the characteristics of Arabic language understanding and comprehension, culture, and heritage. Note that some of these tasks are machine-translated. | Arabic (Some MT) |
| [arabic_leaderboard_light](arabic_leaderboard_light/README.md) | A light version of the tasks in the Open Arabic LLM Leaderboard (i.e., 10% samples of the test set in the original benchmarks), focusing on the evaluation of models that reflect the characteristics of Arabic language understanding and comprehension, culture, and heritage. Note that some of these tasks are machine-translated. | Arabic (Some MT) |
| [arabicmmlu](arabicmmlu/README.md) | Localized Arabic version of MMLU with multiple-choice questions from 40 subjects. | Arabic |
| [AraDICE](aradice/README.md) | A collection of multiple tasks carefully designed to evaluate dialectal and cultural capabilities in large language models (LLMs). | Arabic |
| [arc](arc/README.md) | Tasks involving complex reasoning over a diverse set of questions. | English |
| [arithmetic](arithmetic/README.md) | Tasks involving numerical computations and arithmetic reasoning. | English |
| [asdiv](asdiv/README.md) | Tasks involving arithmetic and mathematical reasoning challenges. | English |
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU.yaml (12 additions, 0 deletions)
@@ -0,0 +1,12 @@
group: AraDiCE_ArabicMMLU_egy
task:
- AraDiCE_ArabicMMLU_humanities_egy
- AraDiCE_ArabicMMLU_language_egy
- AraDiCE_ArabicMMLU_social-science_egy
- AraDiCE_ArabicMMLU_stem_egy
- AraDiCE_ArabicMMLU_other_egy
aggregate_metric_list:
- metric: acc
weight_by_size: True
- metric: acc_norm
weight_by_size: True
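The group config above aggregates `acc` and `acc_norm` with `weight_by_size: True`. The following is a minimal sketch of what that setting means. It is not lm-eval's own code, and the function name `aggregate_subtask_scores` and the numbers are hypothetical. A size-weighted group score counts each subtask in proportion to its number of test documents (a micro-average). An unweighted score averages the subtasks equally (a macro-average).

```python
from typing import Sequence


def aggregate_subtask_scores(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool = True,
) -> float:
    """Combine per-subtask scores into a single group score (hypothetical helper).

    With weight_by_size=True each subtask is weighted by its number of
    test documents; with weight_by_size=False every subtask has weight 1.
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    weights = list(sizes) if weight_by_size else [1] * len(scores)
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Weighted mean: sum(score_i * weight_i) / sum(weight_i)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Illustrative inputs only, not AraDiCE results:
# subtask A scores 0.5 on 100 docs, subtask B scores 0.8 on 300 docs.
print(aggregate_subtask_scores([0.5, 0.8], [100, 300]))
print(aggregate_subtask_scores([0.5, 0.8], [100, 300], weight_by_size=False))
```

With these inputs the weighted score is 0.725 and the unweighted score is 0.65. The larger subtask pulls the weighted average toward its own accuracy.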
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_history_egy"
"task_alias": "high humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_islamic-studies_egy"
"task_alias": "high humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_philosophy"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_philosophy_egy"
"task_alias": "high humanities philosophy"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_high_language_arabic-language_egy"
"task_alias": "high language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_civics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_civics_egy"
"task_alias": "high social-science civics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_economics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_economics_egy"
"task_alias": "high social-science economics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_geography_egy"
"task_alias": "high social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_biology"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_biology_egy"
"task_alias": "high stem biology"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_computer-science_egy"
"task_alias": "high stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_physics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_physics_egy"
"task_alias": "high stem physics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_history_egy"
"task_alias": "middle humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_islamic-studies_egy"
"task_alias": "middle humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_middle_language_arabic-language_egy"
"task_alias": "middle language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_middle_other_general-knowledge_egy"
"task_alias": "middle other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_civics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_civics_egy"
"task_alias": "middle social-science civics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_economics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_economics_egy"
"task_alias": "middle social-science economics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_geography_egy"
"task_alias": "middle social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_social-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_social-science_egy"
"task_alias": "middle social-science social-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_middle_stem_computer-science_egy"
"task_alias": "middle stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_stem_natural-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_middle_stem_natural-science_egy"
"task_alias": "middle stem natural-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_na_humanities_islamic-studies_egy"
"task_alias": "na humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_language_arabic-language-general"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_na_language_arabic-language-general_egy"
"task_alias": "na language arabic-language-general"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_language_arabic-language-grammar"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_na_language_arabic-language-grammar_egy"
"task_alias": "na language arabic-language-grammar"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_other_driving-test"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_na_other_driving-test_egy"
"task_alias": "na other driving-test"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_na_other_general-knowledge_egy"
"task_alias": "na other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_primary_humanities_history_egy"
"task_alias": "primary humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_primary_humanities_islamic-studies_egy"
"task_alias": "primary humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_primary_language_arabic-language_egy"
"task_alias": "primary language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_primary_other_general-knowledge_egy"
"task_alias": "primary other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_primary_social-science_geography_egy"
"task_alias": "primary social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_social-science_social-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_primary_social-science_social-science_egy"
"task_alias": "primary social-science social-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_computer-science_egy"
"task_alias": "primary stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_math"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_math_egy"
"task_alias": "primary stem math"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_natural-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_natural-science_egy"
"task_alias": "primary stem natural-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
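All of the per-subject configs in this diff follow the same pattern:

- `dataset_name` has the form `<level>_<category>_<subject>`.
- The category selects the `tag`.
- `task` wraps the dataset name as `AraDiCE_ArabicMMLU_<dataset_name>_egy`.
- `task_alias` replaces the two separating underscores with spaces.

The sketch below reproduces that pattern as a plain Python dict, with `None` standing in for the `!!null` values. The helper `make_subject_config` and its `dialect` parameter are hypothetical and are not part of this PR. Only the `egy` suffix appears in the files shown here.

```python
from typing import Dict, Optional


def make_subject_config(dataset_name: str, dialect: str = "egy") -> Dict[str, Optional[str]]:
    """Build one per-subject config dict mirroring the YAML files in this diff.

    Hypothetical helper: dataset_name must look like "<level>_<category>_<subject>",
    e.g. "high_stem_physics"; the subject part may itself contain hyphens.
    """
    # Split on the first two underscores only, so "social-science" etc. stay intact.
    level, category, subject = dataset_name.split("_", 2)
    return {
        "dataset_name": dataset_name,
        "description": "",
        "fewshot_split": None,
        "include": "_default_template_yaml",
        "tag": f"AraDiCE_ArabicMMLU_{category}_{dialect}",
        "task": f"AraDiCE_ArabicMMLU_{dataset_name}_{dialect}",
        "task_alias": f"{level} {category} {subject}",
        "test_split": "test",
        "training_split": None,
        "validation_split": None,
    }


if __name__ == "__main__":
    for name in ["high_stem_physics", "na_language_arabic-language-grammar"]:
        cfg = make_subject_config(name)
        print(cfg["task"], "->", cfg["tag"])
```

Generating the configs this way keeps the `tag` values consistent with the subcategory names listed in the group file (`humanities`, `language`, `social-science`, `stem`, `other`), so every subject lands under exactly one of them.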