[BFCL] Add Test Dataset to Repository (#504)

Previously, the test dataset was stored on HuggingFace, so users had to clone this repository, download the dataset separately from HuggingFace, and only then run the evaluation pipeline. This caused considerable inconvenience and confusion in the community. In particular, users were often puzzled that the possible answers lived in the repository while the test dataset did not. Partially addresses #501.
ShishirPatil · Jul 11, 2024 · 7bef000
1 parent 791f6f8

Showing 16 changed files with 2,001 additions and 28 deletions.
diff --git a/.gitignore b/.gitignore
@@ -28,24 +28,8 @@ berkeley-function-call-leaderboard/eval_checker/tree-sitter-javascript
 berkeley-function-call-leaderboard/tree-sitter-java
 berkeley-function-call-leaderboard/tree-sitter-javascript
 
-# Ignore Evaluation Dataset files that we download from https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard/tree/main
+# Ignore aggregated eval data (used for OSS models)
 berkeley-function-call-leaderboard/eval_data_total.json
-berkeley-function-call-leaderboard/data/.gitattributes
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_chatable.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_simple.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_relevance.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_rest.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_javascript.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_simple.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_sql.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_java.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_function.json
-berkeley-function-call-leaderboard/data/README.md
 
 # Ignore inference results
 berkeley-function-call-leaderboard/result/
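The `eval_data_total.json` file ignored above holds aggregated eval data used for OSS models; this commit does not show how that file is produced. As a minimal sketch, assuming each test file stores one JSON object per line and that aggregation simply concatenates those records, the hypothetical helper `aggregate_eval_data` below builds such a file from the `data/` directory. It illustrates the idea only and is not the leaderboard's own implementation.

```python
import json
from pathlib import Path


def aggregate_eval_data(data_dir, output_path):
    """Concatenate per-category test entries into one JSON Lines file (hypothetical helper).

    Assumption: each gorilla_openfunctions_v1_test_*.json file holds
    one JSON object per line. Returns the number of entries written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # Sort so the aggregated file has a deterministic order across runs.
        for test_file in sorted(Path(data_dir).glob("gorilla_openfunctions_v1_test_*.json")):
            with open(test_file, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    entry = json.loads(line)  # fail loudly on a malformed record
                    out.write(json.dumps(entry) + "\n")
                    count += 1
    return count
```

Writing one record per line keeps the aggregate streamable, so a batch inference job can read it entry by entry without loading every category into memory.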

diff --git a/berkeley-function-call-leaderboard/README.md b/berkeley-function-call-leaderboard/README.md
@@ -22,17 +22,6 @@ pip install vllm # If you have vLLM supported GPU(s) and want to run our evaluat
 ```
 
 
-## Prepare Evaluation Dataset
-
-Download the evaluation dataset from huggingface. From the current directory `gorilla/berkeley-function-call-leaderboard`, run the following command:
-
-```bash
-huggingface-cli download gorilla-llm/Berkeley-Function-Calling-Leaderboard --local-dir data --repo-type dataset
-```
-
-The evaluation datasets are now stored in the `data` subdirectory. The possible answers are stored in the `data/possible_answer` subdirectory.
-
-
 ## Execution Evaluation Data Post-processing (Can be Skipped: Necesary for Executable Test Categories)
 Add your keys into `function_credential_config.json`, so that the original placeholder values in questions, params, and answers will be reset.
 
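The credential step described in the README could look roughly like the following sketch. It assumes `function_credential_config.json` is a list of single-entry objects, each mapping a placeholder string to a user-supplied key, and that placeholders appear verbatim in the question, parameter, and answer text. `load_credentials` and `apply_credentials` are hypothetical helpers, not the repository's actual post-processing script.

```python
import json


def load_credentials(config_path):
    """Flatten a list of {placeholder: key} objects into one dict (assumed config layout)."""
    with open(config_path, encoding="utf-8") as f:
        entries = json.load(f)
    mapping = {}
    for entry in entries:
        mapping.update(entry)
    return mapping


def apply_credentials(text, mapping):
    """Replace each placeholder with its configured key.

    Placeholders whose key is still empty are left untouched, so
    unfilled credentials stay visible rather than becoming blank strings.
    """
    for placeholder, key in mapping.items():
        if key:
            text = text.replace(placeholder, key)
    return text
```

Leaving empty keys in place is a deliberate choice in this sketch: a test that still contains its placeholder is easier to diagnose than one whose credential silently became an empty string.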

diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_chatable.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_chatable.json
diff --git a/...ion-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_multiple_function.json b/...ion-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_multiple_function.json
diff --git a/...ion-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_function.json b/...ion-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_function.json
diff --git a/...leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_multiple_function.json b/...leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_multiple_function.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_simple.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_simple.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_java.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_java.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_javascript.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_javascript.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_multiple_function.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_multiple_function.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_function.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_function.json
diff --git a/...ction-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_multiple_function.json b/...ction-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_multiple_function.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_relevance.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_relevance.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_rest.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_rest.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_simple.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_simple.json
diff --git a/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_sql.json b/berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_sql.json
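With the 14 test files now committed under `data/`, a quick sanity check after cloning is to confirm they are all present. The sketch below uses a hypothetical helper, not part of the repository. The filenames are copied from this diff, and the script assumes it is run from the `berkeley-function-call-leaderboard/` directory.

```python
from pathlib import Path

# Test-category files added under data/ by this commit (names copied from the diff).
EXPECTED_FILES = [
    "gorilla_openfunctions_v1_test_chatable.json",
    "gorilla_openfunctions_v1_test_executable_multiple_function.json",
    "gorilla_openfunctions_v1_test_executable_parallel_function.json",
    "gorilla_openfunctions_v1_test_executable_parallel_multiple_function.json",
    "gorilla_openfunctions_v1_test_executable_simple.json",
    "gorilla_openfunctions_v1_test_java.json",
    "gorilla_openfunctions_v1_test_javascript.json",
    "gorilla_openfunctions_v1_test_multiple_function.json",
    "gorilla_openfunctions_v1_test_parallel_function.json",
    "gorilla_openfunctions_v1_test_parallel_multiple_function.json",
    "gorilla_openfunctions_v1_test_relevance.json",
    "gorilla_openfunctions_v1_test_rest.json",
    "gorilla_openfunctions_v1_test_simple.json",
    "gorilla_openfunctions_v1_test_sql.json",
]


def missing_test_files(data_dir):
    """Return the expected test files not present in data_dir (hypothetical helper)."""
    data_path = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_test_files("data")
    if missing:
        print("Missing test files:", *missing, sep="\n  ")
    else:
        print(f"All {len(EXPECTED_FILES)} test files found.")
```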