[BFCL] Add Test Dataset to Repository (#504)
Previously, the test dataset was stored on HuggingFace, so users had to
clone this repository, download the dataset separately from HuggingFace,
and then run the evaluation pipeline. This caused inconvenience and
confusion within the community: users often struggled with the
inconsistency of the possible answers being in the repository while the
test dataset itself was missing.

Partially addresses #501.
HuanzhiMao authored Jul 11, 2024
1 parent 791f6f8 commit 7bef000
Showing 16 changed files with 2,001 additions and 28 deletions.
18 changes: 1 addition & 17 deletions .gitignore
@@ -28,24 +28,8 @@ berkeley-function-call-leaderboard/eval_checker/tree-sitter-javascript
 berkeley-function-call-leaderboard/tree-sitter-java
 berkeley-function-call-leaderboard/tree-sitter-javascript

-# Ignore Evaluation Dataset files that we download from https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard/tree/main
+# Ignore aggregated eval data (used for OSS models)
 berkeley-function-call-leaderboard/eval_data_total.json
-berkeley-function-call-leaderboard/data/.gitattributes
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_chatable.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_simple.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_relevance.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_parallel_multiple_function.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_rest.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_javascript.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_simple.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_sql.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_java.json
-berkeley-function-call-leaderboard/data/gorilla_openfunctions_v1_test_executable_parallel_function.json
-berkeley-function-call-leaderboard/data/README.md

 # Ignore inference results
 berkeley-function-call-leaderboard/result/
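
Since the test files are no longer ignored, a fresh checkout should have them under `berkeley-function-call-leaderboard/data`. The following is a minimal sketch for checking that, not part of the commit. It assumes the files keep the names listed in the `.gitignore` entries of this hunk, which this page does not confirm because the added files are not rendered. The helper name `missing_dataset_files` is hypothetical.

```python
from pathlib import Path

# Category suffixes copied from the .gitignore entries shown in this hunk.
# Assumption: the files added by the commit keep these names.
CATEGORIES = [
    "chatable",
    "executable_multiple_function",
    "executable_parallel_multiple_function",
    "executable_simple",
    "relevance",
    "parallel_function",
    "multiple_function",
    "parallel_multiple_function",
    "rest",
    "javascript",
    "simple",
    "sql",
    "java",
    "executable_parallel_function",
]


def missing_dataset_files(data_dir):
    """Return the expected test files that are absent from data_dir, sorted by name."""
    data_dir = Path(data_dir)
    expected = [f"gorilla_openfunctions_v1_test_{c}.json" for c in CATEGORIES]
    return sorted(name for name in expected if not (data_dir / name).is_file())


if __name__ == "__main__":
    # Path is relative to the repository root.
    missing = missing_dataset_files("berkeley-function-call-leaderboard/data")
    print("missing files:", missing if missing else "none")
```

Run from the repository root, this prints the expected files that are not present. An empty result suggests the checkout already contains the dataset and no separate download is needed.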
11 changes: 0 additions & 11 deletions berkeley-function-call-leaderboard/README.md
@@ -22,17 +22,6 @@ pip install vllm # If you have vLLM supported GPU(s) and want to run our evaluat
 ```


-## Prepare Evaluation Dataset
-
-Download the evaluation dataset from huggingface. From the current directory `gorilla/berkeley-function-call-leaderboard`, run the following command:
-
-```bash
-huggingface-cli download gorilla-llm/Berkeley-Function-Calling-Leaderboard --local-dir data --repo-type dataset
-```
-
-The evaluation datasets are now stored in the `data` subdirectory. The possible answers are stored in the `data/possible_answer` subdirectory.
-
-
 ## Execution Evaluation Data Post-processing (Can be Skipped: Necessary for Executable Test Categories)
 Add your keys into `function_credential_config.json`, so that the original placeholder values in questions, params, and answers will be reset.


Large diffs are not rendered by default.

