Make BFCL User-Friendly and Easy to Extend #510
- Move `model_handler` to `bfcl/model_handler`
- Separate `oss` and `proprietary` models
- Move java and javascript parsers to `bfcl/parser`
- Standardize model handlers and remove duplicate methods
- Test data compilation handled by `bfcl/types.py:LeaderboardCategories.load_data` method
Hey @devanshamin, thank you so much for flagging and suggesting improvements! We agree with everything you mentioned. If it helps, one design decision we have adopted is that, when in doubt, we defer to simplicity and ease of code readability, given this is an OSS project. Re: landing the PR: we are absolutely delighted you want to contribute to the Berkeley Function Calling Leaderboard (BFCL) project, and will be absolutely on board to review and land this PR! Welcome aboard, mate!
Awesome! I'm glad to hear you're on board. Keeping the theme of simplicity in mind, I was thinking of coming up with a detailed plan for the refactor and getting your feedback on it. @HuanzhiMao reached out to me, and we are planning to set up a Zoom call. During the call, I can go over the changes I have made and the plan, and hear your thoughts and feedback on how to move forward. After the meeting, I can write up a draft with next steps, which we can track here.
- poetry build system is no longer used
- To allow for separate dependencies for oss and proprietary models
- test category is already added to each example while the data is loaded
- Use the same test groups for benchmarking and evaluation
- Add a custom enum class with intuitive methods to dynamically create test groups
- Use the custom enum to reduce manual creation of test groups
- Update benchmark cli args to accept a test group argument
- Add a pydantic validator to validate test groups and test categories
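The enum-based test groups described in this commit could look roughly like the sketch below. It is only an illustration: the names `Category`, `Group`, and `resolve_categories`, and the category values themselves, are hypothetical and not taken from the PR, and a plain function stands in for the pydantic validator so the example needs only the standard library.

```python
from enum import Enum


class Category(str, Enum):
    # Hypothetical category names, for illustration only.
    SIMPLE = "simple"
    MULTIPLE = "multiple"
    JAVA = "java"
    JAVASCRIPT = "javascript"


class Group(Enum):
    # Each group's value is the tuple of categories it contains
    # (the grouping itself is an assumption, not the PR's definition).
    PYTHON = (Category.SIMPLE, Category.MULTIPLE)
    NON_PYTHON = (Category.JAVA, Category.JAVASCRIPT)
    ALL = tuple(Category)


def resolve_categories(group_names=(), category_names=()):
    """Validate names and return the selected categories without duplicates."""
    selected = []
    for name in group_names:
        try:
            group = Group[name.upper()]
        except KeyError:
            raise ValueError(f"Unknown test group: {name!r}") from None
        selected.extend(group.value)
    for name in category_names:
        try:
            selected.append(Category(name))
        except ValueError:
            raise ValueError(f"Unknown test category: {name!r}") from None
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(selected))
```

Because benchmarking and evaluation would both call the same resolver, a group such as `python` expands to the same categories in both steps.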
Here is an article outlining the steps for merging this PR: #521
- Load original json test data files
- Add `id` and `test_category` keys to each example
- Save model responses for each test category in a separate file
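As a rough sketch of the loading and saving steps in this commit, the snippet below tags each example with `id` and `test_category` and writes responses to one file per category. It assumes the data files are JSON Lines (one JSON object per line); the function names, the `id` format, and the output file naming are hypothetical.

```python
import json
from pathlib import Path


def load_examples(path, test_category):
    """Read a JSON Lines file and tag each example with id and test_category."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        non_empty = (line for line in f if line.strip())
        for index, line in enumerate(non_empty):
            example = json.loads(line)
            # Hypothetical id scheme: "<category>_<position in file>".
            example["id"] = f"{test_category}_{index}"
            example["test_category"] = test_category
            examples.append(example)
    return examples


def save_responses(responses, output_dir):
    """Write responses to one JSON Lines file per test category."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    by_category = {}
    for response in responses:
        by_category.setdefault(response["test_category"], []).append(response)
    written = []
    for category, items in by_category.items():
        out_path = output_dir / f"{category}_result.jsonl"  # assumed naming
        with out_path.open("w", encoding="utf-8") as f:
            for item in items:
                f.write(json.dumps(item) + "\n")
        written.append(out_path)
    return written
```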
- Single cli entrypoint with subcommands to run benchmark and evaluation
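A single entrypoint with subcommands can be sketched with the standard library's `argparse`. This is not the PR's actual CLI: the program name `bfcl` and the flags `--model` and `--test-group` are assumptions made for illustration, and the handlers just return a description of what would run.

```python
import argparse


def build_parser():
    # Program name and flags are hypothetical, not the PR's real interface.
    parser = argparse.ArgumentParser(prog="bfcl")
    subparsers = parser.add_subparsers(dest="command", required=True)

    benchmark = subparsers.add_parser("benchmark", help="generate model responses")
    benchmark.add_argument("--model", required=True)
    benchmark.add_argument("--test-group", default="all")

    evaluate = subparsers.add_parser("evaluate", help="score saved responses")
    evaluate.add_argument("--model", required=True)
    return parser


def main(argv=None):
    """Parse arguments and dispatch to the selected subcommand."""
    args = build_parser().parse_args(argv)
    if args.command == "benchmark":
        return f"benchmark {args.model} on {args.test_group}"
    return f"evaluate {args.model}"
```

Keeping both steps behind one parser means shared options only need to be defined and documented once.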
This PR aims to improve the organization and distribution of the codebase by packaging the BFCL codebase. This PR is part of a series of changes that break down the tasks outlined in #510. --------- Co-authored-by: Huanzhi Mao <huanzhimao@gmail.com>
This PR reorganizes the model handler by splitting it into two distinct components: an Open Source (OSS) model handler and a Proprietary model handler. This change is part of a series of updates that address the tasks outlined in issue #510. --------- Co-authored-by: Huanzhi Mao <huanzhimao@gmail.com>
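One way such a split can share behavior is through an abstract base class that both handler families implement. The sketch below is a guess at the general shape only: the class names, the `inference` method, and the placeholder return strings are hypothetical and do not reflect the PR's actual handlers.

```python
from abc import ABC, abstractmethod


class BaseHandler(ABC):
    """Shared interface; all names here are hypothetical."""

    def __init__(self, model_name):
        self.model_name = model_name

    @abstractmethod
    def inference(self, prompt):
        """Return the model's raw response for one prompt."""

    def run(self, prompts):
        # Common driver logic lives once in the base class.
        return [self.inference(prompt) for prompt in prompts]


class OSSHandler(BaseHandler):
    def inference(self, prompt):
        # Placeholder for running a locally hosted open-source model.
        return f"[local:{self.model_name}] {prompt}"


class ProprietaryHandler(BaseHandler):
    def inference(self, prompt):
        # Placeholder for calling a hosted vendor API.
        return f"[api:{self.model_name}] {prompt}"
```

With this layout, each family could also declare its own dependencies, since only the subclass module would import the relevant inference backend or API client.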
…_credential_config.py (#675) This PR addresses the issue of hard-coded relative file paths in BFCL, which previously made it impossible to run the script from different entry locations/directories. With this update, the script can now be executed from any directory, unblocking #621. Additionally, this PR automates the `apply_function_credential_config.py` step, removing the need for users to manually trigger the script to apply the credential files. Part of the effort to merge #510. --------- Co-authored-by: Devansh Amin <devanshamin97@gmail.com>
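The usual fix for hard-coded relative paths is to anchor them to a file's own location instead of the current working directory. The helper below is a minimal sketch of that idea; `resolve_data_path` and the example directory names are hypothetical and not taken from the PR.

```python
from pathlib import Path


def resolve_data_path(anchor_file, *parts):
    """Build an absolute path relative to the directory containing anchor_file.

    Typically called as resolve_data_path(__file__, "data", "some_file.json"),
    so the result does not depend on where the script was launched from.
    """
    return Path(anchor_file).resolve().parent.joinpath(*parts)
```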
Thank you for open-sourcing BFCL and for your efforts in maintaining it. As I explored the codebase, I noticed some areas for improvement, including duplicate functions, constants, and variables that are inaccessible outside their functions, as well as a lack of abstract classes/methods.
To address these issues, I've begun refactoring the codebase to make it more straightforward to follow, customize, install, and extend. Although this work is still in progress, I wanted to share it with the contributors to seek feedback and gauge interest in reviewing or contributing to the PR. I understand that it’s a significant refactor and may require considerable time and effort, so any assistance or feedback would be greatly appreciated.
To-Do:
- Refactor
  - `bfcl` package structure
  - `benchmark` and `model_handler`
  - `evaluate` and `eval_checker`
- Test
  - `benchmark` -> OSS and Proprietary model handlers
  - `evaluate`
Goal: To make BFCL similar to lm-evaluation-harness but for function calling.
Please let me know your thoughts and if you would be interested in reviewing or contributing to this PR.