[BFCL] Adding Llama-3.1-Storm-8B model handler #598
Conversation
@HuanzhiMao (Just a gentle reminder) Could you help review this?
Here are the available functions:
<tools>{tools}</tools>

Follow the below guidelines:
These guidelines are mentioned in the 'Prompt Format for Function Calling' section in the model documentation. They need to be aligned.
Llama-3.1-Storm-8B is not trained on negative data. Since the model has high instruction-following (IFEval) capability, we added two instructions to compensate for the lack of negative data in training. This is also consistent with our evaluation prompt in the blog post: https://huggingface.co/blog/akjindal53244/llama31-storm8b#bfcl
Thank you for the clarification. To ensure consistency and fairness across all models, it's important that the model handler uses the same prompt format as outlined in its official documentation.
In this case, while you have a specialized prompt optimized for BFCL categories, it would be more equitable to use the general-purpose prompt specified in the documentation for a uniform comparison with other models.
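For illustration, a minimal sketch of what aligning the handler with the documented prompt could look like (the constant name, helper function, and guideline placeholder are assumptions for this example, not the official Llama-3.1-Storm-8B prompt or the actual handler code):

import json

# Hypothetical sketch of a system prompt builder for the model handler.
# The guideline text should be copied verbatim from the 'Prompt Format for
# Function Calling' section of the model documentation rather than tuned
# per benchmark category.
SYSTEM_PROMPT_TEMPLATE = (
    "You are a function calling AI model.\n"
    "Here are the available functions:\n"
    "<tools>{tools}</tools>\n"
    "\n"
    "Follow the below guidelines:\n"
    "<documented guidelines go here, copied from the model card>\n"
)

def build_system_prompt(functions: list) -> str:
    """Render the documented prompt with this entry's tool definitions."""
    return SYSTEM_PROMPT_TEMPLATE.format(tools=json.dumps(functions))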
@@ -865,7 +865,11 @@ def get_movie_rating(movie_name):
    url = "http://www.omdbapi.com/"
    params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
    response = requests.get(url, params=params)
    return response.json()["Rated"]
We don't need to add a try/except here, since there is already a try/except outside, in the executable category (https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/eval_checker/checker.py#L702-L716), that handles the case where "Rated" is not a key.
The API is also expected to return a dict with a "Rated" field, like the following; if that field is missing, it would be caught in the API sanity check, indicating that the provider changed the API response.
>>> url = "http://www.omdbapi.com/"
>>> params = {"t": "Avatar", "apikey": api_key["OMDB-API-KEY"]}
>>> response = requests.get(url, params=params)
>>> response.json()
{'Title': 'Avatar', 'Year': '2009', 'Rated': 'PG-13', 'Released': '18 Dec 2009', 'Runtime': '162 min', 'Genre': 'Action, Adventure, Fantasy', 'Director': 'James Cameron', 'Writer': 'James Cameron', 'Actors': 'Sam Worthington, Zoe Saldana, Sigourney Weaver', 'Plot': 'A paraplegic Marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.', 'Language': 'English, Spanish', 'Country': 'United States, United Kingdom', 'Awards': 'Won 3 Oscars. 91 wins & 131 nominations total', 'Poster': 'https://m.media-amazon.com/images/M/MV5BZDA0OGQxNTItMDZkMC00N2UyLTg3MzMtYTJmNjg3Nzk5MzRiXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.9/10'}, {'Source': 'Rotten Tomatoes', 'Value': '81%'}, {'Source': 'Metacritic', 'Value': '83/100'}], 'Metascore': '83', 'imdbRating': '7.9', 'imdbVotes': '1,397,142', 'imdbID': 'tt0499549', 'Type': 'movie', 'DVD': 'N/A', 'BoxOffice': '$785,221,649', 'Production': 'N/A', 'Website': 'N/A', 'Response': 'True'}
>>> response.json()["Rated"]
'PG-13'
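For illustration only, the kind of outer handling being referred to might look roughly like this (a sketch of the pattern with an assumed wrapper signature, not the actual code at checker.py#L702-L716):

# Hypothetical sketch: the executable-category checker wraps the call,
# so a missing key such as "Rated" surfaces as a failed entry instead of
# requiring a try/except inside each API function.
def run_executable_call(func, *args, **kwargs):
    try:
        return {"valid": True, "result": func(*args, **kwargs)}
    except KeyError as exc:
        return {"valid": False, "error": f"missing expected key in API response: {exc}"}
    except Exception as exc:  # network failures, JSON decode errors, etc.
        return {"valid": False, "error": str(exc)}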
I was getting KeyError when running the evaluation. Here are the logs:
File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 495, in <module>
runner(model_names, test_categories, api_sanity_check)
File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 384, in runner
get_executable_expected_output(prompt_file)
File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py", line 908, in get_executable_expected_output
exec(
File "<string>", line 2, in <module>
File "/gorilla/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py", line 880, in get_movie_director
return response.json()["Director"]
KeyError: 'Director'
Hi @akshita-sukhlecha, thanks for letting us know about this error and providing the log. We want to raise a separate issue to investigate this further; it should be caught in the get_executable_expected_output function, not at the API level. The API functions should freely raise exceptions if the requests.get() JSON output does not have the expected keys.
Can you revert the changes to the API in this PR and just provide the model handler? Thanks!
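For illustration, catching the failure at that level might look roughly like this (a sketch under assumptions about the surrounding code, e.g. the prompt_entry structure; it is not the actual eval_runner_helper.py implementation):

# Hypothetical sketch: let the API function raise, and catch the KeyError
# where the expected output is computed, so the error points at an API
# response change rather than being masked inside the function.
def get_executable_expected_output(prompt_entry):
    local_vars = {}
    try:
        exec(prompt_entry["ground_truth"], globals(), local_vars)  # runs the live API call
    except KeyError as exc:
        raise RuntimeError(f"API response missing expected key: {exc}") from exc
    return local_vars.get("result")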
Thanks @CharlieJCJ. I have reverted the changes that handled exceptions at the API level.
The task of handling exceptions properly will be a blocker for evaluating this model. So, should these changes be merged after that task is completed?
One more thing to note: there are already 10 instances in the executable_python_function.py file where exceptions are being handled at the API level, e.g. https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py#L577
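For reference, the API-level pattern being described looks roughly like this (a hypothetical illustration of the style, not the actual code at the linked line; the api_key placeholder is an assumption for this example):

import requests

api_key = {"OMDB-API-KEY": "<your key here>"}  # placeholder for illustration only

# Hypothetical illustration of handling the exception at the API level:
# the function swallows the missing key itself instead of letting the
# KeyError propagate to the evaluation harness.
def get_movie_rating(movie_name):
    url = "http://www.omdbapi.com/"
    params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
    response = requests.get(url, params=params)
    try:
        return response.json()["Rated"]
    except KeyError:
        return "Error: 'Rated' field not found in API response"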
I understand that correctly evaluating the model during your local evaluation has a dependency on this new PR. The exception handling in the get_executable_expected_output logic will affect all models, so we want to review it thoroughly (related to #563). We can merge your model handler PR first, and then merge the PR that addresses the error handling (with API definition fixes), since that one will affect all models' scores; they are fairly separate concerns.
For context, this error didn't show up in our internal testing for a long time (over a couple of months), and we have been actively evaluating recently as well using the omdb key without hitting this API exception.
There are already 10 instances in the executable_python_function.py file where exceptions are being handled at API level.
Thanks for raising this; we also noticed it. Yes, we should be consistent with the other functions, and we will review all executable python functions to make sure there are no false negatives during evaluation.
I was getting KeyError when running the evaluation. Here are the logs:
File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 495, in <module> runner(model_names, test_categories, api_sanity_check) File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 384, in runner get_executable_expected_output(prompt_file) File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py", line 908, in get_executable_expected_output exec( File "<string>", line 2, in <module> File "/gorilla/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py", line 880, in get_movie_director return response.json()["Director"] KeyError: 'Director'
I faced the same issue. Resolved it by updating the omdbapi key.
@@ -877,7 +881,11 @@ def get_movie_director(movie_name):
    url = "http://www.omdbapi.com/"
    params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
    response = requests.get(url, params=params)
    return response.json()["Director"]
Same comment as above.
Hey @akshita-sukhlecha and @Kishore-bluekyte, is this still relevant? If so, once you resolve the comments, I'd be happy to go ahead and merge it.
The Llama-3.1-Storm-8B model was recently released. This PR adds a model handler for Llama-3.1-Storm-8B.