
[BFCL] Adding Llama-3.1-Storm-8B model handler #598

Open · wants to merge 3 commits into base: main
Conversation

akshita-sukhlecha

Llama-3.1-Storm-8B was recently released. This PR adds a model handler for Llama-3.1-Storm-8B.
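For context, adding a model to BFCL generally means contributing a handler that formats prompts for the model and decodes its output into function calls. The sketch below only outlines that idea; the class and method names are placeholders, not the actual BFCL handler interface used in this PR.

# Illustrative only: placeholder class and method names, not the real BFCL interface.
import json

class LlamaStormHandlerSketch:
    """Rough shape of a model handler: build the prompt, then parse the model's reply."""

    def __init__(self, model_name="Llama-3.1-Storm-8B"):
        self.model_name = model_name

    def format_prompt(self, user_query, tools):
        # Embed the available functions in the prompt, as in the PR's <tools> template.
        return (
            "Here are the available functions:\n"
            f"<tools>{json.dumps(tools)}</tools>\n\n"
            f"User: {user_query}"
        )

    def decode_output(self, raw_text):
        # Parse the model's tool-call output; the real output format is model-specific.
        try:
            return json.loads(raw_text)
        except json.JSONDecodeError:
            return raw_text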

@akshita-sukhlecha akshita-sukhlecha changed the title Adding Llama-3.1-Storm-8B model handler [BFCL] Adding Llama-3.1-Storm-8B model handler Aug 23, 2024
@akshita-sukhlecha
Author

@HuanzhiMao (Just a gentle reminder) Could you help review this?

Here are the available functions:
<tools>{tools}</tools>

Follow the below guidelines:
Collaborator

These guidelines are mentioned in the 'Prompt Format for Function Calling' section in the model documentation. They need to be aligned.

Author

Llama-3.1-Storm-8B is not trained on negative data. Since the model has strong instruction-following (IFEval) capability, we added two instructions to compensate for the lack of negative data in training. This is also consistent with our evaluation prompt in the blog post: https://huggingface.co/blog/akjindal53244/llama31-storm8b#bfcl
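For illustration, the prompt assembly being discussed looks roughly like the sketch below. The guideline strings are placeholders standing in for the two extra instructions, not the exact text in the PR.

import json

# Placeholder guideline strings illustrating the idea; the actual two instructions
# added in the PR may differ.
EXTRA_GUIDELINES = [
    "If none of the provided functions can answer the query, say so.",
    "If required parameters are missing from the query, point that out.",
]

def build_system_prompt(tools):
    guidelines = "\n".join(f"- {g}" for g in EXTRA_GUIDELINES)
    return (
        "Here are the available functions:\n"
        f"<tools>{json.dumps(tools)}</tools>\n\n"
        "Follow the below guidelines:\n"
        f"{guidelines}"
    )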

Collaborator

Hi @akshita-sukhlecha,

Thank you for the clarification. To ensure consistency and fairness across all models, it's important that the model handler uses the same prompt format as outlined in its official documentation.

In this case, while you have a specialized prompt optimized for BFCL categories, it would be more equitable to use the general-purpose prompt specified in the documentation for a uniform comparison with other models.

cc @ShishirPatil

@@ -865,7 +865,11 @@ def get_movie_rating(movie_name):
url = "http://www.omdbapi.com/"
params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
response = requests.get(url, params=params)
return response.json()["Rated"]
Collaborator
@CharlieJCJ Aug 25, 2024

We don't need to add a try/except here, since there is already a try/except outside, in the executable category (https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/eval_checker/checker.py#L702-L716), that handles the case where "Rated" is not a key.

And the API is expected to return a dict with a "Rated" field, like the following; if that field is missing, it would be caught by the API sanity check, indicating that the provider changed the API response.

>>> url = "http://www.omdbapi.com/"
>>> params = {"t": "Avatar", "apikey": api_key["OMDB-API-KEY"]}
>>> response = requests.get(url, params=params)
>>> response.json()
{'Title': 'Avatar', 'Year': '2009', 'Rated': 'PG-13', 'Released': '18 Dec 2009', 'Runtime': '162 min', 'Genre': 'Action, Adventure, Fantasy', 'Director': 'James Cameron', 'Writer': 'James Cameron', 'Actors': 'Sam Worthington, Zoe Saldana, Sigourney Weaver', 'Plot': 'A paraplegic Marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.', 'Language': 'English, Spanish', 'Country': 'United States, United Kingdom', 'Awards': 'Won 3 Oscars. 91 wins & 131 nominations total', 'Poster': 'https://m.media-amazon.com/images/M/MV5BZDA0OGQxNTItMDZkMC00N2UyLTg3MzMtYTJmNjg3Nzk5MzRiXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.9/10'}, {'Source': 'Rotten Tomatoes', 'Value': '81%'}, {'Source': 'Metacritic', 'Value': '83/100'}], 'Metascore': '83', 'imdbRating': '7.9', 'imdbVotes': '1,397,142', 'imdbID': 'tt0499549', 'Type': 'movie', 'DVD': 'N/A', 'BoxOffice': '$785,221,649', 'Production': 'N/A', 'Website': 'N/A', 'Response': 'True'}
>>> response.json()["Rated"]
'PG-13'
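Schematically, the outer handling being referred to works something like the sketch below; the names are illustrative and this is not the exact checker.py code.

def run_executable_call_sketch(func_source, call_str):
    """Run a model-generated call against the live API; any exception, including a
    missing 'Rated' key, becomes a failed result instead of crashing the run."""
    namespace = {}
    try:
        exec(func_source, namespace)        # defines e.g. get_movie_rating
        result = eval(call_str, namespace)  # executes the model's call
        return {"valid": True, "result": result}
    except Exception as exc:                # KeyError, network error, etc.
        return {"valid": False, "error": repr(exc)}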

Author

I was getting a KeyError when running the evaluation. Here are the logs:

  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 495, in <module>
    runner(model_names, test_categories, api_sanity_check)
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 384, in runner
    get_executable_expected_output(prompt_file)
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py", line 908, in get_executable_expected_output
    exec(
  File "<string>", line 2, in <module>
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py", line 880, in get_movie_director
    return response.json()["Director"]
KeyError: 'Director'

Collaborator
@CharlieJCJ Aug 26, 2024

Hi @akshita-sukhlecha, thanks for letting us know about this error and for providing the log. We want to raise a separate issue to investigate this further; the error should be caught in the get_executable_expected_output function, not at the API level. The API functions should be free to raise exceptions if the requests.get() JSON output does not have the expected keys.

Can you revert these changes to the API in this PR, and just provide the model handler? Thanks!
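For illustration, the kind of follow-up fix being described would catch failures around the evaluation of each ground-truth call rather than inside each API wrapper. This is a placeholder sketch with made-up names, not the planned implementation.

def get_executable_expected_output_sketch(ground_truth_calls, namespace):
    # Placeholder sketch: catch per-call failures here instead of inside each API wrapper.
    expected, errors = [], []
    for call in ground_truth_calls:
        try:
            expected.append(eval(call, namespace))
        except Exception as exc:  # e.g. KeyError: 'Director' from a changed API response
            expected.append(None)
            errors.append((call, repr(exc)))
    if errors:
        # Surface the affected calls so an API change is visible in one place,
        # without adding try/except to every executable function.
        print("Expected-output generation failed for:", errors)
    return expected, errors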

Author

Thanks @CharlieJCJ. I have reverted the changes that handled exceptions at the API level.

The task of handling exceptions properly will be a blocker for evaluating this model. So, should we merge these changes after that task is completed?

One more thing to note: there are already 10 instances in the executable_python_function.py file where exceptions are handled at the API level, e.g. https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py#L577 (see the illustrative sketch below).
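For reference, the API-level pattern referred to looks roughly like this; it is an illustrative reconstruction, not the code at the linked line.

import requests

def get_movie_rating_guarded(movie_name, api_key):
    # Same request as the real function, but the missing-key case is absorbed
    # here at the API level instead of being raised to the caller.
    url = "http://www.omdbapi.com/"
    params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
    response = requests.get(url, params=params)
    try:
        return response.json()["Rated"]
    except KeyError:
        return "Error: 'Rated' field not found in the API response"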

Collaborator
@CharlieJCJ Aug 27, 2024

I understand that correctly evaluating the model in your local run depends on this fix. However, the exception handling in the get_executable_expected_output logic affects all models, so we want to review it thoroughly (related to #563). We can merge your model handler PR first, and then merge the PR that addresses the error handling (with API definition fixes), since that one affects all models' scores; they are fairly separate concerns.

For context, this error hadn't shown up in our internal testing for a long time (over a couple of months), and we have also been actively evaluating recently using the OMDb key without hitting this API exception.

There are already 10 instances in the executable_python_function.py file where exceptions are being handled at API level.

Thanks for raising this; we also noticed it. Yes, we should be consistent with the other functions, and we will review all executable Python functions to make sure there are no false negatives during evaluation.


I was getting KeyError when running the evaluation. Here are the logs:

  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 495, in <module>
    runner(model_names, test_categories, api_sanity_check)
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner.py", line 384, in runner
    get_executable_expected_output(prompt_file)
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py", line 908, in get_executable_expected_output
    exec(
  File "<string>", line 2, in <module>
  File "/gorilla/berkeley-function-call-leaderboard/eval_checker/executable_python_function.py", line 880, in get_movie_director
    return response.json()["Director"]
KeyError: 'Director'

I faced the same issue. It was resolved by updating the omdbapi key.

@@ -877,7 +881,11 @@ def get_movie_director(movie_name):
url = "http://www.omdbapi.com/"
params = {"t": movie_name, "apikey": api_key["OMDB-API-KEY"]}
response = requests.get(url, params=params)
return response.json()["Director"]
Collaborator

Same comment as above.

@HuanzhiMao HuanzhiMao added the BFCL-New Model Add New Model to BFCL label Aug 27, 2024
@ShishirPatil
Owner

Hey @akshita-sukhlecha and @Kishore-bluekyte, is this still relevant? If so, once you resolve the comments, I'd be happy to go ahead and merge it.
