Experimental result on APPS #4

Open
sh0416 opened this issue Sep 25, 2024 · 0 comments
sh0416 commented Sep 25, 2024

Hello.

I am reproducing your results, but I am having trouble reproducing your baseline, deepseek-coder-6.7b-instruct.

I use the prompt provided in this repository, but the APPS introductory pass@1 I get is only 0.3192, whereas the correct value according to your paper is 0.4465.
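
For reference, this is how I compute pass@1. It is a minimal sketch assuming the standard unbiased pass@k estimator from the Codex paper; the sample counts in the example are illustrative, not from my actual run:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed stably as a running product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 20 samples per problem, 6 passing -> pass@1 = 6/20 = 0.30
print(pass_at_k(20, 6, 1))
```

The per-problem scores are then averaged over the APPS introductory split to get the final number.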

I also observe a discrepancy between your paper and the Papers with Code website: the former reports 0.5001 while the latter reports 0.3380.

Given this, I think my score could be right if the base model score (i.e., deepseek-coder-6.7b-instruct) is 0.3192 and the fine-tuned model score (i.e., MoTCoder) is 0.3380.

While writing this post, I also noticed that the website lists your model as motcoder-15b; this should probably be motcoder-6.7b, since your paper says the base model is deepseek-coder-6.7b-instruct, right?

Could you clarify which score is right?
