Experimental result on APPS #4

Open
sh0416 opened this issue Sep 25, 2024 · 0 comments
sh0416 commented Sep 25, 2024

Hello.

I am reproducing your results, but I am having trouble reproducing your baseline, deepseek-coder-6.7b-instruct.

I use the prompt provided in this repository, but the APPS introductory pass@1 I get is only 0.3192, whereas the correct value according to your paper is 0.4465.
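
For reference, this is how I compute pass@1. It is a minimal sketch assuming the standard unbiased pass@k estimator from the Codex paper; the sample counts in the example are illustrative, not from my actual run:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed stably as a running product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 20 samples per problem, 6 passing -> pass@1 = 6/20 = 0.30
print(pass_at_k(20, 6, 1))
```

The per-problem scores are then averaged over the APPS introductory split to get the final number.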

I also observe a discrepancy between your paper and the Papers with Code website: the former reports 0.5001 while the latter reports 0.3380.

Given this, I think my score could be right if the base model score (i.e., deepseek-coder-6.7b-instruct) is 0.3192 and the fine-tuned model score (i.e., MoTCoder) is 0.3380.

While writing this post, I also noticed that the website lists your model as motcoder-15b; this should probably be motcoder-6.7b, since your paper says the base model is deepseek-coder-6.7b-instruct, right?

Could you clarify which score is right?
