Issue with Fine-tuning Mistral 7B Model - Results Discrepancy #29
Comments
Same problem for me! Did you solve it? I also cannot reproduce the results, whether with Mistral or LLaMA.
I encountered the same issue when trying to train Mistral-7B on MetaMathQA. My environment is:
I only got 69% accuracy on GSM8K and 24% on MATH after 3 epochs with LR 5e-6 and a global batch size of 128. Due to limited computational resources, I added gradient checkpointing and flash attention to the original code and changed per_device_batch_size to 1 (so gradients accumulate for 16 steps across 8 GPUs), but I don't think these modifications should cause a significant difference in performance.
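For reference, here is a minimal sketch of how that modified setup might look with the HuggingFace Trainer; the model identifier, output path, and exact argument set are my own assumptions for illustration, not the repo's actual training script:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Sketch of the modified setup described above (illustrative only).
# Effective global batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
#                        = 1 * 16 * 8 = 128, matching the paper's setting.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",               # assumed base model
    attn_implementation="flash_attention_2",   # flash attention added to save memory
    torch_dtype=torch.bfloat16,
)

training_args = TrainingArguments(
    output_dir="mistral-7b-metamathqa",        # hypothetical output path
    num_train_epochs=3,
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,               # added to fit 7B full fine-tuning in memory
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)
```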
My result with LLaMA-Factory and the hyperparameters reported in the paper is 72.2% on GSM8K; I do not understand why there are so many failures when trying to reproduce the result.
I just can't get such a high score, and I don't think this is a unique case.
Hello,
I attempted to replicate the experiment by fine-tuning Mistral-7B on the MetaMathQA dataset, but the results I obtained do not match those shared in the repository.
Reproduction steps
I used the following parameters in run_mistral.sh, and I get:
gsm8k acc==== 0.6618650492797574
math acc==== 0.2274
which is well below the reported 77.7 and 28.2.
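To put the gap in perspective, assuming the standard test splits (1,319 GSM8K problems and 5,000 MATH problems), these accuracies correspond roughly to the following problem counts:

```python
# Rough sanity check, assuming the standard test splits:
# GSM8K test = 1,319 problems, MATH test = 5,000 problems.
gsm8k_correct = round(0.6618650492797574 * 1319)  # 873 correct
math_correct  = round(0.2274 * 5000)              # 1137 correct

# Shortfall versus the reported 77.7% / 28.2%, in number of problems:
gsm8k_gap = round(0.777 * 1319) - gsm8k_correct   # ~152 problems
math_gap  = round(0.282 * 5000) - math_correct    # ~273 problems
print(gsm8k_correct, gsm8k_gap, math_correct, math_gap)
```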
Environment details
Here are the details of my Python environment:
I would appreciate any guidance or suggestions you could provide to help resolve this discrepancy. Thank you in advance for your time and assistance.
Best regards,
lyf-00