-
There are many reasons this could be the case, but I'd need a lot more details about your setup. One simple thing to check is that you're fine-tuning with dropout enabled, as this is often quite important.
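To make the dropout tip concrete: fine-tuning on a small dataset without dropout often overfits. Below is a minimal, framework-agnostic sketch of inverted dropout in plain Python (the thread doesn't name the actual framework or config flag, so this is illustrative only):

```python
import random

def dropout(xs, p=0.1, training=True, rng=random.random):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p); at eval time,
    pass activations through unchanged."""
    if not training or p == 0.0:
        return list(xs)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng() < p else x * scale for x in xs]
```

In most frameworks this corresponds to making sure the model is in training mode and the dropout rate in the fine-tuning config is nonzero.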
-
What is the training_eval-accuracy that is reported in the event logs?
Here is what I am seeing during training on tweet classification task:
Training essentially converges after 1000 steps. The reported "training_eval" accuracy on the 3-way classification task is around 0.84, which would have been an amazing score, since RoBERTa Large gets around 0.74 on the same set.
However, a "real evaluation" on the same dataset (performed after saving the checkpoints in the training script) reveals a totally different situation:
The score of around 0.68 is not very impressive. I have also looked through the stored predictions, calculated accuracy/F1 manually, and can confirm that this metric is correct.
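For reference, a manual accuracy/macro-F1 check like the one described can be sketched in plain Python. The label names and the prediction/gold pairs below are made-up examples, not the repo's actual stored-prediction format:

```python
def accuracy(preds, golds):
    # Fraction of predictions matching the gold labels.
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def macro_f1(preds, golds, labels):
    # Per-class F1 from true/false positives and false negatives,
    # averaged unweighted over the label set (macro averaging).
    f1s = []
    for label in labels:
        tp = sum(p == label and g == label for p, g in zip(preds, golds))
        fp = sum(p == label and g != label for p, g in zip(preds, golds))
        fn = sum(p != label and g == label for p, g in zip(preds, golds))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

preds = ["pos", "neg", "neutral", "pos", "neg"]
golds = ["pos", "neg", "pos", "pos", "neutral"]
print(accuracy(preds, golds))  # → 0.6
```

Comparing this number against the one in the event logs makes the discrepancy easy to pin down.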
I am trying to figure out why it is doing so badly on this task, and wanted to understand what the training_eval accuracy is really reporting.