-
Hi @ep0p 👋, yeah, that's correct. In the meantime, you can check out the dataset I used to fine-tune the multilingual model: synth_multilingual_dataset
-
Hello,
I have fine-tuned a docTR model for text recognition, which you can find here (model). During training and evaluation, I achieved very high validation scores, with near-perfect metrics.
To further validate the model's performance, I generated a PDF file containing only words from the validation dataset. However, when I run inference on this PDF, the model performs significantly worse than expected. Based on the high validation scores, I anticipated full recognition of the words, but this was not the case.
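For reference, here is a minimal sketch of the kind of setup I mean, assuming a PyTorch checkpoint and the custom-model loading pattern from the docTR docs (the checkpoint path, vocab string, and architecture names below are placeholders, not my exact configuration):

```python
import torch

from doctr.io import DocumentFile
from doctr.models import crnn_vgg16_bn, ocr_predictor

# Load the fine-tuned recognition weights (placeholder path and vocab)
reco_model = crnn_vgg16_bn(pretrained=False, vocab="abcdefghijklmnopqrstuvwxyz0123456789")
reco_model.load_state_dict(torch.load("my_finetuned_reco.pt", map_location="cpu"))

# Full OCR pipeline: pretrained detector + the fine-tuned recognizer
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch=reco_model, pretrained=True)

# Run on the PDF built from validation words
doc = DocumentFile.from_pdf("validation_words.pdf")
result = predictor(doc)
print(result.render())  # plain-text rendering of the predictions
```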
For example:
1. Single word case:
Here is the content of my PDF with the detected word:
And here is the model's prediction:
2. Multiple words case:
These detections are the most accurate I could achieve by setting `bin_thresh = 0.1`; higher values resulted in worse predictions. I also noticed the model adds extra punctuation marks. Initially, I thought this was due to overlapping detection boxes, but the issue persisted even when testing with a PDF containing one word per line.
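For completeness, lowering the threshold on the detection postprocessor looks like this (the attribute path follows the docTR docs):

```python
# Lower the detection binarization threshold so weaker text regions survive;
# 0.1 gave the best detections here, higher values dropped or merged boxes
predictor.det_predictor.model.postprocessor.bin_thresh = 0.1

# box_thresh is the related knob for filtering low-confidence boxes
result = predictor(doc)
```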
Am I wrong to expect 99% accuracy during inference on the validation dataset, given the near-perfect validation scores achieved during training?