BoW metric implementation #10
Comments
Thanks for your suggestions! Regarding your first remark on BoW, I just did some more tests and found that you're right. Further, the calculation for this metric will be completely redone to match the current OCR-D evaluation specification, using a multiset rather than the current Counter-set under the hood.
I do think your
@bertsky What's the purpose of the wikipage for evaluation?
they have some references at the end, plus my suggestions in OCR-D/spec#240
not really I found, which surprised me too. See discussion on my reviews of the ocrd_eval spec.
Not sure where this fits within the ocrd-website and spec. But it states quite clearly…
Indeed. But notice So edit along!
@bertsky
Re-open, since it's up to @bertsky to close or not.
Sorry, meanwhile I forgot about this. Will revisit and give my two cents.
The numerator of the metrics called `BoWs` and `BagOfWords` only counts the false negatives:
digital-eval/src/digital_eval/metrics.py
Lines 442 to 451 in edc8b97
This is then combined with a denominator that counts the length of the reference:
digital-eval/src/digital_eval/metrics.py
Lines 167 to 195 in 971918f
Together, this yields a pure recall rate calculation.
But for recall there is already an equivalent calculation via NLTK's metrics. So I guess this should really be a calculation for BoW accuracy, and therefore can be considered a bug. To get the correct numerator for accuracy/error, just add the inverse diff, i.e. the counts of the false positives.
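To illustrate the two directions of the diff, here is a minimal sketch using `collections.Counter` as the multiset. The function name `bow_diffs` and the sample sentences are hypothetical and not taken from digital-eval; the sketch only assumes that both inputs are already tokenized into words.

```python
from collections import Counter


def bow_diffs(reference_words, candidate_words):
    """Return (false_negatives, false_positives) for two bags of words.

    Hypothetical helper for illustration, not part of digital-eval.
    """
    ref = Counter(reference_words)
    cand = Counter(candidate_words)
    # Counter subtraction keeps only positive counts, so each direction
    # yields the surplus of one bag over the other.
    false_negatives = sum((ref - cand).values())  # in reference, missing in candidate
    false_positives = sum((cand - ref).values())  # in candidate, not in reference
    return false_negatives, false_positives


reference = "the quick brown fox jumps over the dog".split()
candidate = "the quick brown box jumps over dog dog".split()
fn, fp = bow_diffs(reference, candidate)
print(fn, fp)
```

Counting only `fn` gives the recall-style numerator described above; adding `fp` gives the numerator proposed for accuracy/error.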
Also, the function names `accuracy_for` and `error_for` are misleading: not only are these unnormalized rates, artificially clipped to the [0,1] interval, but more importantly, they should use the sum of both lengths (reference and candidate) as denominator.
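A rough sketch of what such a normalized rate could look like, assuming the numerator is the sum of false negatives and false positives and the denominator is the sum of both lengths. The names `bow_error_rate` and `bow_accuracy` are hypothetical and do not exist in digital-eval; returning 0.0 for two empty inputs is also an assumption.

```python
from collections import Counter


def bow_error_rate(reference_words, candidate_words):
    """Bag-of-words error rate, normalized by the sum of both lengths.

    Hypothetical helper for illustration, not part of digital-eval.
    """
    ref = Counter(reference_words)
    cand = Counter(candidate_words)
    total = sum(ref.values()) + sum(cand.values())
    if total == 0:
        # Assumption: two empty texts count as a perfect match.
        return 0.0
    false_negatives = sum((ref - cand).values())
    false_positives = sum((cand - ref).values())
    return (false_negatives + false_positives) / total


def bow_accuracy(reference_words, candidate_words):
    """Complement of the error rate."""
    return 1.0 - bow_error_rate(reference_words, candidate_words)


# 2 missing words (c, d) + 1 spurious word (x) over 4 + 3 words in total
print(bow_error_rate("a b c d".split(), "a b x".split()))
```

With this denominator the result stays within [0,1] without any clipping, because the two diffs together can never exceed the combined length of both texts.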