BoW metric implementation #10

Open · bertsky opened this issue Mar 1, 2023 · 7 comments
Labels: bug (Something isn't working)

bertsky commented Mar 1, 2023

The numerator of the metrics called BoWs and BagOfWords only counts the false negatives:

```python
def bag_of_tokens(reference_tokens: List[str], candidate_tokens: List[str]) -> int:
    """Calculate intersection/difference
    between reference and candidate token list
    """
    return len(_diff(reference_tokens, candidate_tokens))


def _diff(gt_tokens, cd_tokens) -> List[str]:
    return list((Counter(gt_tokens) - Counter(cd_tokens)).elements())
```

This is then combined with a denominator that counts the length of the reference:

```python
def accuracy_for(the_obj) -> float:
    """Calculate accuracy as ratio of
    correct items, with correct items
    being expected items minus
    number of differences.

    Respect following corner cases:
    * if less correct items than differences => 0
    * if both correct items and differences eq zero => 1
      means: nothing to find and it did detect nothing
      (i.e. no false-positives)

    Args:
        the_obj (object): object containing information
            about reference data and difference

    Returns:
        float: accuracy in range 0.0 - 1.0
    """
    _inspect_calculation_object(the_obj)
    diffs = the_obj.diff
    n_refs = len(the_obj._data_reference)
    if (n_refs - diffs) < 0:
        return 0
    if n_refs == 0 and diffs == 0:
        return 1.0
    elif n_refs > 0:
        return (n_refs - diffs) / n_refs
```

Together, this yields a pure recall rate calculation.
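
For illustration (made-up token lists, not from the test suite), any extra tokens in the candidate are simply ignored by the current calculation:

```python
from collections import Counter

reference = ["the", "quick", "fox"]
candidate = ["the", "quick", "fox", "spam", "spam"]  # two false positives

# current numerator: only tokens missing from the candidate (false negatives)
diff = list((Counter(reference) - Counter(candidate)).elements())
print(diff)                                           # [] -> diff count is 0
print((len(reference) - len(diff)) / len(reference))  # 1.0, despite two spurious tokens
```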

But for recall there is already an equivalent calculation via NLTK's metrics. So I guess this should really be a calculation for BoW accuracy, and therefore can be considered a bug. To get the correct numerator for accuracy/error, just add the inverse diff, i.e. the counts of the false positives.

Also, the function names accuracy_for and error_for are misleading: not only are these unnormalised rates, artificially clipped to the [0,1] interval; more importantly, they should use the sum of both lengths (reference and candidate) as the denominator.
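
A minimal sketch of the suggested change (hypothetical helper, not the project's actual API), counting false negatives and false positives in the numerator and normalising by the sum of both lengths:

```python
from collections import Counter
from typing import List


def bow_error_rate(reference_tokens: List[str], candidate_tokens: List[str]) -> float:
    """Symmetric bag-of-words error rate:
    (false negatives + false positives) / (len(reference) + len(candidate))."""
    refs = Counter(reference_tokens)
    cands = Counter(candidate_tokens)
    false_negatives = sum((refs - cands).values())  # reference tokens missing from the candidate
    false_positives = sum((cands - refs).values())  # candidate tokens not in the reference
    total = len(reference_tokens) + len(candidate_tokens)
    if total == 0:
        return 0.0  # nothing to find and nothing detected
    return (false_negatives + false_positives) / total
```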

M3ssman added the bug label Mar 14, 2023
M3ssman self-assigned this Mar 14, 2023

M3ssman commented Mar 14, 2023

Thanks for your suggestions!

Regarding your first remark on BoW, I just did some more tests and found that you're right.
If the candidate is not just blank but contains additional words (false positives), these are not taken into account.
This is not intended and therefore indeed buggy.

Further, the calculation for this metric will be completely redone to match what's written in the current OCR-D evaluation specification, using a multiset rather than the current Counter set under the hood.


bertsky commented Mar 14, 2023

> Further, the calculation for this metric will be completely redone to match what's written in the current OCR-D evaluation specification, using a multiset rather than the current Counter set under the hood.

I do think your Counter set method is correct. It perfectly reflects the OCR-D eval spec (which is itself based on the count-based success measure by PRImA, as opposed to the index-based one you often see in information retrieval contexts).
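
For what it's worth, Python's Counter already acts as a multiset here, so the subtraction in _diff is effectively the bag difference the spec describes. A tiny made-up example:

```python
from collections import Counter

gt = Counter(["der", "der", "die", "das"])
ocr = Counter(["der", "die", "die", "dass"])

print(list((gt - ocr).elements()))  # ['der', 'das']  -> tokens missing from the OCR output
print(list((ocr - gt).elements()))  # ['die', 'dass'] -> spurious tokens in the OCR output
```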


M3ssman commented Mar 15, 2023

@bertsky
Where do the formulas in the OCR-D eval spec originate from?
Is a reference implementation for the BoW Error publicly available?

What's the purpose of the wikipage for evaluation?
Looks like this eval tool is not listed there 🙂


bertsky commented Mar 21, 2023

> Where do the formulas in the OCR-D eval spec originate from?

They have some references at the end, plus my suggestions in OCR-D/spec#240.

> Is a reference implementation for the BoW Error publicly available?

Not really, as far as I could find, which surprised me too. See the discussion in my reviews of the ocrd_eval spec.

> What's the purpose of the wikipage for evaluation?

Not sure where this fits within the ocrd-website and spec. But it states quite clearly…

> Problem statement
>
> Which data and tools can we use to objectively measure quality and compare results of both complete workflows and individual steps (beyond final text Character Error Rate) on a non-representative sample?

> Looks like this eval tool is not listed there

Indeed. But note that the page was last edited on Aug 20, 2020.

So edit along!

einspunktnull self-assigned this May 24, 2023
einspunktnull commented

@bertsky
I changed the BoW metric according to your request. See the latest commit in the issue branch https://github.com/ulb-sachsen-anhalt/digital-eval/tree/bow_metric_impl_%2310.
Take a look at the tests in tests/test_ocr_metrics.py, where I used the example values from the BoW Error Rate section of the OCR-D eval spec.
Please review and let me know what you think.

M3ssman closed this as completed Jun 7, 2023

M3ssman commented Jun 7, 2023

Re-opening, since it's up to @bertsky whether to close or not.

M3ssman reopened this Jun 7, 2023

bertsky commented Jun 7, 2023

Sorry, I forgot about this in the meantime. I will revisit and give my two cents.
