Diachronic view #35

Open · 2 of 4 tasks

paulpestov opened this issue Apr 24, 2023 · 5 comments
Comments

@paulpestov (Contributor) commented Apr 24, 2023

(This is a copy of an internal issue, posted here for public visibility.)

Describe the feature you'd like

Currently we display only the latest information about a workflow in QuiVer: when we run a workflow A, the important metrics are saved, and they are overwritten as soon as workflow A is run again.

In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information to QuiVer, e.g. via a time stamp.

User story

As a developer I need an overview of how changes in the software affect the OCR quality and hardware metrics, in order to be certain that the newest contributions to OCR-D really improve the software's outcome.

Ideas we have discussed so far

How to display the information

For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x-axis) represents an ocrd_all or an ocrd_core release (clarify!).
Users can choose between the different metrics and see a tendency whether a metric improves or not.

Underlying data structure

When a GT corpus is selected, the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file in which all the runs per release are present.

Take GT workspace 16_ant_simple as an example: we then have a file 16_ant_simple_minimal.json with all runs of its minimal benchmarking workflow, a file 16_ant_simple_selected_pages.json with all runs of its selected_pages workflow, etc. Each executed workflow run has a timestamp by which the front end can then sort the single executions and retrieve the relevant data.
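A minimal sketch of what this could look like, purely for illustration (the file names follow the example above; the field names such as `timestamp` and `ocrd_all_release`, the metric keys, and the release labels are assumptions, not a fixed schema). The ID map file points each GT workspace to its per-workflow files:

```json
{
  "16_ant_simple": [
    "16_ant_simple_minimal.json",
    "16_ant_simple_selected_pages.json"
  ]
}
```

A per-workflow file such as 16_ant_simple_minimal.json would then contain one entry per benchmarking run, ideally already sorted chronologically:

```json
[
  {
    "timestamp": "2023-04-24T12:00:00Z",
    "ocrd_all_release": "2023-04-02",
    "evaluation": { "cer": 0.12, "wer": 0.25, "cpu_time_seconds": 431 }
  },
  {
    "timestamp": "2023-05-15T12:00:00Z",
    "ocrd_all_release": "2023-05-10",
    "evaluation": { "cer": 0.11, "wer": 0.24, "cpu_time_seconds": 418 }
  }
]
```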

TODOs

  • clarify what our steps / increments in time are: a release of ocrd_all? a release of ocrd_core?
  • add time stamps to workflow objects
  • add single files for each GT workspace + workflow; ideally, the data should be sorted chronologically right from the start (although the front end should not depend on that)
  • create ID map file
@paulpestov (Contributor, Author)

Here is our first draft:
[Mockup image: Workflow Runs List]

A few notes:

  • The diachronic view can display charts for one metric at a time, selectable in the dropdown menu on the right
  • List (result) items primarily represent GT corpora and contain multiple workflow setups
  • Each GT + workflow combination will be run periodically and a chart visualisation will display the evaluation data
  • Each chart shows the metric value scale on the y-axis and time on the x-axis
  • Benchmarking runs will be triggered upon each ocrd_all release
  • Each ocrd_all release will be marked in each chart
  • Each GT corpus list item will first show an average chart that covers all workflows set up for that GT corpus (see the data sketch after this list)
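A rough sketch of the data a single chart could be fed with, continuing the JSON structure idea above (the series labels, field names, and values are illustrative only, not the actual frontend format):

```json
{
  "gt_corpus": "16_ant_simple",
  "metric": "cer",
  "series": [
    {
      "label": "average (all workflows)",
      "points": [ { "release": "2023-04-02", "value": 0.13 }, { "release": "2023-05-10", "value": 0.12 } ]
    },
    {
      "label": "minimal",
      "points": [ { "release": "2023-04-02", "value": 0.12 }, { "release": "2023-05-10", "value": 0.11 } ]
    }
  ]
}
```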

@MareenGeestmann

Feedback from Open Tech call on 2023-04-26:

  • Maybe instead of the average it would be more useful to calculate and show the median and the first and last quartiles.
  • better to use minimum and maximum values instead of the average
  • display data page-wise instead of for the entire document

@MareenGeestmann

To further refine the view, a few questions to assess what is really needed (and for me to understand):

  • Should the data of all previous releases be displayed? Or would, for instance, the last two releases or the last half year be enough?

  • From the graphs the progression can be read well. Are concrete values of the metrics from the last release also interesting, or is knowing whether it got better or worse enough?

  • For a possible page-by-page display, are character and word error rates in particular interesting (or other metrics as well)?

  • In this context, how many pages do our current GTs cover?

  • Would a comparison of values suffice here, or do you want to look directly into which characters or words are recognized how well?

In addition, a note from my side: axis labels and a legend or description (for the color coding) need to be added.

@bertsky commented May 12, 2023

  • Maybe instead of the average it would be more useful to calculate and show the median and the first and last quartiles.
  • better to use minimum and maximum values instead of the average
  • display data page-wise instead of for the entire document

Average (mean or median) plus min and max (or first and last quartile, or better decile or percentile) would be great, yes.

But page-wise display is probably too much to ask for. As long as it's easily possible to navigate into the raw data by page to analyse regressions in detail (i.e. when some metric fell drastically, like a 20% worse minimum), that's enough IMO.

@mweidling and @paulpestov also discussed page-wise aggregation with me: currently the backend uses Dinglehopper, which is page-wise. So naive aggregation is macro-averaged ($\frac{1}{N} \sum\limits_{i=1}^N s_i$), but it's easy to point to the distinct pages which cause regressions. IMO, since we also have the number of lines per page, it should be easy to calculate a micro-averaged aggregate as well ($\frac{1}{\sum\limits_{i=1}^N n_i} \sum\limits_{i=1}^N n_i s_i$) – where $n_i$ is the number of lines of page $i$ and $s_i$ is the score of page $i$.

In brief: a micro-averaged aggregate for the average score, besides the per-page minimum score, should be sufficient for quick diagnostics; the rest can always be dug up by navigating into the raw data. (And of course, later on one might implement some way to navigate into the per-page reports by Dinglehopper.)
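For clarity, a minimal sketch of the two aggregations in Python, assuming per-page scores $s_i$ and line counts $n_i$ are available (function names and example values are purely illustrative, not QuiVer's actual code):

```python
def macro_average(scores):
    """Macro average: every page contributes equally, regardless of its size."""
    return sum(scores) / len(scores)

def micro_average(scores, line_counts):
    """Micro average: each page's score is weighted by its number of lines."""
    return sum(n * s for s, n in zip(scores, line_counts)) / sum(line_counts)

# Per-page CER scores s_i and line counts n_i (made-up values):
scores = [0.05, 0.08, 0.30]   # page 3 regressed badly ...
line_counts = [40, 35, 5]     # ... but it only has 5 lines

print(macro_average(scores))               # ≈ 0.143
print(micro_average(scores, line_counts))  # ≈ 0.079
print(min(scores), max(scores))            # 0.05 0.3 – per-page extremes for quick diagnostics
```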

  • Should the data of all previous releases be displayed? Or would, for instance, the last two releases or the last half year be enough?

That's a difficult question. In terms of UI, IMO it would make sense to provide a slider to narrow it down dynamically. It should also be robust against gaps in the data (e.g. when some workflow or some metric was not available in earlier releases).

  • From the graphs the progression can be read well. Are concrete values of the metrics from the last release also interesting, or is knowing whether it got better or worse enough?

I don't understand. A graph will always show the concrete values, too. (Perhaps the exact numerical values can be highlighted via mouse-over?)

  • For a possible page-by-page display, are character and word error rates in particular interesting (or other metrics as well)?

See above (no page-by-page display, but minimum/maximum across pages).

  • Would a comparison of values suffice here, or do you want to look directly into which characters or words are recognized how well?

See above (browsing into the individual per-page reports would be super cool, but is a lot of effort to implement; as long as one can manually look these up, it should suffice).

In addition, a note from my side: axis labels and a legend or description (for the color coding) need to be added.

Yes, labelled ticks with release/date on the x-axis, score on the y-axis, and a colour legend where multiple scores are combined into one chart.

@MareenGeestmann commented May 23, 2023

We had a talk with @bertsky, @paulpestov, and @mweidling about this. The next steps will be:

  • Adding the diachronic view to the dashboard (https://ocr-d.de/quiver-frontend/#/workflows) as a third option to choose from, next to "list" and "table"
  • the icons next to the workflow title represent the processing steps; on mouse-over the used parameters will be displayed
  • graphs will be displayed next to the workflows, showing the values for each release; a mouse-over will display the exact value
  • axes get titles and units; a legend for the color coding will be added

In addition, a sorting feature will be added to the table tab of the workflow dashboard.
