Diachronic view #35

Open · 2 of 4 tasks

paulpestov opened this issue Apr 24, 2023 · 5 comments
Comments

@paulpestov (Contributor) commented Apr 24, 2023

(This is a copy of an internal issue, posted here for public visibility.)

Describe the feature you'd like

Currently we display only the latest information about a workflow in QuiVer: when we run a workflow A, the important metrics are saved, and they are overwritten as soon as workflow A is run again.

In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information to QuiVer, e.g. via a time stamp.

User story

As a developer I need an overview of how changes in the software affect the OCR quality and hardware metrics, in order to be certain that the newest contributions to OCR-D really improve the software's outcome.

Ideas we have discussed so far

How to display the information

For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x-axis) represents an ocrd_all or an ocrd_core release (clarify!).
Users can choose between the different metrics and see a tendency whether a metric improves or not.

Underlying data structure

When a GT corpus is selected, the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file in which all the runs per release are present.

Take GT workspace 16_ant_simple as an example: we then have a file 16_ant_simple_minimal.json with all runs of its minimal benchmarking workflow, a file 16_ant_simple_selected_pages.json with all runs of its selected_pages workflow, etc. Each executed workflow run has a timestamp by which the front end can then sort the single executions and retrieve the relevant data.
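A minimal sketch of what this could look like, purely for illustration (the file names follow the example above; the field names such as `timestamp` and `ocrd_all_release`, the metric keys, and the release labels are assumptions, not a fixed schema). The ID map file points each GT workspace to its per-workflow files:

```json
{
  "16_ant_simple": [
    "16_ant_simple_minimal.json",
    "16_ant_simple_selected_pages.json"
  ]
}
```

A per-workflow file such as 16_ant_simple_minimal.json would then contain one entry per benchmarking run, ideally already sorted chronologically:

```json
[
  {
    "timestamp": "2023-04-24T12:00:00Z",
    "ocrd_all_release": "2023-04-02",
    "evaluation": { "cer": 0.12, "wer": 0.25, "cpu_time_seconds": 431 }
  },
  {
    "timestamp": "2023-05-15T12:00:00Z",
    "ocrd_all_release": "2023-05-10",
    "evaluation": { "cer": 0.11, "wer": 0.24, "cpu_time_seconds": 418 }
  }
]
```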

TODOs

  • clarify what our steps / increments in time are: a release of ocrd_all? a release of ocrd_core?
  • add time stamps to workflow objects
  • add single files for each GT workspace + workflow; ideally, the data should be sorted chronologically right from the start (although the front end should not depend on that)
  • create ID map file
@paulpestov (Contributor, Author)

Here is our first draft:
[Mockup image: Workflow Runs List]

A few notes:

  • The diachronic view can display charts for one metric at a time, selectable in the dropdown menu on the right
  • List (result) items primarily represent GT corpora and contain multiple workflow setups
  • Each GT + workflow combination will be run periodically and a chart visualisation will display the evaluation data
  • Each chart shows the metric value scale on the y-axis and time on the x-axis
  • Benchmarking runs will be triggered upon each ocrd_all release
  • Each ocrd_all release will be marked in each chart
  • Each GT corpus list item will first show an average chart that covers all workflows set up for that GT corpus (see the data sketch after this list)
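A rough sketch of the data a single chart could be fed with, continuing the JSON structure idea above (the series labels, field names, and values are illustrative only, not the actual frontend format):

```json
{
  "gt_corpus": "16_ant_simple",
  "metric": "cer",
  "series": [
    {
      "label": "average (all workflows)",
      "points": [ { "release": "2023-04-02", "value": 0.13 }, { "release": "2023-05-10", "value": 0.12 } ]
    },
    {
      "label": "minimal",
      "points": [ { "release": "2023-04-02", "value": 0.12 }, { "release": "2023-05-10", "value": 0.11 } ]
    }
  ]
}
```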

@MareenGeestmann

Feedback from Open Tech call on 2023-04-26:

  • Maybe instead of the average it would be more useful to calculate and show the median and the first and last quartiles.
  • better to use minimum and maximum values instead of the average
  • display data page-wise instead of for the entire document

@MareenGeestmann

To further refine the view, a few questions to assess what is really needed (and for me to understand):

  • Should the data of all previous releases be displayed? Or would, for instance, the last two releases or the last half year be enough?

  • From the graphs the progression can be read well. Are concrete values of the metrics from the last release also interesting, or is knowing whether it got better or worse enough?

  • For a possible page-by-page display, are character and word error rates in particular interesting (or other metrics as well)?

  • In this context, how many pages do our current GTs cover?

  • Would a comparison of values suffice here, or do you want to look directly into which characters or words are recognized how well?

In addition, a note from my side: axis labels and a legend or description (for the color coding) need to be added.

@bertsky commented May 12, 2023

  • Maybe instead of the average it would be more useful to calculate and show the median and the first and last quartiles.
  • better to use minimum and maximum values instead of the average
  • display data page-wise instead of for the entire document

Average (mean or median) plus min and max (or first and last quartile, or better decile or percentile) would be great, yes.

But page-wise display is probably too much to ask for. As long as it's easily possible to navigate into the raw data by page to analyse regressions in detail (i.e. when some metric fell drastically, like a 20% worse minimum), that's enough IMO.

@mweidling and @paulpestov also discussed page-wise aggregation with me: currently the backend uses Dinglehopper, which is page-wise. So naive aggregation is macro-averaged ($\frac{1}{N} \sum\limits_{i=1}^N s_i$), but it's easy to point to the distinct pages which cause regressions. IMO, since we also have the number of lines per page, it should be easy to calculate a micro-averaged aggregate as well ($\frac{1}{\sum\limits_{i=1}^N n_i} \sum\limits_{i=1}^N n_i s_i$) – where $n_i$ is the number of lines of page $i$ and $s_i$ is the score of page $i$.

In brief: a micro-averaged aggregate for the average score, besides the per-page minimum score, should be sufficient for quick diagnostics; the rest can always be dug up by navigating into the raw data. (And of course, later on one might implement some way to navigate into the per-page reports by Dinglehopper.)
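For clarity, a minimal sketch of the two aggregations in Python, assuming per-page scores $s_i$ and line counts $n_i$ are available (function names and example values are purely illustrative, not QuiVer's actual code):

```python
def macro_average(scores):
    """Macro average: every page contributes equally, regardless of its size."""
    return sum(scores) / len(scores)

def micro_average(scores, line_counts):
    """Micro average: each page's score is weighted by its number of lines."""
    return sum(n * s for s, n in zip(scores, line_counts)) / sum(line_counts)

# Per-page CER scores s_i and line counts n_i (made-up values):
scores = [0.05, 0.08, 0.30]   # page 3 regressed badly ...
line_counts = [40, 35, 5]     # ... but it only has 5 lines

print(macro_average(scores))               # ≈ 0.143
print(micro_average(scores, line_counts))  # ≈ 0.079
print(min(scores), max(scores))            # 0.05 0.3 – per-page extremes for quick diagnostics
```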

  • Should the data of all previous releases be displayed? Or would, for instance, the last two releases or the last half year be enough?

That's a difficult question. In terms of UI, IMO it would make sense to provide a slider to narrow it down dynamically. It should also be robust against gaps in the data (e.g. when some workflow or some metric was not available in earlier releases).

  • From the graphs the progression can be read well. Are concrete values of the metrics from the last release also interesting, or is knowing whether it got better or worse enough?

I don't understand. A graph will always show the concrete values, too. (Perhaps the exact numerical values can be highlighted via mouse-over?)

  • For a possible page-by-page display, are character and word error rates in particular interesting (or other metrics as well)?

See above (no page-by-page display, but minimum/maximum across pages).

  • Would a comparison of values suffice here, or do you want to look directly into which characters or words are recognized how well?

See above (browsing into the individual per-page reports would be super cool, but is a lot of effort to implement; as long as one can manually look these up, it should suffice).

In addition, a note from my side: axis labels and a legend or description (for the color coding) need to be added.

Yes, labelled ticks with release/date on the x-axis, score on the y-axis, and a colour legend where multiple scores are combined into one chart.

@MareenGeestmann commented May 23, 2023

We had a talk with @bertsky, @paulpestov, and @mweidling about this. The next steps will be:

  • Adding the diachronic view to the dashboard (https://ocr-d.de/quiver-frontend/#/workflows) as a third option to choose from, next to "list" and "table"
  • the icons next to the workflow title represent the processing steps; on mouse-over the used parameters will be displayed
  • graphs will be displayed next to the workflows, showing the values for each release; a mouse-over will display the exact value
  • axes get titles and units; a legend for the color coding will be added

In addition, a sorting feature will be added to the table tab of the workflow dashboard.
