QuiVer Benchmarks

QuiVer Benchmarks is a tool that helps you decide which OCR-D workflows are most suitable for your data. It executes preset workflows on different kinds of Ground Truth and evaluates the results. The results for the most recent version of ocrd_all can be viewed at https://ocr-d.de/quiver-frontend.

This repository holds everything needed to automatically execute different OCR-D workflows on images and evaluate the outcomes. It creates benchmarks for OCR-D data in a containerized environment. QuiVer Benchmarks currently runs in an automated workflow (CI/CD).

QuiVer Benchmarks is based on the ocrd/all:maximum image and therefore provides every OCR-D processor that a workflow might use.

Requirements

Docker >= 23.0.0
Docker Compose plugin
make

To speed up QuiVer Benchmarks, you can mount previously downloaded text recognition models at /usr/local/share/ocrd-resources/ in docker-compose.yml by adding

- path/to/your/models:/usr/local/share/ocrd-resources/

to the volumes section. Otherwise, the tool downloads all ocrd-tesserocr-recognize models as well as the ocrd-calamari-recognize model qurator-gt4histocr-1.0 on each run.

Usage (For Development)

Clone this repository and change into the cloned directory
Build the image with make build
Start a container with make start
Run make prepare-default-gt
Run make run
The benchmarks and the evaluation results will be available at data/workflows.json on your host system
When finished, run make stop to shut down and remove the Docker container you created previously
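
The steps above can also be scripted. The following is a minimal sketch, assuming the repository has already been cloned and that the make targets behave as described; the function name run_benchmarks and the injectable runner parameter are hypothetical helpers for illustration and are not part of this repository.

```python
import subprocess
from pathlib import Path

# make targets from the usage steps, in the order they are listed.
STEPS = ["build", "start", "prepare-default-gt", "run"]


def run_benchmarks(repo_dir=".", runner=subprocess.run):
    """Run the make targets in order and always attempt `make stop` afterwards.

    `runner` defaults to subprocess.run; it is a parameter only so that the
    sequence can be exercised without Docker (an assumption of this sketch).
    """
    try:
        for target in STEPS:
            # check=True raises CalledProcessError as soon as a step fails.
            runner(["make", target], cwd=repo_dir, check=True)
    finally:
        # Clean up the container even if an earlier step failed.
        # check=False: a failing cleanup should not hide the original error.
        runner(["make", "stop"], cwd=repo_dir, check=False)
    # Location of the results on the host, as stated in the usage steps.
    return Path(repo_dir) / "data" / "workflows.json"
```

Calling run_benchmarks("path/to/clone") returns the path of the results file once all steps have finished.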

Benchmarks Considered

The metrics gathered by QuiVer Benchmarks are defined in OCR-D's Quality Assurance specification and comprise:

CER (per page and document-wide), including
- median
- minimum and maximum CER
- standard deviation
WER (per page and document-wide)
CPU time
wall time
processed pages per minute
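
To make these metrics concrete, here is a minimal sketch of how they can be computed from pairs of Ground Truth and OCR text. It is an illustration only, not the evaluation code used by QuiVer Benchmarks: it assumes CER and WER are the Levenshtein distance divided by the Ground Truth length (in characters and whitespace-separated words respectively), applies no text normalization, and uses the population standard deviation. The function names are hypothetical.

```python
import statistics


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cer(gt, ocr):
    """Character error rate; assumes a non-empty Ground Truth string."""
    return levenshtein(gt, ocr) / len(gt)


def wer(gt, ocr):
    """Word error rate on whitespace-separated tokens (an assumption)."""
    gt_words = gt.split()
    return levenshtein(gt_words, ocr.split()) / len(gt_words)


def summarize(pages, wall_seconds):
    """Aggregate per-page CER for a list of (gt, ocr) pairs."""
    per_page = [cer(gt, ocr) for gt, ocr in pages]
    return {
        "cer_median": statistics.median(per_page),
        "cer_min": min(per_page),
        "cer_max": max(per_page),
        "cer_stdev": statistics.pstdev(per_page),
        "pages_per_minute": len(pages) / (wall_seconds / 60),
    }
```

For example, a page whose Ground Truth is "abcd" and whose OCR output is "abed" has one substitution in four characters, so its CER is 0.25.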

Ground Truth Used

QuiVer Benchmarks currently uses the following Ground Truth:

https://github.com/tboenig/16_frak_simple
https://github.com/tboenig/17_frak_simple
https://github.com/tboenig/17_frak_complex
https://github.com/tboenig/18_frak_simple
https://github.com/tboenig/18_frak_complex
https://github.com/tboenig/19_frak_simple
https://github.com/tboenig/16_ant_simple
https://github.com/tboenig/16_ant_complex
https://github.com/tboenig/18_ant_simple
https://github.com/tboenig/19_ant_simple
https://github.com/tboenig/17_fontmix_simple
https://github.com/tboenig/18_fontmix_complex
Reichsanzeiger GT with many ads
Reichsanzeiger GT with many tables
Reichsanzeiger GT title pages only
Reichsanzeiger GT random selection of pages

A detailed list of images used for the Reichsanzeiger GT sets can be found in the data_srcs directory.

Adding New OCR-D Workflows (For Development)

Add new OCR-D workflows to the directory workflows/ocrd_workflows according to the following conventions:

File names of OCR workflows have to end with _ocr.txt, those of evaluation workflows with _eval.txt. After the container has started, OtoN converts the files to Nextflow files.
Workflows have to be TXT files
All workflows have to use ocrd process
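
The conventions above can be checked before rebuilding the image. The following is a minimal sketch; the helper names workflow_kind and uses_ocrd_process are hypothetical and not part of this repository, and the second check assumes that the first non-blank line of a workflow file begins with ocrd process.

```python
from pathlib import Path


def workflow_kind(path):
    """Return 'ocr' or 'eval' based on the required file name suffix."""
    name = Path(path).name
    if name.endswith("_ocr.txt"):
        return "ocr"
    if name.endswith("_eval.txt"):
        return "eval"
    raise ValueError(f"{name} must end with _ocr.txt or _eval.txt")


def uses_ocrd_process(text):
    """Check that the first non-blank line starts with `ocrd process`.

    Assumption of this sketch: the workflow has no leading comment lines.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.startswith("ocrd process")
    return False  # empty file
```

A file such as workflows/ocrd_workflows/my_ocr.txt (a made-up name) would be classified as an OCR workflow, while a file named my_workflow.txt would be rejected.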

You can then either rebuild the Docker image via docker compose build or mount the directory into the container via

- ./workflows/ocrd_workflows:/app/workflows/ocrd_workflows

in the volumes section and spin up a new run with docker compose up.

Removing OCR-D Workflows

Delete the respective TXT files from workflows/ocrd_workflows and either rebuild the image or mount the directory as a volume as described above.

Outlook

Enable users to use their own Ground Truth and workflows

License

See LICENSE
