Post-process PageXML files to improve the reading order of regions
Updated Sep 25, 2024 · Python
Command line tool for Kraken text segmentation and recognition.
A template for creating a ground-truth repository, with functions and features such as metadata creation, data analysis, and presentation.
Toolset for Tesseract training with PageXML ground truth
Tool that performs layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format
Some bits of JavaScript for transcribing scanned pages using PageXML
C++ library with a Python wrapper for working with Page XML files
Simple app for visual editing of Page XML files
This repo provides a collection of ground-truth data. The collection was compiled with different aspects in mind (layout complexity and font usage). Each dataset is also characterized by metadata, which follows the OCR-D/PrimaLab labeling scheme.
This module provides access to Transkribus PageXML files via XQuery functions. It is designed for use with a BaseX XML database, but should work with other XML databases as well.
LECTAUREP Pipeline demonstration to TEI Publisher
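The first entry above concerns post-processing PageXML to improve the reading order of regions. As a rough illustration of what such post-processing touches (not the listed tool's actual algorithm), the Python sketch below rebuilds a page's `ReadingOrder` by sorting top-level `TextRegion` elements by the top-left corner of their bounding boxes. It assumes the 2019-07-15 PAGE namespace, and the names `bbox`, `naive_reading_order`, and the group id `ro_naive` are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the 2019-07-15 PAGE schema namespace; other PAGE versions use a different URI.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}

# Elements that the PAGE schema places before ReadingOrder inside <Page>.
BEFORE_READING_ORDER = {"AlternativeImage", "Border", "PrintSpace"}


def bbox(points: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: (min_x, min_y, max_x, max_y) of a PAGE points string like '10,20 30,20'."""
    coords = [tuple(int(v) for v in p.split(",")) for p in points.split()]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def naive_reading_order(page_xml: str) -> str:
    """Hypothetical sketch: order top-level TextRegions top-to-bottom, then left-to-right."""
    ET.register_namespace("", PAGE_NS)  # serialize PAGE elements without an ns0: prefix
    root = ET.fromstring(page_xml)
    page = root.find("pc:Page", NS)

    # Collect (top, left, id) for each top-level TextRegion and sort by top, then left.
    regions = []
    for region in page.findall("pc:TextRegion", NS):
        x0, y0, _, _ = bbox(region.find("pc:Coords", NS).get("points"))
        regions.append((y0, x0, region.get("id")))
    regions.sort()

    # Replace any existing ReadingOrder with a freshly built OrderedGroup.
    old = page.find("pc:ReadingOrder", NS)
    if old is not None:
        page.remove(old)
    ro = ET.Element(f"{{{PAGE_NS}}}ReadingOrder")
    group = ET.SubElement(ro, f"{{{PAGE_NS}}}OrderedGroup", id="ro_naive")
    for index, (_, _, region_id) in enumerate(regions):
        ET.SubElement(group, f"{{{PAGE_NS}}}RegionRefIndexed",
                      index=str(index), regionRef=region_id)

    # Insert after any leading AlternativeImage/Border/PrintSpace to respect schema order.
    pos = 0
    for child in page:
        if child.tag.split("}")[-1] in BEFORE_READING_ORDER:
            pos += 1
        else:
            break
    page.insert(pos, ro)
    return ET.tostring(root, encoding="unicode")
```

A plain top-then-left sort breaks down on multi-column pages whose column tops are not perfectly aligned; a more robust approach would first group regions into columns or rows and then order those groups.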