Software
Stefan Weil edited this page Jul 29, 2022 · 36 revisions
- ABBYY FineReader Engine is an OCR SDK that gives developers, integrators and BPOs the tools they require to integrate optical text recognition technologies into their applications.
- docWizz is a software solution to digitize and convert library holdings and archives for easy access, searchability, and long-term preservation. By default, docWorks generates METS/ALTO output; it can also transform the output into further formats such as ePUB, PDF, plain text or RTF.
- kraken is a free turn-key OCR system forked from OCRopus. It is intended to fix a number of OCRopus issues while preserving (mostly) functional equivalence.
- eScriptorium is a free web application for manual and automated text segmentation and recognition which can import and export ALTO.
- Tesseract is a widely used free OCR engine.
- OCLC CONTENTdm is a digital collection management system that makes your digital collections available on the Web. Whatever the format (local history archives, newspapers, books, maps, slide libraries or audio/video), CONTENTdm handles the storage, management and delivery of your collections to users across the Web.
- Veridian is presentation software that makes it easy to search, view, and interact with digital collections on the Internet. Veridian supports almost any type of content such as books, magazines, journals, newspapers, photographs, maps, and audio/video files and makes them easily accessible to anyone online.
- Islandora is a popular open source digital repository system based on Fedora Commons, Drupal and a host of additional applications. Islandora is used for many different types of content, including newspapers.

- BNLViewer: METS/ALTO viewer written in Java and JavaScript, from the National Library of Luxembourg.
- https://github.com/tokee/quack: An enhanced ALTO viewer for quality-assurance-oriented display of a collection of scans, typically from books or newspapers.
- http://dfg-viewer.de/en/the-project/: Browser-based web service for displaying digital representations from decentralized library repositories.
- dinglehopper is a free OCR evaluation tool that reads ALTO, PAGE and text files.
- Aletheia (an advanced document analysis system) and other commercial and open source PRImA tools, such as OCR text and layout performance evaluation tools, viewers, and converters, support ALTO as an input format.
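Evaluation tools such as dinglehopper report a character error rate (CER), commonly defined as the edit distance between the OCR output and a ground-truth text, divided by the length of the ground truth. The following is a minimal Python sketch of that calculation, assuming both texts have already been extracted from ALTO/PAGE as plain strings. It counts Unicode code points rather than grapheme clusters and is not dinglehopper's own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Edit distance normalised by the ground-truth length (code points)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


if __name__ == "__main__":
    # One substituted character ("0" for "o") in a 10-character word.
    print(character_error_rate("Luxembourg", "Luxemb0urg"))  # 0.1
```

Normalising by the ground-truth length means the CER can exceed 1.0 when the OCR output contains many spurious characters.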

- https://github.com/KBNLresearch/alto-editor: Browser-based post-correction tool for ALTO XML files, version 1.
- https://github.com/renevanderark/altoedit-2.0: Browser-based post-correction tool for ALTO XML files, version 2.
- https://github.com/cneud/alto-tools: Python script for various operations on ALTO files.
- https://github.com/KBNLresearch/europeananp-ner: Named entity recognition based on the Stanford Named Entity Recognizer, with support for ALTO.
- https://github.com/impactcentre/ocrevalUAtion: Evaluation of OCR output against a reference text (multiple formats supported, including ALTO).
- Jochre Alto Editor is a browser-based post-correction tool for ALTO XML files, version 4, and an editor for constructing OCR training corpora.
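Many of the tools above start by reading the recognized text out of an ALTO file. In ALTO, each word is a String element whose CONTENT attribute holds the text, and words are grouped into TextLine elements. Below is a minimal Python sketch using only the standard library; the namespace stripping and the one-line-per-TextLine output are illustrative choices, not how alto-tools or any other listed tool necessarily works, and hyphenation markup (HYP elements, SUBS_CONTENT) is ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so ALTO v2, v3 and v4 files are handled alike."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_text: str) -> str:
    """Return the page text, one output line per TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only String children carry text; SP and HYP elements are skipped.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


# Hypothetical minimal ALTO v4 document used for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="B1">
      <TextLine ID="L1"><String CONTENT="Hello"/><SP/><String CONTENT="ALTO"/></TextLine>
      <TextLine ID="L2"><String CONTENT="world"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_text(SAMPLE))  # "Hello ALTO" then "world" on the next line
```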

- https://github.com/UB-Mannheim/ocr-transform: Convert between Tesseract hOCR and ALTO XML 2.0/2.1 using XSL stylesheets.
- https://github.com/ironymark/AbbyyToAlto: A simple converter written in PHP 5 to convert ABBYY FineReader XML into the ALTO XML document format.
- https://github.com/Mewel/abbyy-to-alto: A simple Java-based tool to convert ABBYY FineReader XML to ALTO XML.
- https://github.com/INL/OpenConvert: OCR/text format conversion tool; supports ALTO as an input format to create TEI and FoLiA.
- https://github.com/altomator/ALTO-HTML: ALTO to HTML batch converter that handles the ALTO tags feature (tags were introduced in ALTO v2). Based on XSLT and DOS scripts.
- https://github.com/filak/hOCR-to-ALTO: XSL stylesheets to convert Tesseract hOCR output to ALTO 2.0/2.1.
- https://github.com/glenrobson/iiif_stuff/tree/master/alto2annotations: An XSLT that converts an ALTO XML document to an annotation list for use with a IIIF manifest.
- https://github.com/kba/page-to-alto: Convert PAGE (v. 2019) to ALTO (v. 2.0 to 4.2).
- https://github.com/OurDigitalWorld/tessglyph: Program that uses the Tesseract API to produce ALTO XML with glyph variants.
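Converters such as alto2annotations map each ALTO word to an annotation that targets a region of a IIIF canvas. The following Python sketch builds an IIIF Presentation API 2.x annotation list (sc:AnnotationList) from the HPOS/VPOS/WIDTH/HEIGHT attributes of ALTO String elements. It assumes the ALTO MeasurementUnit is pixel and that the canvas has the same pixel dimensions as the scanned image; the canvas and list URIs are hypothetical placeholders, and this is not the XSLT linked above.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any ALTO version is handled."""
    return tag.rsplit("}", 1)[-1]


def alto_to_annotation_list(xml_text: str, canvas_uri: str, list_uri: str) -> dict:
    """One text annotation per ALTO String, targeting an xywh region of the canvas."""
    root = ET.fromstring(xml_text)
    resources = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        # Assumption: coordinates are pixels that match the canvas size.
        x, y, w, h = (
            round(float(elem.get(attr, "0")))
            for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        resources.append({
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "format": "text/plain",
                "chars": elem.get("CONTENT", ""),
            },
            "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_uri,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }


# Hypothetical two-word ALTO fragment with pixel coordinates.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1"><TextLine ID="L1">
    <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="30"/>
    <String CONTENT="ALTO" HPOS="190" VPOS="200" WIDTH="60" HEIGHT="30"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    annotations = alto_to_annotation_list(
        SAMPLE,
        "https://example.org/iiif/canvas/p1",  # hypothetical canvas URI
        "https://example.org/iiif/list/p1",    # hypothetical list URI
    )
    print(json.dumps(annotations, indent=2))
```

ALTO files that use mm10 or inch1200 units would first need their coordinates scaled to the canvas pixel grid.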

- https://github.com/edsu/alto-words: A simple demonstration of how to calculate the ratio of dictionary words to all words in a METS/ALTO OCR XML file.
- https://github.com/tokee/alto-ocr-cleanup: Experiments with cleaning up dirty ALTO OCR files using anagram hashing.
- https://github.com/altomator/EN-data_mining: METS/ALTO data mining tool that extracts quantitative metadata from METS/ALTO newspaper documents. Based on XSLT or Perl scripts. See also http://altomator.github.io/EN-data_mining
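The dictionary-word ratio mentioned above is a rough OCR quality indicator: the share of recognized words that appear in a word list. Here is a minimal Python sketch, assuming a small in-memory set of lowercase words stands in for a real dictionary file; the punctuation stripping and lowercasing are illustrative choices, not necessarily what alto-words does.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any ALTO version is handled."""
    return tag.rsplit("}", 1)[-1]


def dictionary_word_ratio(xml_text: str, dictionary: set[str]) -> float:
    """Fraction of ALTO String tokens found in `dictionary` (lowercase entries)."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        # Strip leading/trailing punctuation and lowercase before the lookup.
        token = re.sub(r"^\W+|\W+$", "", elem.get("CONTENT", "")).lower()
        if token:
            words.append(token)
    if not words:
        return 0.0  # no text: report 0 rather than dividing by zero
    known = sum(1 for word in words if word in dictionary)
    return known / len(words)


# Hypothetical ALTO fragment with one OCR error ("qnick").
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1"><TextLine ID="L1">
    <String CONTENT="The"/><SP/><String CONTENT="qnick"/><SP/>
    <String CONTENT="brown"/><SP/><String CONTENT="fox."/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    # Hypothetical word list; a real run would load a full dictionary file.
    words = {"the", "quick", "brown", "fox"}
    print(dictionary_word_ratio(SAMPLE, words))  # 3 of 4 tokens known: 0.75
```

Numbers, names and historical spellings count as non-dictionary words here, so the ratio is best compared across pages of similar content rather than read as an absolute accuracy.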