[LREC-COLING 2024] PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents. Boost OCR Performance on Scientific Documents.
Updated May 23, 2024 - Python
Old book pages (with ground truth), formerly used for OCR studies. The set comes in several versions that differ in resolution and binarization. Noised and denoised sets (produced by several methods) will eventually be uploaded.
Dataset used in autogoal for the example with pytesseract.
Categorization of Turkish news articles and development of NLP libraries