Preprocessing methods to enhance Tesseract-OCR in the case of printed text on difficult background, or handwritten text on lined/squared paper.

jo-valer/tesseract-ocr-enhanced

Enhancing Tesseract OCR

This is the repository for our Signal, Image and Video course Project (Giovanni Valer and Laurence Bonat).

The Report is available here.

Installation

We used Python 3.12.0 and Tesseract-OCR 5.3.3. See requirements.txt for the required packages.
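Before running the experiments, it can help to confirm which Python and Tesseract versions are actually on your machine. The snippet below is a minimal sketch using only the standard library; the helper name `tesseract_version` is hypothetical and not part of this repository, and it assumes the Tesseract executable is called `tesseract` and is on your PATH.

```python
import shutil
import subprocess
import sys


def tesseract_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    # Some Tesseract builds print the version banner to stderr, so capture both.
    proc = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print("Python:", ".".join(map(str, sys.version_info[:3])))
    print("Tesseract:", tesseract_version() or "not found on PATH")
```

If the reported versions differ from the ones listed above, the results may not be exactly reproducible.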

Methods

The methods folder contains the experiments of our project, one file per functionality:

  • manual_trackbar.py: trackbar in manual mode
  • autonomous_trackbar.ipynb: trackbar in autonomous mode
  • automatic_filtering.ipynb: automatic filtering of the text through a specific pipeline
  • lines_detection.ipynb: automatic detection of whether a text is written on lined/squared paper
  • squared_paper_ocr.ipynb: handwritten text recognition (HTR) on lined/squared paper
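
To give an intuition for what detecting lined paper can involve, here is a minimal sketch based on a row projection profile: a row that is dark across most of its width is treated as a ruled line. This is an assumption for illustration only. The notebook in this repository may use a different technique, and the function names, the 0/1 image representation and the thresholds below are all hypothetical.

```python
def row_ink_fraction(binary_image):
    """Fraction of dark pixels (value 1) in each row of a 0/1 image."""
    return [sum(row) / len(row) for row in binary_image]


def detect_ruled_rows(binary_image, min_fraction=0.8):
    """Return indices of rows that are dark across most of their width."""
    fractions = row_ink_fraction(binary_image)
    return [y for y, f in enumerate(fractions) if f >= min_fraction]


def looks_lined(binary_image, min_fraction=0.8, min_lines=2):
    """Heuristic: the page is 'lined' if enough separate ruled lines are found."""
    rows = detect_ruled_rows(binary_image, min_fraction)
    # Merge adjacent rows so that a thick line is counted only once.
    groups = 0
    previous = None
    for y in rows:
        if previous is None or y != previous + 1:
            groups += 1
        previous = y
    return groups >= min_lines


def looks_squared(binary_image, min_fraction=0.8, min_lines=2):
    """Heuristic: 'squared' paper has ruled lines both horizontally and vertically."""
    transposed = [list(column) for column in zip(*binary_image)]
    return (looks_lined(binary_image, min_fraction, min_lines)
            and looks_lined(transposed, min_fraction, min_lines))


if __name__ == "__main__":
    # Toy 7x8 page: one thin ruled line, one two-pixel-thick ruled line, some "text".
    page = [
        [0] * 8,
        [1] * 8,
        [0, 1, 0, 0, 1, 0, 0, 0],
        [0] * 8,
        [1] * 8,
        [1] * 8,
        [0] * 8,
    ]
    print(detect_ruled_rows(page), looks_lined(page), looks_squared(page))
```

On real scans a fixed threshold like this is fragile (skewed pages, faint rulings), so treat it as a starting point rather than a robust detector.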

Results

The results folder contains the results of all methods. The compute_metrics.py script automatically computes the average accuracy of each method and saves it in results/results.txt; additional metrics are saved in results/metrics.
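
As an illustration of how an average accuracy could be computed, the sketch below assumes accuracy is defined as one minus the character error rate (Levenshtein distance divided by the ground-truth length). This definition is an assumption and may not match what compute_metrics.py does; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def char_accuracy(ground_truth, ocr_output):
    """Assumed metric: 1 - character error rate, clamped to [0, 1]."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    cer = edit_distance(ground_truth, ocr_output) / len(ground_truth)
    return max(0.0, 1.0 - cer)


def average_accuracy(pairs):
    """Mean accuracy over (ground_truth, ocr_output) pairs."""
    scores = [char_accuracy(gt, out) for gt, out in pairs]
    if not scores:
        raise ValueError("no samples to average")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    samples = [("abcd", "abcd"), ("abcd", "abxd")]
    print(average_accuracy(samples))
```

Clamping at zero matters because the OCR output can be much longer than the ground truth, which would otherwise give a negative score.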
