Guillaume-Fgt/Pytesseract-streamlit-interface

Pytesseract-streamlit-interface

Webapp to retrieve text from images using OpenCV and Pytesseract. The interface is made with Streamlit.

How to install it:

  • Clone this repo.
  • Edit the pytesseract_streamlit/config.py file and set tesseract_exec_path to the path of your tesseract executable.
  • Create a venv and activate it.
  • Run this command in the folder that contains the pyproject.toml file:
    pip install .
  • With the venv activated, run in the CLI:
    python pytesseract_streamlit
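
As an illustration of the config step, here is a minimal sketch of how a `tesseract_exec_path` setting could be checked and handed to Pytesseract. Only the name `tesseract_exec_path` comes from this README; the example path and the helpers `resolve_tesseract` and `configure_pytesseract` are assumptions made for illustration, and the actual config.py in this repo may look different.

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical example value; replace it with the location on your machine.
tesseract_exec_path = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract(configured: str) -> Optional[str]:
    """Return the configured path if it is an existing file, else search PATH."""
    if configured and Path(configured).is_file():
        return configured
    # Fall back to whatever "tesseract" resolves to on PATH (None if absent).
    return shutil.which("tesseract")


def configure_pytesseract(configured: str = tesseract_exec_path) -> str:
    """Point pytesseract at the executable; raise if none can be found."""
    import pytesseract  # imported lazily so resolve_tesseract has no dependency

    path = resolve_tesseract(configured)
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    # tesseract_cmd is the module-level setting pytesseract uses to call the binary.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at startup would fail early with a clear error when the path is wrong, rather than at the first OCR call.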

How to use it

The steps to obtain text are as follows:

  • Load an image using the button in the sidebar.
  • The image is processed with OpenCV to define ROIs (Regions Of Interest). These are the parts of the image that will be sent to Pytesseract for text detection. They appear in green with a number as an overlay. Tweaking the settings changes their number and shape.
  • You can change the Pytesseract page segmentation mode and language, which may improve text detection.
  • The left column shows the extracted text with the corresponding ROI number, and the right column shows the cropped image of each ROI.
  • A button at the top of each column saves the text or the images to files. By default, they are saved in the "result" directory of the project.
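
The steps above can be sketched roughly as follows. This is not the code used by this app: it shows one common way to find text regions with OpenCV (Otsu threshold, dilation, external contours) and to pass each region to Pytesseract with a page segmentation mode and language. The function names, the kernel size, and the minimum area are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def sort_reading_order(boxes: List[Box]) -> List[Box]:
    """Order boxes top to bottom, then left to right."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def build_config(psm: int) -> str:
    """Build the Tesseract option string for a page segmentation mode (0-13)."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return f"--psm {psm}"


def find_rois(image, kernel_size=(15, 5), min_area=500) -> List[Box]:
    """Return bounding boxes of likely text regions in a BGR image."""
    import cv2  # imported lazily; requires opencv-python

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu picks the threshold; inversion makes dark text white on black.
    _, thresh = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
    )
    # Dilation merges neighbouring characters into blobs; a larger kernel
    # yields fewer, bigger regions.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_size)
    dilated = cv2.dilate(thresh, kernel, iterations=2)
    # [-2] selects the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(
        dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )[-2]
    boxes = [cv2.boundingRect(c) for c in contours]
    boxes = [b for b in boxes if b[2] * b[3] >= min_area]
    return sort_reading_order(boxes)


def ocr_rois(image, boxes: List[Box], lang: str = "eng", psm: int = 6):
    """Run Tesseract on each ROI; return (number, text, crop) tuples."""
    import pytesseract  # imported lazily; requires the tesseract executable

    config = build_config(psm)
    results = []
    for number, (x, y, w, h) in enumerate(boxes, start=1):
        crop = image[y:y + h, x:x + w]
        text = pytesseract.image_to_string(crop, lang=lang, config=config)
        results.append((number, text.strip(), crop))
    return results
```

With an image loaded through `cv2.imread`, `ocr_rois(image, find_rois(image))` would give numbered text and crop pairs like the two columns described above. In this sketch, `kernel_size` and `min_area` stand in for the settings that change the number and shape of the ROIs.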
