Document-OCR

Japanese OCR utilizing Python to read and export text and data.

Usage

You can use this project by cloning this repository and running it with your IDE of choice.

You will need to install the following components in order to run the code:

I recommend following this tutorial: ひつじ

Switch Tesseract from the Fast model to the Best model.

Note: download the Japanese Best version from below.

Note: overwrite the contents of Tesseract-OCR >> tessdata with it.
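The overwrite step can also be done from Python. The following is only a minimal sketch and is not part of this project: the function name `install_traineddata` and the example paths in the comments are hypothetical, and it assumes the Best model has already been downloaded as `jpn.traineddata`. It keeps a backup of the existing file so the change can be undone.

```python
import shutil
from pathlib import Path


def install_traineddata(downloaded_file, tessdata_dir):
    """Copy a downloaded .traineddata file into tessdata, keeping a backup."""
    src = Path(downloaded_file)
    dest_dir = Path(tessdata_dir)
    if not src.is_file():
        raise FileNotFoundError(f"Downloaded model not found: {src}")
    if not dest_dir.is_dir():
        raise NotADirectoryError(f"tessdata folder not found: {dest_dir}")

    dest = dest_dir / src.name
    if dest.exists():
        # Keep the existing (Fast) model next to the new one as a .bak file.
        shutil.copy2(dest, dest.with_name(dest.name + ".bak"))
    shutil.copy2(src, dest)
    return dest


# Hypothetical usage; adjust both paths to your own machine:
# install_traineddata(
#     r"C:\Users\you\Downloads\jpn.traineddata",
#     r"C:\Program Files\Tesseract-OCR\tessdata",
# )
```

Writing into the Tesseract install folder may require administrator rights, depending on where Tesseract is installed.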

pip install pillow
pip install pyocr
pip install
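Once pillow and pyocr are installed, a quick way to check the setup is to OCR a single image. This is a minimal sketch under stated assumptions, not the code in OCR.py: the function names `ocr_image` and `export_text` and the file names in the usage comment are hypothetical, and it assumes Tesseract is on your PATH with the Japanese (`jpn`) model available.

```python
from pathlib import Path


def ocr_image(image_path, lang="jpn"):
    """Run OCR on one image through pyocr and return the recognized text."""
    # Imported here so export_text below can be used without the OCR stack.
    from PIL import Image
    import pyocr
    import pyocr.builders

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is Tesseract installed and on PATH?")
    # Assumption: the first tool pyocr reports is Tesseract.
    tool = tools[0]
    if lang not in tool.get_available_languages():
        raise RuntimeError(f"Language '{lang}' is missing from tessdata")

    with Image.open(image_path) as img:
        return tool.image_to_string(
            img,
            lang=lang,
            # Layout 6 treats the page as a single uniform block of text.
            builder=pyocr.builders.TextBuilder(tesseract_layout=6),
        )


def export_text(text, out_path):
    """Write OCR output as UTF-8 so Japanese characters are preserved."""
    out = Path(out_path)
    out.write_text(text, encoding="utf-8")
    return out


# Hypothetical usage:
# export_text(ocr_image("page_001.png"), "page_001.txt")
```

If `ocr_image` raises the missing-language error, the `jpn` model is probably not in the tessdata folder.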

For the PDF-to-image conversion you will need the Poppler library.

Download Latest Version of Poppler Here

Instructions for PATH here
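To confirm that Poppler is reachable through PATH, you can look for its `pdftoppm` tool from Python. This is a hedged sketch, not the code in the PDF2IMG scripts: the function names `find_pdftoppm` and `pdf_to_images` are hypothetical, and the conversion step assumes the `pdf2image` package, which this README does not name.

```python
import shutil


def find_pdftoppm(search_path=None):
    """Return the full path to Poppler's pdftoppm, or None if it is not found."""
    # With search_path=None, shutil.which uses the PATH environment variable.
    return shutil.which("pdftoppm", path=search_path)


def pdf_to_images(pdf_path, dpi=300):
    """Convert each PDF page to a PIL image (assumes pdf2image is installed)."""
    if find_pdftoppm() is None:
        raise RuntimeError("pdftoppm not found; add Poppler's bin folder to PATH")
    # Imported here so the PATH check above works without pdf2image.
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi)


# Hypothetical usage:
# pages = pdf_to_images("scan.pdf")
# pages[0].save("page_001.png")
```

If `find_pdftoppm()` returns `None` after you have edited PATH, restart your IDE or terminal so it picks up the new value.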

Important

This is a prototype at best; do not expect everything to work perfectly.
