GitHub - sepehrraisi/Persian-OCR: A project to bring high accuracy OCR to Persian language.

Persian-OCR

A project to bring high-accuracy OCR to the Persian language!

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap

About The Project

While looking for a good Persian OCR, I found that there was no good open-source OCR project supporting the Persian language. So I started this project to create a simple Persian OCR and fill that gap.

What I have done:

- Optimized pytesseract for Persian by testing different configs.
- Optimized low-resolution images, which improves accuracy significantly.
- Added Persian spell checking to improve accuracy.
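The config testing listed above can be sketched as a small sweep over Tesseract's `--oem` and `--psm` options, keeping whichever config produces the most dictionary words. This is a minimal sketch, not the project's actual code: the candidate values, the dictionary-hit scoring rule and the names `candidate_configs`, `dictionary_score` and `best_config` are hypothetical. The OCR call is passed in, so in practice you might pass something like `lambda img, cfg: pytesseract.image_to_string(img, lang="fas", config=cfg)`.

```python
from itertools import product
from typing import Callable, Iterable


def candidate_configs(oems=(1, 3), psms=(3, 4, 6, 11)) -> list[str]:
    """Build Tesseract config strings for every --oem/--psm combination.

    The value ranges are illustrative assumptions, not tuned values.
    """
    return [f"--oem {oem} --psm {psm}" for oem, psm in product(oems, psms)]


def dictionary_score(text: str, words: set[str]) -> float:
    """Fraction of recognized tokens that appear in a known word list."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)


def best_config(image, ocr: Callable[[object, str], str],
                configs: Iterable[str], words: set[str]) -> tuple[str, float]:
    """Run OCR once per config and return the highest-scoring (config, score).

    On ties, the first config in iteration order wins.
    """
    scored = [(cfg, dictionary_score(ocr(image, cfg), words)) for cfg in configs]
    return max(scored, key=lambda pair: pair[1])
```

Injecting the OCR function keeps the sweep logic independent of Tesseract, so the scoring rule can be swapped (for example, for Tesseract's own confidence values) without touching the loop.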

Of course, this project isn't perfect, and I'm still working on improving its accuracy and speed. But I hope it gives other people like me a good base for Persian OCR.


Built With

I used Python to build this project. The two most useful modules in this project were pytesseract and OpenCV.
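Here is a minimal sketch of how the two modules can work together: OpenCV converts the image to grayscale, upscales it if it is small, and binarizes it, then pytesseract runs Tesseract with the Persian (`fas`) model. It is not the project's actual pipeline; the `upscale_factor` and `ocr_persian` names, the 1000-pixel minimum height and the `--psm 6` default are hypothetical choices for illustration. OpenCV and pytesseract are imported inside `ocr_persian` so the pure helper can be used on its own.

```python
"""Preprocessing + OCR sketch. Assumes OpenCV (cv2), pytesseract and the
Tesseract 'fas' language data are installed; values are illustrative."""


def upscale_factor(height: int, min_height: int = 1000) -> float:
    """Return the factor needed to bring an image up to min_height pixels.

    min_height is a hypothetical tuning knob, not a value from this project.
    Images that are already tall enough keep their size (factor 1.0).
    """
    if height <= 0:
        raise ValueError("height must be positive")
    return max(1.0, min_height / height)


def ocr_persian(path: str, config: str = "--psm 6") -> str:
    # Imported lazily so upscale_factor works without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    # Grayscale, then upscale low-resolution scans with bicubic interpolation.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    factor = upscale_factor(gray.shape[0])
    if factor > 1.0:
        gray = cv2.resize(gray, None, fx=factor, fy=factor,
                          interpolation=cv2.INTER_CUBIC)
    # Otsu's method picks a global black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # "fas" is Tesseract's language code for Persian.
    return pytesseract.image_to_string(binary, lang="fas", config=config)
```

For example, `print(ocr_persian("page.png"))` would print the recognized Persian text for a hypothetical `page.png`.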

Getting Started

Here are simple instructions for getting started with this project.

Prerequisites

You need to install the Tesseract OCR engine, which pytesseract wraps, on your device:

Ubuntu
```
 sudo apt-get install tesseract-ocr
```

You also need to add the Persian language data to Tesseract:

Ubuntu
```
 sudo apt-get install tesseract-ocr-fas
```
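To confirm that Tesseract can actually see the Persian language data, you can run `tesseract --list-langs` and look for `fas`. The Python sketch below wraps that command; `parse_langs` and `installed_langs` are hypothetical helper names, and it assumes the first line of the output is the "List of available languages" header.

```python
import shutil
import subprocess


def parse_langs(output: str) -> list[str]:
    """Parse `tesseract --list-langs` output into language codes.

    Skips blank lines and the 'List of available languages ...' header.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs() -> list[str]:
    """Return Tesseract's installed languages, or [] if tesseract is missing."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=False)
    # Depending on the Tesseract version the list may go to stderr,
    # so both streams are parsed.
    return parse_langs(result.stdout + result.stderr)


if __name__ == "__main__":
    print("Persian ('fas') available:", "fas" in installed_langs())
```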

Installation

Now that you've installed Tesseract, you can move on to Persian-OCR:

Clone the repo:
```
git clone https://github.com/sepehrraisi/Persian-OCR && \
cd Persian-OCR
```

Create a virtual environment for Python and activate it:
```
python3 -m venv venv && \
source ./venv/bin/activate
```

Install the Python modules listed in requirements.txt:
```
pip install -r requirements.txt
```
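After installing, a quick way to check that the key modules are importable is `importlib.util.find_spec`. This sketch only checks the two modules named under Built With (OpenCV is imported as `cv2`); it assumes nothing about the full contents of requirements.txt, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names of the two key modules from "Built With". The real
# requirements.txt may list more packages than these.
REQUIRED_MODULES = ("pytesseract", "cv2")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All modules found." if not missing else f"Missing: {', '.join(missing)}")
```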


Usage

After installing the requirements, you can run the ocr.py script:
```
python ./ocr.py -i <inputfile> -o <outputfile>
```

It then writes the results to <outputfile>.
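As a rough illustration of a command-line interface that takes -i and -o, here is a hypothetical argparse sketch. It is not taken from ocr.py: the long option names `--input`/`--output`, the `run` function and the injected `ocr` callable (for example, a function wrapping `pytesseract.image_to_string`) are assumptions. The output is written explicitly as UTF-8 so Persian text is saved correctly regardless of the platform's default encoding.

```python
import argparse
from pathlib import Path
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Persian OCR (sketch)")
    parser.add_argument("-i", "--input", required=True, help="input image path")
    parser.add_argument("-o", "--output", required=True, help="output text path")
    return parser


def run(argv: list[str], ocr: Callable[[str], str]) -> Path:
    """Parse argv, OCR the input file, and write the text to the output file."""
    args = build_parser().parse_args(argv)
    out = Path(args.output)
    # Write Persian text as UTF-8 regardless of the platform default.
    out.write_text(ocr(args.input), encoding="utf-8")
    return out
```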


Roadmap

Use pytesseract to extract text
Improve accuracy with simple OpenCV features
Improve accuracy by upscaling the images
Add post-processing modules to improve accuracy
Add modular capabilities to improve functionality
Add table recognition
Multi-language Support
- Persian
- English
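The spell-checking step mentioned in About The Project, and the post-processing modules on the roadmap, can be illustrated with a dictionary-based corrector. This is a minimal sketch assuming a plain-text word list with one word per line; it uses Python's `difflib.get_close_matches` as a stand-in technique and is not the project's actual spell checker. The `cutoff` of 0.8 is an arbitrary assumption, and the function names are hypothetical.

```python
from difflib import get_close_matches


def load_words(lines) -> set[str]:
    """Build a word set from an iterable of lines (e.g. an open word-list file)."""
    return {line.strip() for line in lines if line.strip()}


def correct_token(token: str, words: set[str], cutoff: float = 0.8) -> str:
    """Return the token unchanged if known, else its closest dictionary match."""
    if token in words:
        return token
    matches = get_close_matches(token, words, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text: str, words: set[str], cutoff: float = 0.8) -> str:
    """Correct each whitespace-separated token; line breaks are preserved."""
    return "\n".join(
        " ".join(correct_token(tok, words, cutoff) for tok in line.split())
        for line in text.splitlines()
    )
```

Example: with a word list containing "دانشگاه", the OCR misread "دانشکاه" (ک instead of گ) is corrected back to "دانشگاه". A higher cutoff makes the corrector more conservative, which matters for short Persian words where a single wrong letter changes the similarity ratio a lot.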

See the open issues for a full list of proposed features (and known issues).

