Optical Character Recognition

Autoregressive Transformer Model for Optical Character Recognition in Farsi

Dependencies

Install dependencies:

$ pip install -r requirements.txt
Download Pretrained Weights Here

Getting Started

Project Structure

.
├── src
│   ├── nn
│   │   ├── decoder.py
│   │   ├── encoder.py
│   │   ├── __init__.py
│   ├── dataset.py
│   ├── misc.py
│   ├── schedule.py
│   ├── tokenizer.py
│   ├── tracker.py
│   ├── trainutils.py
│   ├── trabsforms.py
│   └── vocab.py
├── build.py
├── config.py
├── inference.py
└── main.py

Architecture

Fig. 1. Proposed Model Architecture

Modules

ViT as Image Encoder: The encoder is a bare ViT (Vision Transformer) model that outputs raw hidden states without any task-specific head on top. The model is a subclass of PyTorch's torch.nn.Module, so it can be used like any regular PyTorch module; refer to the PyTorch documentation for all matters related to general usage and behavior.

Positional Encoding: Since our model contains no recurrence and no convolution, in order for the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. In this work, we use sine and cosine functions of different frequencies:

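$PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right)$

$PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$

where $pos$ is the position of the token in the sequence, $i$ indexes a sine/cosine pair of embedding dimensions, and $d_{model}$ is the embedding size.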
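The two formulas above can be sketched in a few lines. This is a minimal illustration in plain Python, not the repository's implementation (which is PyTorch-based); the function name `sinusoidal_positional_encoding` and its parameters `max_len` and `d_model` are assumptions chosen for this example.

```python
import math
from typing import List


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Build a max_len x d_model table of sine/cosine position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sine and cosine columns pair up")
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(d_model // 2):
            # Columns 2i and 2i+1 share the same frequency term.
            angle = pos / (10000 ** (2 * i / d_model))
            table[pos][2 * i] = math.sin(angle)
            table[pos][2 * i + 1] = math.cos(angle)
    return table


# Hypothetical sizes, chosen only to keep the table small.
pe = sinusoidal_positional_encoding(max_len=4, d_model=6)
```

Each row of `pe` would be added to the token embedding at that position. Position 0 always encodes to alternating zeros and ones, since sin(0) = 0 and cos(0) = 1.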

Multi-Head Attention: Multi-head attention runs an attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension. Intuitively, multiple attention heads allow the model to attend to parts of the sequence differently (e.g. longer-term dependencies versus shorter-term dependencies).

$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)W^O$

where $head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$

$Attention(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$

Here $W_i^Q$, $W_i^K$, $W_i^V$ and $W^O$ are all learnable parameter matrices, and $d_k$ is the dimension of the keys.
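To make the attention formulas concrete, here is a small dependency-free sketch. It is written with nested lists for readability and is not the repository's code (a real model would use PyTorch tensors); all function names and the example projection matrices are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in a:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def multi_head(q: Matrix, k: Matrix, v: Matrix,
               w_q: List[Matrix], w_k: List[Matrix], w_v: List[Matrix],
               w_o: Matrix) -> Matrix:
    """w_q, w_k, w_v hold one projection matrix per head; w_o is the output projection."""
    heads = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    # Concatenate the head outputs along the feature axis, then project.
    concat = [sum((head[r] for head in heads), []) for r in range(len(q))]
    return matmul(concat, w_o)


# Hypothetical example: 2 tokens, d_model = 4, 2 heads of size 2.
# Each head simply selects two of the four input features.
x = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
select_first = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
select_last = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
projections = [select_first, select_last]
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
out = multi_head(x, x, x, projections, projections, projections, identity)
```

In a trained model the projection matrices are learned rather than fixed, and the queries, keys and values may come from different sequences (for example, decoder queries attending to encoder outputs).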

Dataset

We use the Persian-OCR-230k dataset to train our model. You can find it here.

Training

Prediction

from src.tokenizer import loadTokenizer
from src.nn import TRnet

# load pretrained model
model, metadata = TRnet.from_pretrained('path/to/model.pth')

# load tokenizer
tokenizer = loadTokenizer('path/to/vocab.pkl')

To run inference on a single image from the command line:

$ python3 inference.py path/to/image/file.png
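The Python snippet in this section stops after loading the model and tokenizer. Because the model is autoregressive, text is presumably generated one token at a time, each step conditioned on the tokens produced so far. The loop below is a generic greedy decoding sketch under that assumption. It does not use the actual TRnet or tokenizer API: `next_token_scores`, `bos_id`, `eos_id` and `toy_scores` are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token_scores: Callable[[List[int]], Sequence[float]],
                  bos_id: int, eos_id: int, max_len: int = 32) -> List[int]:
    """Repeatedly append the highest-scoring next token until EOS or max_len."""
    tokens = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the BOS marker


def toy_scores(prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: emits ids 2, 3, 4 and then EOS (id 1)."""
    script = [2, 3, 4, 1]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(5)]


decoded = greedy_decode(toy_scores, bos_id=0, eos_id=1)
```

With a real model, `next_token_scores` would run the decoder on the encoded image and the current token prefix, and the resulting ids would be mapped back to characters by the tokenizer.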

Image	Actual Text	Prediction	WER	CER
	همه چی خوب فقط سس خیلی کم گذاشته بودن	همه چی خوب فقط سس خیلی کم گذاشته بودن	0.0	0.0
	عالی بود فقط یه کم سوسیس تند بود	عالی بود فقط یه کم سوسیس تند بود	0.0	0.0
	پیتزا سرد سرد بود اصلا خوشمزه نبود	پیتزا سرد سرد بود واقعا خوشمزه نبود	0.14	0.05
	ممنون، خسته نباشید خیلی سریع و به موقع و کامل بود	ممنون، خسته نباشید خیلی سریع و به موقع و کامل بود	0.0	0.0
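WER (word error rate) and CER (character error rate) are commonly computed as the Levenshtein edit distance between the prediction and the actual text, divided by the length of the actual text in words or characters. The sketch below follows that common definition. The repository's own metric code is not shown here, so its normalization may differ and the values in the table are not guaranteed to be reproduced exactly. The function names are assumptions for this example.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, prediction: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(reference: str, prediction: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, prediction) / len(reference)


# Third row of the table: one of seven words is substituted.
actual = "پیتزا سرد سرد بود اصلا خوشمزه نبود"
predicted = "پیتزا سرد سرد بود واقعا خوشمزه نبود"
row_wer = wer(actual, predicted)  # 1 / 7
row_cer = cer(actual, predicted)
```

A single wrong word in a seven-word sentence gives a WER of 1/7, which rounds to the 0.14 reported in the table for that row.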

🛡️ License

This project is distributed under the MIT License.

Saeed-Biabani/Optical-Character-Recognition

Folders and files

Latest commit

History

Repository files navigation

Optical Character Recognition

Quick Links

Dependencies

Getting Started

Architecture

Modules

Dataset

Training

Prediction

🛡️ License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages