Scene Text Recognition

Scene Text Recognition With Deep Learning Methods In Farsi.

Quick Links

Dependencies
Getting Started
Overview
Training
Samples
References
License

Dependencies

Install dependencies:

$ pip install -r requirements.txt
Download Pretrained Weights Here

Getting Started

Fig. 1: Model architecture.

Project Structure

.
├── src
│   ├── nn
│   │   ├── feature_extractor.py
│   │   ├── layers.py
│   │   └── ocr_model.py
│   └── utils
│       ├── dataset.py
│       ├── labelConverter.py
│       ├── loss_calculator.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py

Set the dataset paths in config.py:

ds_path = {
    "train_ds" : "path/to/train/dataset",
    "test_ds" : "path/to/test/dataset",
}

Dataset Structure (each image must contain a single word)

.
├── Images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
│   ...
└── labels.json

labels.json Contents

{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
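The layout above can be read with a few lines of Python. The following is only a minimal sketch, not the loader in src/utils/dataset.py: the function name `load_samples` and the `image_ext` parameter are hypothetical, and it assumes every key in labels.json is an image file name without its extension.

```python
import json
from pathlib import Path


def load_samples(ds_root, image_ext=".jpg"):
    """Pair every key in labels.json with its image file under Images/.

    Hypothetical helper for illustration; not part of this repository.
    """
    root = Path(ds_root)
    with open(root / "labels.json", encoding="utf-8") as f:
        labels = json.load(f)  # maps image stem -> word label

    samples = []
    for stem, word in labels.items():
        img_path = root / "Images" / f"{stem}{image_ext}"
        if img_path.is_file():  # skip labels whose image is missing
            samples.append((img_path, word))
    return samples
```

With the paths from config.py, this would be called as `load_samples(ds_path["train_ds"])` and returns a list of `(image_path, word)` pairs.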

Overview

Training

Objective Function

Denote the training dataset by $TD = \langle X_i, Y_i \rangle$, where $X_i$ is a training image and $Y_i$ is its word label. Training minimizes the objective function, which is the negative log-likelihood of the conditional probability of the word labels:

$$O = -\sum_{(X_i, Y_i) \in TD} \log P(Y_i|X_i)$$

This function computes a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.
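As a small numeric illustration of the objective, the sketch below sums the negative log-probabilities that a model assigns to the correct labels. The function name `objective` and the probability values are made up for the example; in training, each $P(Y_i|X_i)$ would come from the model rather than from a fixed list.

```python
import math


def objective(probabilities):
    """O = -sum(log P(Y_i | X_i)) over the training set."""
    return -sum(math.log(p) for p in probabilities)


# Hypothetical probabilities of the correct word label for three images.
probs = [0.9, 0.8, 0.95]
loss = objective(probs)
print(f"O = {loss:.4f}")
```

The closer each probability is to 1, the closer the objective is to 0, which is why minimizing it pushes the model toward the correct labels.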

Fig. 2: Model Training History.

CTC Loss

CTC takes a sequence $H = h_1, \ldots, h_T$, where $T$ is the sequence length, and outputs the probability of a path $\pi$, which is defined as

$$P(\pi|H) = \prod_{t = 1}^T y_{{\pi}_t}^t$$

where $y_{\pi_t}^t$ is the probability of generating character $\pi_t$ at time step $t$.
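The product above can be evaluated directly for a toy example. This is a sketch of the path-probability formula only, not of the full CTC loss and not of the code in src/utils/loss_calculator.py; the function name `path_probability`, the three-class alphabet, and the numbers in `y` are assumptions made for illustration.

```python
def path_probability(y, path):
    """P(pi | H) = prod over t of y[t][pi_t].

    y[t][k] is the probability of class k at time step t;
    path[t] is the class index chosen at time step t.
    """
    if len(y) != len(path):
        raise ValueError("path must have one entry per time step")
    prob = 1.0
    for t, k in enumerate(path):
        prob *= y[t][k]
    return prob


# Toy per-step distributions for T = 3 over a hypothetical 3-class alphabet.
y = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
p = path_probability(y, [1, 0, 2])  # 0.7 * 0.6 * 0.6
```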

| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
|---|---|---|---|---|---|---|
| OCR-Base | $1 \times 64 \times 192$ | 0.993 | 0.997 | 0.997 | 35,023,143 | 89.24 |

Samples

References

🛡️ License

This project is distributed under the MIT License.


Saeed-Biabani/Scene-Text-Recognition
