Arabic Dates OCR

This repo contains code for Optical Character Recognition (OCR) of dates written in Arabic numbers. The code is written using TensorFlow.

Note: The ArabicDatesVocabulary class creates a hash table that maps Arabic characters to labels. On my Windows device the Arabic characters were not encoded correctly, although the same code works fine on Google Colab, so this appears to be a device-specific problem.
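
As a rough illustration of the character-to-label mapping described in the note, here is a minimal sketch in plain Python. It is not the repo's ArabicDatesVocabulary class: the function names, the choice of separators, and the use of a plain dict instead of a TensorFlow hash table are all assumptions made for illustration. The digits are built from Unicode code points rather than typed as literals, so the example does not depend on how the source file itself is encoded.

```python
# Hypothetical stand-in for a character vocabulary; not the repo's actual class.

# Arabic-Indic digits U+0660..U+0669, built from code points so the source
# file contains no non-ASCII literals.
ARABIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
SEPARATORS = "/-"  # assumed date separators


def build_vocabulary(characters):
    """Map each character to an integer label; 0 is left free for blank/padding."""
    return {ch: idx for idx, ch in enumerate(characters, start=1)}


def encode(text, char_to_label):
    """Convert a string into a list of integer labels."""
    return [char_to_label[ch] for ch in text]


def decode(labels, char_to_label):
    """Convert labels back into a string, skipping the blank label 0."""
    label_to_char = {label: ch for ch, label in char_to_label.items()}
    return "".join(label_to_char[label] for label in labels if label != 0)


vocab = build_vocabulary(ARABIC_DIGITS + SEPARATORS)

# Translate a Western-digit date into Arabic-Indic digits for the demo.
sample = "2023/01/15".translate(str.maketrans("0123456789", ARABIC_DIGITS))
labels = encode(sample, vocab)
print(labels)
print(decode(labels, vocab) == sample)
```

If the vocabulary is ever read from or written to a text file, passing `encoding="utf-8"` to `open()` explicitly is a cheap precaution, since the default text encoding on Windows is often not UTF-8. Whether that is the cause of the problem described above is an assumption.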

Setup

  1. Create a virtual environment
$ python -m venv .venv
# Linux
$ source .venv/bin/activate
# Windows
$ .venv\Scripts\activate
  2. Install dependencies
$ pip install -r requirements.txt

Training the Model

A training script is provided for training on your own data. Additionally, you can provide your own checkpoints.

$ python ocrnet/train.py --model=./models/model --train_dataset=./dataset/ --output_path=./ocr_model/

Inference

An inference script is provided for running the model on your own data. Additionally, you can provide your own checkpoints.

$ python ocrnet/inference.py --model=./models/model --test_dataset=./dataset/

Model Conversion

You can convert the model to ONNX using the command below.

$ python -m tf2onnx.convert --saved-model ./models/model --output ./models/model.onnx
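
After converting, it can be useful to confirm that the exported file loads and to see what input and output names it exposes. The sketch below assumes the third-party onnxruntime package is installed (it is not mentioned in this repo) and that the model was written to ./models/model.onnx as in the command above. The function name describe_onnx_model is hypothetical.

```python
def describe_onnx_model(model_path):
    """Load an ONNX model on CPU and return its input and output signatures.

    Assumes onnxruntime is installed; it is imported lazily so that merely
    defining this helper does not require the package.
    """
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = [(node.name, node.shape, node.type) for node in session.get_inputs()]
    outputs = [(node.name, node.shape, node.type) for node in session.get_outputs()]
    return inputs, outputs


# Example usage (requires the converted model file to exist):
# inputs, outputs = describe_onnx_model("./models/model.onnx")
# print(inputs)
# print(outputs)
```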

To convert to TensorRT, install tensorrt, then run the command below.

$ python ocrnet/converter.py --input_name=./models/model --output=./models/trt_model