OCR4All Pixel Classifier

Requirements

Python dependencies are specified in requirements.txt / setup.py.

Install the package via pip with one of two extras: ocr4all_pixel_classifier[tf_cpu] for the CPU version of TensorFlow, or ocr4all_pixel_classifier[tf_gpu] for the GPU (CUDA) version. For the latter, your system must be set up with CUDA 9 and cuDNN 7.

Usage

Pixel classifier

Classification

To run a model on some input images, use ocr4all-pixel-classifier predict:

ocr4all-pixel-classifier predict --load PATH_TO_MODEL \
	--output OUTPUT_PATH \
	--binary PATH_TO_BINARY_IMAGES \
	--images PATH_TO_SOURCE_IMAGES \
	--norm PATH_TO_NORMALIZATIONS

(ocr4all-pixel-classifier is an alias for ocr4all-pixel-classifier predict)

This will create three folders at the output path:

color: the classification as a color image, with each pixel's color corresponding to the class of that pixel
inverted: inverted binary image with classification of foreground pixels only (i.e. background is black, foreground is white or class color)
overlay: classification image layered transparently over the original image
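
When the classifier is called from a script, the invocation above can be assembled programmatically. The following is a minimal sketch, assuming the ocr4all-pixel-classifier executable is on the PATH. build_predict_command and run_predict are hypothetical helpers, not part of the package, and they use only the flags shown above.

```python
import subprocess
from typing import List

# Folders the README says are created at the output path.
OUTPUT_SUBFOLDERS = ("color", "inverted", "overlay")


def build_predict_command(model: str, output: str, binary: str,
                          images: str, norm: str) -> List[str]:
    """Assemble the argument list for the predict subcommand (hypothetical helper)."""
    return [
        "ocr4all-pixel-classifier", "predict",
        "--load", model,
        "--output", output,
        "--binary", binary,
        "--images", images,
        "--norm", norm,
    ]


def run_predict(model: str, output: str, binary: str,
                images: str, norm: str) -> None:
    """Run the prediction; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_predict_command(model, output, binary, images, norm),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.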

Training

For training, you first have to create dataset files. A dataset file is a JSON file containing three arrays, for train, test and evaluation data (other publications often call these train, validation and test, respectively). The JSON file uses the following format:

{
	"train": [
		//datasets here
	],
	"test": [
		//datasets here
	],
	"eval": [
		//datasets here
	]
}

A dataset describes a single input image and consists of several paths: the original image, a binarized version and the mask (pixel color corresponds to class). Furthermore, the line height of the page in pixels must be specified:

{
	"binary_path": "/path/to/image/binary/filename.bin.png",
	"image_path":  "/path/to/image/color/filename.jpg",
	"mask_path":  "/path/to/image/mask/filename_MASK.png",
	"line_height_px": 18
}

The generation of dataset files can be automated using ocr4all-pixel-classifier create-dataset-file. Refer to the command's --help output for further information.
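
For cases where the dataset file is written by hand or by a custom script, the structure described above can be produced with Python's json module. This is a sketch only: make_entry and write_dataset_file are hypothetical helpers, and the file naming scheme (NAME.bin.png, NAME.jpg, NAME_MASK.png) is an assumption taken from the example entry above. Note that the // comments in the format listing are placeholders; a real JSON file must not contain them.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

Entry = Dict[str, Union[str, int]]


def make_entry(stem: str, binary_dir: str, image_dir: str,
               mask_dir: str, line_height_px: int) -> Entry:
    """Describe one page; the naming scheme is assumed from the example above."""
    return {
        "binary_path": str(Path(binary_dir) / f"{stem}.bin.png"),
        "image_path": str(Path(image_dir) / f"{stem}.jpg"),
        "mask_path": str(Path(mask_dir) / f"{stem}_MASK.png"),
        "line_height_px": line_height_px,
    }


def write_dataset_file(path: str, train: List[Entry], test: List[Entry],
                       eval_: List[Entry]) -> Dict[str, List[Entry]]:
    """Write the three arrays to a dataset file and return the written structure."""
    dataset = {"train": train, "test": test, "eval": eval_}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dataset, f, indent="\t")
    return dataset
```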

To start the training:

ocr4all-pixel-classifier train \
    --train DATASET_FILE.json --test DATASET_FILE.json --eval DATASET_FILE.json \
    --output MODEL_TARGET_PATH \
    --n_iter 5000

The parameters --train, --test and --eval may be followed by any number of dataset files or patterns (shell globbing).
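
When the training command is started from Python without a shell, glob patterns are not expanded automatically. The sketch below expands them with the standard glob module and passes several dataset files per parameter. expand_patterns and build_train_command are hypothetical helpers, and only the flags shown above are used.

```python
import glob
from typing import List, Sequence


def expand_patterns(patterns: Sequence[str]) -> List[str]:
    """Expand shell-style patterns into a sorted list of matching paths."""
    paths: List[str] = []
    for pattern in patterns:
        paths.extend(sorted(glob.glob(pattern)))
    return paths


def build_train_command(train: Sequence[str], test: Sequence[str],
                        eval_: Sequence[str], output: str,
                        n_iter: int = 5000) -> List[str]:
    """Assemble the argument list for the train subcommand (hypothetical helper)."""
    cmd = ["ocr4all-pixel-classifier", "train"]
    cmd += ["--train", *train]
    cmd += ["--test", *test]
    cmd += ["--eval", *eval_]
    cmd += ["--output", output, "--n_iter", str(n_iter)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True).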

Refer to ocr4all-pixel-classifier train --help for further parameters that affect the training procedure.

You can combine several dataset files into a split file. The format of the split file is:

{
	"label": "name of split",
	"train": [
		"/path/to/dataset1.json",
		"/path/to/dataset2.json",
		...
	],
	"test": [
		//dataset paths here
	],
	"eval": [
		//dataset paths here
	]
}

To use a split file, add the --split_file parameter.
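
A split file in the format above can also be generated with a few lines of Python. This is a minimal sketch; make_split is a hypothetical helper, and the paths are the placeholders from the listing above.

```python
import json
from typing import Dict, List, Sequence, Union


def make_split(label: str, train: Sequence[str], test: Sequence[str],
               eval_: Sequence[str]) -> Dict[str, Union[str, List[str]]]:
    """Build the split structure: a label plus three lists of dataset file paths."""
    return {
        "label": label,
        "train": list(train),
        "test": list(test),
        "eval": list(eval_),
    }


split = make_split(
    "name of split",
    train=["/path/to/dataset1.json", "/path/to/dataset2.json"],
    test=[],
    eval_=[],
)
split_json = json.dumps(split, indent="\t")  # text to save as the split file
```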