OMMR4all/ommr4all-page-segmentation

OCR4All Pixel Classifier

Requirements

Python dependencies are specified in requirements.txt / setup.py.

Install the package via pip with one of two extras: ocr4all_pixel_classifier[tf_cpu] for the CPU version of TensorFlow, or ocr4all_pixel_classifier[tf_gpu] for the GPU (CUDA) version. The GPU version requires CUDA 9 and cuDNN 7 to be set up on your system.

Usage

Pixel classifier

Classification

To run a model on some input images, use ocr4all-pixel-classifier predict:

ocr4all-pixel-classifier predict --load PATH_TO_MODEL \
	--output OUTPUT_PATH \
	--binary PATH_TO_BINARY_IMAGES \
	--images PATH_TO_SOURCE_IMAGES \
	--norm PATH_TO_NORMALIZATIONS

(ocr4all-pixel-classifier is an alias for ocr4all-pixel-classifier predict)
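
When the prediction step is driven from a script, the command above can be assembled as an argument list. The following is a minimal sketch, not part of the package: the helper name build_predict_command is hypothetical, and only the flags shown in the command above are used.

```python
import shlex
from typing import List


def build_predict_command(model: str, output: str, binary: str,
                          images: str, norm: str) -> List[str]:
    """Assemble the argument list for `ocr4all-pixel-classifier predict`.

    Hypothetical helper; it only mirrors the flags documented above.
    """
    return [
        "ocr4all-pixel-classifier", "predict",
        "--load", model,
        "--output", output,
        "--binary", binary,
        "--images", images,
        "--norm", norm,
    ]


cmd = build_predict_command(
    model="PATH_TO_MODEL",
    output="OUTPUT_PATH",
    binary="PATH_TO_BINARY_IMAGES",
    images="PATH_TO_SOURCE_IMAGES",
    norm="PATH_TO_NORMALIZATIONS",
)
# Print the command as a single, shell-quoted line for inspection.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, assuming the package is installed and the placeholders are replaced with real paths.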

This will create three folders at the output path:

  • color: the classification as a color image, with each pixel's color corresponding to its class
  • inverted: inverted binary image with classification of foreground pixels only (i.e. background is black, foreground is white or class color)
  • overlay: classification image layered transparently over the original image

Training

To train a model, first create dataset files. A dataset file is a JSON file containing three arrays, one each for train, test and evaluation data (called train/validation/test in some other publications). The JSON file uses the following format:

{
	"train": [
		//datasets here
	],
	"test": [
		//datasets here
	],
	"eval": [
		//datasets here
	]
}

A dataset describes a single input image and consists of several paths: the original image, a binarized version and the mask (pixel color corresponds to class). Furthermore, the line height of the page in pixels must be specified:

{
	"binary_path": "/path/to/image/binary/filename.bin.png",
	"image_path":  "/path/to/image/color/filename.jpg",
	"mask_path":  "/path/to/image/mask/filename_MASK.png",
	"line_height_px": 18
}

The generation of dataset files can be automated using ocr4all-pixel-classifier create-dataset-file. Refer to the command's --help output for further information.
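
To make the format concrete, here is a minimal sketch that assembles a dataset file by hand in Python. It illustrates the JSON structure described above and is not how create-dataset-file works internally; the helper names make_entry and build_dataset_file are hypothetical.

```python
import json
from typing import Dict, List, Union

Entry = Dict[str, Union[str, int]]


def make_entry(binary_path: str, image_path: str, mask_path: str,
               line_height_px: int) -> Entry:
    """Build one dataset entry with the four keys of the format."""
    if line_height_px <= 0:
        raise ValueError("line_height_px must be a positive pixel count")
    return {
        "binary_path": binary_path,
        "image_path": image_path,
        "mask_path": mask_path,
        "line_height_px": line_height_px,
    }


def build_dataset_file(train: List[Entry], test: List[Entry],
                       eval_: List[Entry]) -> Dict[str, List[Entry]]:
    """Wrap the three entry lists in the top-level dataset structure."""
    return {"train": train, "test": test, "eval": eval_}


entry = make_entry(
    "/path/to/image/binary/filename.bin.png",
    "/path/to/image/color/filename.jpg",
    "/path/to/image/mask/filename_MASK.png",
    18,
)
# test and eval are left empty only to keep the example short.
dataset = build_dataset_file([entry], [], [])
print(json.dumps(dataset, indent="\t"))
```

Writing the printed text to a .json file yields a dataset file in the layout shown above.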

To start the training:

ocr4all-pixel-classifier train \
    --train DATASET_FILE.json --test DATASET_FILE.json --eval DATASET_FILE.json \
    --output MODEL_TARGET_PATH \
    --n_iter 5000

The parameters --train, --test and --eval may be followed by any number of dataset files or patterns (shell globbing).

Refer to ocr4all-pixel-classifier train --help for further parameters that affect the training procedure.

You can combine several dataset files into a split file. The format of the split file is:

{
	"label": "name of split",
	"train": [
		"/path/to/dataset1.json",
		"/path/to/dataset2.json",
		...
	],
	"test": [
		//dataset paths here
	],
	"eval": [
		//dataset paths here
	]
}

To use a split file, add the --split_file parameter.
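
A split file can also be generated from a script. The sketch below only reproduces the JSON layout described above; the helper name build_split and the paths dataset3.json and dataset4.json are hypothetical placeholders.

```python
import json
from typing import Dict, List, Sequence, Union

SplitFile = Dict[str, Union[str, List[str]]]


def build_split(label: str, train: Sequence[str], test: Sequence[str],
                eval_: Sequence[str]) -> SplitFile:
    """Combine paths to dataset files into the split file structure."""
    return {
        "label": label,
        "train": list(train),
        "test": list(test),
        "eval": list(eval_),
    }


split = build_split(
    "name of split",
    train=["/path/to/dataset1.json", "/path/to/dataset2.json"],
    test=["/path/to/dataset3.json"],   # hypothetical path
    eval_=["/path/to/dataset4.json"],  # hypothetical path
)
print(json.dumps(split, indent="\t"))
```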

ocr4all-pixel-classifier compute-image-normalizations / ocrd_compute_normalizations

Calculate image normalizations, i.e. scaling factors based on average line height.

Required arguments:

  • --input_dir: location of images
  • --output_dir: target location of norm files

Optional arguments:

  • --average_all: Average height over all images
  • --inverse
