Python dependencies are specified in `requirements.txt` / `setup.py`.
You must install the package via pip with either `ocr4all_pixel_classifier[tf_cpu]` to use the CPU version of TensorFlow or `ocr4all_pixel_classifier[tf_gpu]` to use the GPU (CUDA) version of TensorFlow. For the latter, your system should be set up with CUDA 9 and cuDNN 7.
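For example, installing the CPU variant (the GPU variant works the same way with the `tf_gpu` extra):

```shell
# Quoting the extra avoids shell globbing issues, e.g. in zsh
pip install "ocr4all_pixel_classifier[tf_cpu]"
```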
To run a model on some input images, use `ocr4all-pixel-classifier predict`:
```shell
ocr4all-pixel-classifier predict --load PATH_TO_MODEL \
    --output OUTPUT_PATH \
    --binary PATH_TO_BINARY_IMAGES \
    --images PATH_TO_SOURCE_IMAGES \
    --norm PATH_TO_NORMALIZATIONS
```
(`ocr4all-pixel-classifier` is an alias for `ocr4all-pixel-classifier predict`)
This will create three folders at the output path:
- `color`: the classification as a color image, with pixel color corresponding to the class for that pixel
- `inverted`: inverted binary image with the classification of foreground pixels only (i.e. background is black, foreground is white or the class color)
- `overlay`: the classification image layered transparently over the original image
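For instance, listing the output path after a successful run should show exactly these three directories (the files inside are named after your input images):

```shell
ls OUTPUT_PATH
# color  inverted  overlay
```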
For training, you first have to create dataset files. A dataset file is a JSON file containing three arrays, for train, test and evaluation data (also called train/validation/test in other publications). The JSON file uses the following format:
```json
{
    "train": [
        //datasets here
    ],
    "test": [
        //datasets here
    ],
    "eval": [
        //datasets here
    ]
}
```
A dataset describes a single input image and consists of several paths: the original image, a binarized version and the mask (pixel color corresponds to class). Furthermore, the line height of the page in pixels must be specified:
```json
{
    "binary_path": "/path/to/image/binary/filename.bin.png",
    "image_path": "/path/to/image/color/filename.jpg",
    "mask_path": "/path/to/image/mask/filename_MASK.png",
    "line_height_px": 18
}
```
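Putting the two together, a minimal complete dataset file could look like this (paths, file names, and line height are illustrative placeholders):

```json
{
    "train": [
        {
            "binary_path": "/data/binary/page_0001.bin.png",
            "image_path": "/data/color/page_0001.jpg",
            "mask_path": "/data/mask/page_0001_MASK.png",
            "line_height_px": 18
        }
    ],
    "test": [
        //datasets here
    ],
    "eval": [
        //datasets here
    ]
}
```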
The generation of dataset files can be automated using `ocr4all-pixel-classifier create-dataset-file`. Refer to the command's `--help` output for further information.
To start the training:
```shell
ocr4all-pixel-classifier train \
    --train DATASET_FILE.json --test DATASET_FILE.json --eval DATASET_FILE.json \
    --output MODEL_TARGET_PATH \
    --n_iter 5000
```
The parameters `--train`, `--test` and `--eval` may be followed by any number of dataset files or patterns (shell globbing).
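For example, using shell globbing to pass several dataset files per split (the directory layout is only illustrative):

```shell
ocr4all-pixel-classifier train \
    --train datasets/train/*.json \
    --test datasets/test/*.json \
    --eval datasets/eval/*.json \
    --output MODEL_TARGET_PATH \
    --n_iter 5000
```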
Refer to `ocr4all-pixel-classifier train --help` for further parameters that affect the training procedure.
You can combine several dataset files into a split file. The format of the split file is:
```json
{
    "label": "name of split",
    "train": [
        "/path/to/dataset1.json",
        "/path/to/dataset2.json",
        ...
    ],
    "test": [
        //dataset paths here
    ],
    "eval": [
        //dataset paths here
    ]
}
```
To use a split file, add the `--split_file` parameter.
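For example (assuming here that the split file replaces the individual `--train`/`--test`/`--eval` arguments):

```shell
ocr4all-pixel-classifier train \
    --split_file SPLIT_FILE.json \
    --output MODEL_TARGET_PATH \
    --n_iter 5000
```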
Calculate image normalizations, i.e. scaling factors based on average line height.
Required arguments:
- `--input_dir`: location of images
- `--output_dir`: target location of norm files

Optional arguments:
- `--average_all`: Average height over all images
- `--inverse`