pyfastmask - Fast image segmentation format

pyfastmask is a simple format for storing single-channel images with low-frequency content, such as semantic segmentation masks.

Files are roughly PNG-sized (somewhat larger in the benchmark below), and reading them is up to 20x faster.

Installation

From PyPI:

The easiest way to install the latest version is by using pip:

pip install pyfastmask

From source:

git clone git@github.com:JIy3AHKO/pyfastmask.git
cd pyfastmask
pip install -e .

Usage

For image reading and writing, use the read and write functions:

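import numpy as np
import pyfastmask as pf

# A single-channel uint8 array; random data is used here only to check the round trip.
img = np.random.randint(0, 256, (100, 100), dtype=np.uint8)

pf.write('mask.pfm', img)   # encode and save
img2 = pf.read('mask.pfm')  # load and decode

# The decoded array must match the original exactly.
np.testing.assert_array_equal(img, img2)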

Benchmark

See BENCHMARK.md for more detailed information.

Metric	pyfastmask	OpenCV PNG	OpenCV BMP	QOI
Median Read Time	0.09 ms	1.71 ms	0.35 ms	0.81 ms
Average Size	217.35 KiB	149.36 KiB	1146.64 KiB	498.24 KiB

Format Description

pyfastmask compresses and stores image segmentation masks using Run-Length Encoding (RLE) and line-differential encoding. Values are stored with variable bit widths, chosen from the number of unique symbols and the mask size, which reduces the size of the encoded data.

Storage Structure

The format organizes data into three main sections: header, symbol mapping, and line-by-line encoded data.

1. Header

Magic Byte: Format identifier.
Version Byte: Format version.
Symbol Bit Width: Bits for each symbol.
Count Bit Width: Bits for run lengths.
Line Count Bit Width: Bits for the number of runs per line.
Unique Symbols Count: Number of unique symbols.
Mask Height: Mask height in pixels.
Mask Width: Mask width in pixels.
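
The three bit-width fields can be derived from the largest value each field has to hold. The sketch below is illustrative only: the helper names `bits_needed` and `estimate_bit_widths` are hypothetical, and the exact rule pyfastmask applies (for example, whether run lengths are stored as-is or minus one) is an assumption.

```python
def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())


def estimate_bit_widths(unique_symbols: int, max_run_length: int, max_runs_per_line: int) -> dict:
    """Hypothetical helper: one bit width per variable-width header field."""
    return {
        # Symbols are referenced by index, so the largest index is unique_symbols - 1.
        "symbol_bit_width": bits_needed(unique_symbols - 1),
        # Assumption: run lengths are stored directly, without a -1 bias.
        "count_bit_width": bits_needed(max_run_length),
        "line_count_bit_width": bits_needed(max_runs_per_line),
    }


widths = estimate_bit_widths(unique_symbols=5, max_run_length=640, max_runs_per_line=12)
print(widths)
# {'symbol_bit_width': 3, 'count_bit_width': 10, 'line_count_bit_width': 4}
```

With five classes in a 640-pixel-wide mask, a (Symbol, Run Length) pair would take 13 bits under these assumptions instead of the 24 bits needed with fixed 8-bit and 16-bit fields.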

2. Symbol Mapping

Lists unique symbols in the mask, each encoded with 8 bits.

Semantic segmentation masks usually have only a few unique symbols, so the encoded data refers to each symbol by its index in this list, which typically takes only a few bits (the Symbol Bit Width from the header).
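
As a rough illustration of the mapping, NumPy's `np.unique` can produce both the symbol list and the per-pixel indices. This is a sketch of the idea, not pyfastmask's own code; in particular, the sorted symbol order is an assumption.

```python
import numpy as np

mask = np.array([[0, 0, 7],
                 [7, 255, 255]], dtype=np.uint8)

# symbols: sorted unique pixel values (each fits in 8 bits)
# indices: for every pixel, its position in `symbols`
symbols, indices = np.unique(mask, return_inverse=True)
indices = indices.reshape(mask.shape)  # shape handling differs across NumPy versions

print(symbols.tolist())  # [0, 7, 255]
print(indices.tolist())  # [[0, 0, 1], [1, 2, 2]]

# Indexing the symbol list with the indices restores the original mask.
restored = symbols[indices]
```

Here three symbols need only 2 bits per index, even though the raw values span the full 8-bit range.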

3. Line-by-Line Encoding

Encodes mask data line by line:

First Line: Encoded with standard RLE.
Subsequent Lines: Encoded with sparse RLE on the difference between the current and previous lines.

First line is represented as: (Number of runs), (Symbol, Run Length), (Symbol, Run Length), ...

Subsequent lines are represented as: (Number of runs), (Offset, Symbol, Run Length), (Offset, Symbol, Run Length), ...

Where:

Number of runs: Number of runs in the line.
Symbol: Symbol index from the symbol mapping.
Run Length: Number of pixels with the same symbol.
Offset: Number of pixels to skip, i.e. pixels left unchanged from the previous line, before the run.
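
The two line representations can be sketched in pure Python as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: the offset is counted from the end of the previous run, and a run ends as soon as a pixel matches the previous line.

```python
from typing import List, Tuple


def rle_first_line(line: List[int]) -> List[Tuple[int, int]]:
    """Standard RLE: (symbol, run_length) pairs covering the whole line."""
    runs: List[Tuple[int, int]] = []
    for symbol in line:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((symbol, 1))              # start a new run
    return runs


def sparse_rle(prev: List[int], line: List[int]) -> List[Tuple[int, int, int]]:
    """Sparse RLE: (offset, symbol, run_length) only where `line` differs from `prev`."""
    runs: List[Tuple[int, int, int]] = []
    cursor = 0  # position just past the last emitted run
    i, n = 0, len(line)
    while i < n:
        if line[i] == prev[i]:
            i += 1  # unchanged pixel, nothing to store
            continue
        start, symbol = i, line[i]
        # Extend the run while the symbol repeats and still differs from prev.
        while i < n and line[i] == symbol and line[i] != prev[i]:
            i += 1
        runs.append((start - cursor, symbol, i - start))
        cursor = i
    return runs


first = [0, 0, 0, 1, 1, 0, 0, 0]
second = [0, 0, 1, 1, 1, 1, 0, 0]

print(rle_first_line(first))      # [(0, 3), (1, 2), (0, 3)]
print(sparse_rle(first, second))  # [(2, 1, 1), (2, 1, 1)]
```

The second line changes only two pixels, so it is described by two short runs instead of a full re-encoding. An identical line produces an empty run list.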

Encoding Process

1. Encode Lines:
   - First Line: Standard RLE.
   - Subsequent Lines: Sparse RLE.
2. Estimate Bit Widths: Calculate the number of bits required to store each value.
3. Write Header and Symbol Mapping.
4. Pack Data: Combine all encoded data into a byte stream.
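
The packing step can be pictured as concatenating values of different bit widths into one stream. The following is a minimal sketch, assuming MSB-first bit order and zero padding of the final byte; `pack_bits` is a hypothetical name, and the actual bit layout used by pyfastmask may differ.

```python
from typing import Iterable, Tuple


def pack_bits(values_and_widths: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, bit_width) pairs MSB-first, zero-padding the last byte."""
    acc = 0    # accumulator holding all bits written so far
    nbits = 0  # number of valid bits in acc
    for value, width in values_and_widths:
        if value >> width:  # also rejects negative values
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8  # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


# One run: symbol index 2 in 3 bits, run length 5 in 4 bits -> 010 0101 + 1 pad bit.
packed = pack_bits([(2, 3), (5, 4)])
print(packed.hex())  # 4a
```

In this example the run occupies a single byte, whereas storing each field in its own byte would take two.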

Decoding Process

1. Read Header and Symbol Mapping.
2. Decode the first line with standard RLE.
3. For each subsequent line:
   - Copy the previous line.
   - Apply the sparse RLE runs to overwrite the pixels that differ from the previous line.
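
The decoding steps can be sketched with NumPy as shown below. This operates on already-unpacked runs rather than on the byte stream, `decode_lines` is a hypothetical name, and offsets are assumed to be relative to the end of the previous run.

```python
import numpy as np


def decode_lines(first_runs, diff_runs_per_line, width):
    """Rebuild a 2-D array of symbol indices from RLE-encoded lines."""
    height = 1 + len(diff_runs_per_line)
    out = np.empty((height, width), dtype=np.uint8)

    # First line: standard RLE, runs laid end to end.
    pos = 0
    for symbol, length in first_runs:
        out[0, pos:pos + length] = symbol
        pos += length
    if pos != width:
        raise ValueError("first-line runs do not cover the full width")

    # Subsequent lines: start from a copy of the previous line,
    # then overwrite only the runs that differ.
    for y, runs in enumerate(diff_runs_per_line, start=1):
        out[y] = out[y - 1]
        pos = 0
        for offset, symbol, length in runs:
            pos += offset  # skip unchanged pixels
            out[y, pos:pos + length] = symbol
            pos += length
    return out


decoded = decode_lines(
    first_runs=[(0, 3), (1, 2), (0, 3)],
    diff_runs_per_line=[[(2, 1, 1), (2, 1, 1)], []],
    width=8,
)
print(decoded.tolist())
# [[0, 0, 0, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1, 0, 0]]
```

Because each line starts as a copy of the one above it, a line identical to its predecessor costs only the copy, which is one plausible reason reads are fast on masks with large uniform regions. The resulting indices would then be translated back to pixel values through the symbol mapping.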

Testing

To run tests, use the following command:

python -m unittest discover tests/

Contributing

Contributions are welcome! If you want to contribute, please create an issue or a pull request.
