Zstandard v1.5.1

@Cyan4973 Cyan4973 released this 21 Dec 00:42
791626d

Notice: it has been brought to our attention that the v1.5.1 library might be built with an executable stack on non-x64 architectures. Some systems with strict security settings disallow executable stacks and could flag this as a problem. We are currently reviewing the issue. Be aware of it if you build libzstd for a non-x64 architecture.

Zstandard v1.5.1 is a maintenance release, bringing a good number of small refinements to the project. It also offers a welcome crop of performance improvements, as detailed below.

Performance Improvements

Speed improvements for fast compression (levels 1–4)

PRs #2749, #2774, and #2921 refactor single-segment compression for ZSTD_fast and ZSTD_dfast, which back compression levels 1 through 4 (as well as the negative compression levels). Speedups in the ~3-5% range are observed. In addition, the compression ratio of ZSTD_dfast (levels 3 and 4) is slightly improved.

Rebalanced middle compression levels

v1.5.0 introduced major speed improvements for mid-level compression (levels 5 to 12) while preserving a roughly similar compression ratio. As a consequence, the speed scale became tilted towards faster speeds. Unfortunately, the difference between successive levels was no longer regular, and there was a large performance gap just after the impacted range, between levels 12 and 13.

v1.5.1 rebalances parameters so that compression levels can be roughly associated with their former speed budget. Consequently, v1.5.1 mid compression levels feature speeds closer to those of v1.4.9 (though still noticeably faster) and receive in exchange an improved compression ratio, as shown in the graphs below.

[Figure: comparing v1.4.9 vs v1.5.0 vs v1.5.1 on x64 (i7-9700k)]

[Figure: comparing v1.4.9 vs v1.5.0 vs v1.5.1 on arm64 (snapdragon 855)]

Note that, since middle levels are only rebalanced, no significant performance difference between v1.5.0 and v1.5.1 should be expected, save for some special cases: levels merely occupy different positions on the same curve. The situation is a bit different for fast levels (1-4), for which v1.5.1 delivers a small but consistent performance benefit on all platforms, as described in the section on fast compression above.

Huffman Improvements

Our Huffman code was significantly revamped in this release. Both encoding and decoding speed were improved. Additionally, encoding speed for small inputs was improved even further. Speed is measured on the Silesia corpus by compressing at level 1, extracting the literals left over after compression, and then compressing and decompressing the literals from each block. Measurements are done on an Intel i9-9900K @ 3.6 GHz.

Compiler | Scenario                              | v1.5.0 Speed | v1.5.1 Speed | Delta
---------|---------------------------------------|--------------|--------------|-------
gcc-11   | Literal compression - 128KB block     | 748 MB/s     | 927 MB/s     | +23.9%
clang-13 | Literal compression - 128KB block     | 810 MB/s     | 927 MB/s     | +14.4%
gcc-11   | Literal compression - 4KB block       | 223 MB/s     | 321 MB/s     | +44.0%
clang-13 | Literal compression - 4KB block       | 224 MB/s     | 310 MB/s     | +38.2%
gcc-11   | Literal decompression - 128KB block   | 1164 MB/s    | 1500 MB/s    | +28.8%
clang-13 | Literal decompression - 128KB block   | 1006 MB/s    | 1504 MB/s    | +49.5%

The overall impact on (de)compression speed depends on the compressibility of the data. Compression speed improves by 1-4%, and decompression speed improves by 5-15%.
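
The Delta column appears to be the relative change between the two speed columns. A minimal C sketch of that arithmetic follows. `speed_delta_percent` is a hypothetical helper and not part of zstd, and the inputs are the rounded MB/s figures from the table, so a recomputed delta can differ from the published one by a fraction of a percent.

```c
#include <assert.h>

/* Relative change from old_speed to new_speed, in percent.
 * Hypothetical helper for illustration only; not part of zstd.
 * Both speeds must use the same unit (here MB/s). */
static double speed_delta_percent(double old_speed, double new_speed)
{
    assert(old_speed > 0.0);  /* a zero or negative baseline has no meaningful delta */
    return (new_speed / old_speed - 1.0) * 100.0;
}
```

For instance, calling `speed_delta_percent(748.0, 927.0)` with the gcc-11 128KB literal compression figures yields a value that rounds to the +23.9% shown in the table.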

PR #2722 implements the Huffman decoder in assembly for x86-64 with BMI2 enabled. We detect BMI2 support at runtime, so this speedup applies to all x86-64 builds running on CPUs that support BMI2. This improves Huffman decoding speed by about 40%, depending on the scenario. PR #2733 improves Huffman encoding speed by 10% for clang and 20% for gcc. PR #2732 drastically speeds up the HUF_sort() function, which speeds up Huffman tree building for compression. This is a significant speed boost for small inputs, measuring in at a 40% improvement for 4K inputs.
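
Runtime dispatch of this kind can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the library's actual detection code: it assumes GCC or clang, whose `__builtin_cpu_supports` builtin is used on x86-64, and the names `cpu_has_bmi2`, `select_decoder`, `decode_generic`, and `decode_bmi2` are all hypothetical.

```c
#include <stddef.h>

/* Returns 1 if the running CPU reports BMI2 support, 0 otherwise.
 * Assumption: GCC or clang targeting x86-64. Any other compiler or
 * target conservatively reports 0, so the generic path is used. */
static int cpu_has_bmi2(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  /* safe to call more than once */
    return __builtin_cpu_supports("bmi2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Stub decoders standing in for a portable C path and a BMI2-specific
 * path. They only report which path was chosen. */
static const char *decode_generic(void) { return "generic"; }
static const char *decode_bmi2(void)    { return "bmi2"; }

typedef const char *(*decode_fn)(void);

/* Pick the implementation once, based on the detected CPU feature. */
static decode_fn select_decoder(int has_bmi2)
{
    return has_bmi2 ? decode_bmi2 : decode_generic;
}
```

A caller would typically evaluate `select_decoder(cpu_has_bmi2())` once and reuse the returned function pointer, so the feature check does not sit on the hot path.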

Binary Size and Build Speed

zstd binary size grew significantly in v1.5.0 due to the new code added for middle compression level speed optimizations. In this release we recover the binary size, and in the process also significantly speed up builds, especially with sanitizers enabled.

The sizes below are for libzstd.a, built for x86-64 with -O3. We regained 161 KB of binary size with gcc and 293 KB with clang. Note that these sizes cover the whole library, optimized for speed over size. A decoder-only build, with size-saving options enabled and compiled with -Os or -Oz, can be much smaller.

Version | gcc-11 size | clang-13 size
--------|-------------|--------------
v1.5.1  | 1177 KB     | 1167 KB
v1.5.0  | 1338 KB     | 1460 KB
v1.4.9  | 1137 KB     | 1151 KB

Change log

Featured user-visible changes

  • perf: rebalanced compression levels, to better match intended speed/level curve, by @senhuang42 and @Cyan4973
  • perf: faster huffman decoder, using x64 assembly, by @terrelln
  • perf: slightly faster high speed modes (strategies fast & dfast), by @felixhandte
  • perf: smaller binary size and faster compilation times, by @terrelln and @nolange
  • perf: new row64 mode, used notably at highest lazy2 levels 11-12, by @senhuang42
  • perf: faster mid-level compression speed in presence of highly repetitive patterns, by @senhuang42
  • perf: minor compression ratio improvements for small data at high levels, by @Cyan4973
  • perf: reduced stack usage (mostly useful for Linux Kernel), by @terrelln
  • perf: faster compression speed on incompressible data, by @binhdvo
  • perf: on-demand reduced ZSTD_DCtx state size, using build macro ZSTD_DECODER_INTERNAL_BUFFER, at a small cost of performance, by @binhdvo
  • build: allows hiding static symbols in the dynamic library, using build macro, by @skitt
  • build: support for m68k (Motorola 68000's), by @Cyan4973
  • build: improved AIX support, by @Helflym
  • build: improved meson unofficial build, by @eli-schwartz
  • cli: fix: forward mtime to output file, by @felixhandte
  • cli: custom memory limit when training dictionary (#2925), by @embg
  • cli: report advanced parameters information when compressing in very verbose mode (-vv), by @Svetlitski-FB
  • cli: advanced commands in the form --long-param= can accept negative value arguments, by @binhdvo

Full Changelog: v1.5.0...v1.5.1