Improve compression speed on small blocks #4165

Merged: 10 commits, Oct 11, 2024

Commits on Sep 20, 2024

  1. Optimize compression by avoiding unpredictable branches

    Avoid an unpredictable branch: use a conditional move to generate an
    address that is guaranteed to be safe, then compare unconditionally.
    Instead of
    
    if (idx < limit && x[idx] == val ) // mispredicted idx < limit branch
    
    Do
    
    addr = cmov(safe,x+idx)
    if (*addr == val && idx < limit) // almost always false so well predicted
    
    Using microbenchmarks from https://github.com/google/fleetbench,
    I get a ~10% speed-up:
    
    name                                                                                          old cpu/op   new cpu/op    delta
    BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15                                     1.46ns ± 3%   1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16                                     1.41ns ± 3%   1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15                                     1.61ns ± 1%   1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16                                     1.54ns ± 2%   1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15                                     1.82ns ± 2%   1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16                                     1.73ns ± 3%   1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15                                     2.12ns ± 2%   1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16                                     1.99ns ± 3%   1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
    BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15                                      3.22ns ± 3%   2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
    BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16                                      3.19ns ± 4%   2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
    BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15                                      2.60ns ± 3%   2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16                                      2.46ns ± 3%   2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
    BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15                                      2.69ns ± 3%   2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
    BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16                                      2.63ns ± 3%   2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
    BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15                                      3.20ns ± 2%   2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
    BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16                                      3.20ns ± 4%   2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)
    
    I've also measured the impact on internal workloads and saw a similar
    ~10% improvement in performance, measured as CPU usage per byte of data.
    TocarIP committed Sep 20, 2024
    e8fce38
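
The transformation described in this commit message can be sketched in C as follows. This is a minimal illustration, not the code from the PR: the names `read32`, `match4_branch`, `match4_select` and the `dummy` byte values are hypothetical, and whether the ternary actually becomes a conditional move depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 32-bit load via memcpy. */
static uint32_t read32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Branchy form: the bounds check guards the load, so it must run first.
 * If idx < limit is hard to predict, this branch mispredicts often. */
static int match4_branch(const uint8_t *cur, const uint8_t *base,
                         uint32_t idx, uint32_t limit)
{
    return idx < limit && read32(base + idx) == read32(cur);
}

/* Select form: pick an address that is always safe to read, compare
 * unconditionally, and only then consult the bounds check. */
static int match4_select(const uint8_t *cur, const uint8_t *base,
                         uint32_t idx, uint32_t limit)
{
    /* Fallback bytes that are always readable (arbitrary values, assumed
     * unlikely to match real input). */
    static const uint8_t dummy[4] = { 0x12, 0x34, 0x56, 0x78 };

    /* A plain select like this is often compiled to a conditional move,
     * but that is not guaranteed. */
    const uint8_t *addr = (idx < limit) ? base + idx : dummy;

    /* Most candidates do not match, so this test is usually false and
     * therefore well predicted. */
    if (read32(addr) != read32(cur)) return 0;

    /* Reject the rare case where the fallback bytes happened to match. */
    return idx < limit;
}
```

Both functions return the same result for every input; the second only changes which test the processor has to predict. The caller is assumed to guarantee that `base + idx` has at least 4 readable bytes whenever `idx < limit`.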

Commits on Oct 7, 2024

  1. minor refactor zstd_fast

    make hot variables more local
    Cyan4973 committed Oct 7, 2024
    1e7fa24

Commits on Oct 8, 2024

  1. refactor search into an inline function

    for easier swapping with a parameter
    Cyan4973 committed Oct 8, 2024
    2cc600b
  2. made search strategy switchable

    between cmov and branch
    and use a simple heuristic based on wlog to select between them.
    
    note: performance is not good on clang (yet)
    Cyan4973 committed Oct 8, 2024
    186b132
  3. introduce memory barrier to force test order

    suggested by @terrelln
    Cyan4973 committed Oct 8, 2024
    197c258
  4. store dummy bytes within ZSTD_match4Found_cmov()

    feels more logical, better contained
    Cyan4973 committed Oct 8, 2024
    741b860
  5. d45aee4
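
The Oct 8 commits above (search as an inline function, a strategy switchable between cmov and branch, a barrier to force test order, dummy bytes kept inside the cmov variant) might fit together roughly as in the sketch below. Everything here is an assumption made for illustration: the names `found_branch`, `found_cmov`, `pick_match_test`, the threshold value, and the direction of the heuristic are hypothetical and are not taken from the PR. The barrier shown is the generic GCC/Clang compiler barrier idiom; whether it keeps the two tests in order once inlined is compiler-dependent.

```c
#include <stdint.h>
#include <string.h>

/* Common signature so the two strategies can be swapped via a parameter. */
typedef int (*match4_fn)(const uint8_t *cur, const uint8_t *base,
                         uint32_t idx, uint32_t limit);

/* Hypothetical helper: unaligned 32-bit load. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Branch strategy: bounds check first, then compare. */
static int found_branch(const uint8_t *cur, const uint8_t *base,
                        uint32_t idx, uint32_t limit)
{
    return idx < limit && load_u32(base + idx) == load_u32(cur);
}

/* cmov strategy: the fallback bytes live inside the function that uses
 * them (values are arbitrary, chosen for illustration). */
static int found_cmov(const uint8_t *cur, const uint8_t *base,
                      uint32_t idx, uint32_t limit)
{
    static const uint8_t dummy[4] = { 0x12, 0x34, 0x56, 0x78 };
    const uint8_t *addr = (idx < limit) ? base + idx : dummy;

    if (load_u32(cur) != load_u32(addr)) return 0;
#if defined(__GNUC__)
    /* Compiler-only barrier (emits no instruction); intended to discourage
     * the compiler from hoisting the bounds test above the comparison. */
    __asm__ __volatile__("" ::: "memory");
#endif
    return idx < limit;
}

/* Hypothetical threshold: NOT the value used in the PR. */
#define ASSUMED_WLOG_THRESHOLD 19u

/* Assumed heuristic: prefer the cmov variant for small windows and the
 * branch variant otherwise. The real rule may differ. */
static match4_fn pick_match_test(unsigned window_log)
{
    return (window_log < ASSUMED_WLOG_THRESHOLD) ? found_cmov : found_branch;
}
```

A caller would pick the test once per block, for example `match4_fn matchFound = pick_match_test(wlog);`, and call it in the hot loop. In a real implementation the selection would more likely be a compile-time template parameter so that the chosen function can still be inlined; a runtime function pointer is used here only to keep the sketch short.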

Commits on Oct 10, 2024

  1. fa1fcb0

Commits on Oct 11, 2024

  1. fixed parameter ordering in dfast

    noticed by @terrelln
    Cyan4973 committed Oct 11, 2024
    83de003
  2. rename variable name

    findMatch -> matchFound
    since it's a test, as opposed to an active search operation.
    suggested by @terrelln
    Cyan4973 committed Oct 11, 2024
    8e5823b