Block splitter #4136

Merged
44 commits merged on Oct 24, 2024

Commits on Oct 23, 2024

  1. XP: add a pre-splitter

    instead of ingesting only full blocks, analyze the data and infer where to split.
    Cyan4973 committed Oct 23, 2024
    a5bce4a
  2. fixed strict C90 semantic

    Cyan4973 committed Oct 23, 2024
    9e52789
  3. 586ca96
  4. use ZSTD_memset()

    for better portability on Linux kernel
    Cyan4973 committed Oct 23, 2024
    e2d7d08
  5. minor C++-ism

    though I really wonder if this is a property worth maintaining.
    Cyan4973 committed Oct 23, 2024
    6021b66
  6. more ZSTD_memset() to apply

    Cyan4973 committed Oct 23, 2024
    fa147cb
  7. 83a3402
  8. fixed RLE detection test

    Cyan4973 committed Oct 23, 2024
    f83ed08
  9. fixed kernel build

    Cyan4973 committed Oct 23, 2024
    8b3887f
  10. fixed single-library build

    Cyan4973 committed Oct 23, 2024
    dd38c67
  11. only split full blocks

    short term simplification
    Cyan4973 committed Oct 23, 2024
    0d4b520
  12. fix assert

    Cyan4973 committed Oct 23, 2024
    20c3d17
  13. fixed c90 comment style

    Cyan4973 committed Oct 23, 2024
    6dc5212
  14. fixed zstreamtest

    Cyan4973 committed Oct 23, 2024
    80a912d
  15. fixed meson build

    Cyan4973 committed Oct 23, 2024
    6939235
  16. new Makefile target mesonbuild

    for easier local testing
    Cyan4973 committed Oct 23, 2024
    cdddcaa
  17. fixed VS2010 solution

    Cyan4973 committed Oct 23, 2024
    76ad1d6
  18. 31d48e9
  19. 7f015c2
  20. ZSTD_splitBlock_4k() uses externally provided workspace

    ideally, this workspace would be provided from the ZSTD_CCtx* state
    Cyan4973 committed Oct 23, 2024
    73a6653
  21. 433f459
  22. fix alignment test

    for non 64-bit systems
    Cyan4973 committed Oct 23, 2024
    4685eaf
  23. cae8d13
  24. 4ce91cb
  25. dac26ea
  26. minor split optimization

    let's fill the initial stats directly into target fingerprint
    Cyan4973 committed Oct 23, 2024
    1c62e71
  27. added a faster block splitter variant

    that samples 1 in 5 positions.
    
    This variant is fast enough for lazy2 and btlazy2,
    but it performs worse in combination with the post-splitter at higher levels (>= btopt).
    Cyan4973 committed Oct 23, 2024
    a167571
  28. 7bad787
  29. ensure lastBlock is correctly determined

    reported by @terrelln
    Cyan4973 committed Oct 23, 2024
    5ae34e4
  30. conservatively estimate over-splitting in presence of incompressible loss

    ensure data can never be expanded by more than 3 bytes per full block.
    Cyan4973 committed Oct 23, 2024
    ea85dc7
  31. renamed: FingerPrint => Fingerprint

    suggested by @terrelln
    Cyan4973 committed Oct 23, 2024
    4662f6e
  32. 1ec5f9f
  33. rewrite penalty update

    suggested by @terrelln
    Cyan4973 committed Oct 23, 2024
    16450d0
  34. rewrote ZSTD_cwksp_initialAllocStart() to be easier to read

    following a discussion with @felixhandte
    Cyan4973 committed Oct 23, 2024
    06b7cfa
  35. fixes static state allocation check

    detected by @felixhandte
    Cyan4973 committed Oct 23, 2024
    0be334d
  36. updated compression results

    due to integration of `sample5` strategy, leading to better compression ratios on a range of levels
    Cyan4973 committed Oct 23, 2024
    d2eeed5
  37. fixed extraneous return

    strict C90 compliance test
    Cyan4973 committed Oct 23, 2024
    18b1e67
  38. 57239c4
  39. rewrite fingerprint storage to no longer need 64-bit members

    so that it can be stored using standard alignment requirement (sizeof(void*)).
    
    Distance function still requires 64-bit signed multiplication though,
    so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
    Cyan4973 committed Oct 23, 2024
    b68ddce
  40. split all full 128 KB blocks

    this helps make the streaming behavior more consistent,
    since it no longer depends on more data being present in the input.
    
    suggested by @terrelln
    Cyan4973 committed Oct 23, 2024
    7d3e5e3
  41. c80645a
  42. update regression results

    Cyan4973 committed Oct 23, 2024
    bbda1ac
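
For readers who want a concrete picture of the pre-splitter described in the commits above (analyze the data, compare fingerprints, optionally sample 1 in 5 positions), here is a minimal sketch. It is not the code from this PR: `Fingerprint` is named after the type mentioned in the commit messages, but its layout, `FP_BUCKETS`, `fp_record()`, `fp_distance()`, `find_split()`, and the decision threshold are all assumptions made for illustration. The counters are 32-bit and the distance uses a 64-bit signed multiplication, which is consistent with the fingerprint storage commit but should not be read as the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FP_BUCKETS 256  /* hypothetical: one bucket per byte value */

typedef struct {
    unsigned events[FP_BUCKETS];  /* 32-bit counters, no 64-bit members */
    unsigned nbEvents;
} Fingerprint;

/* Build a histogram from one byte every `stride` positions.
 * stride == 5 mimics the "1 in 5 positions" sampling variant. */
static void fp_record(Fingerprint* fp, const unsigned char* src,
                      size_t size, size_t stride)
{
    size_t n;
    assert(stride > 0);
    memset(fp, 0, sizeof(*fp));
    for (n = 0; n < size; n += stride) {
        fp->events[src[n]]++;
        fp->nbEvents++;
    }
}

/* Cross-normalized L1 distance between two histograms.
 * Each term needs a 64-bit signed multiplication. */
static unsigned long long fp_distance(const Fingerprint* a, const Fingerprint* b)
{
    unsigned long long d = 0;
    size_t n;
    for (n = 0; n < FP_BUCKETS; n++) {
        long long x = (long long)a->events[n] * (long long)b->nbEvents;
        long long y = (long long)b->events[n] * (long long)a->nbEvents;
        d += (unsigned long long)(x > y ? x - y : y - x);
    }
    return d;
}

/* Scan `src` in chunks; return the first chunk boundary where the new
 * chunk's statistics diverge from everything seen so far, or `size`
 * when no split point is found. The threshold (half of the maximum
 * possible distance) is an arbitrary choice for this sketch. */
static size_t find_split(const unsigned char* src, size_t size,
                         size_t chunk, size_t stride)
{
    Fingerprint past, next;
    size_t pos, n;
    assert(chunk > 0);
    if (size < 2 * chunk) return size;
    fp_record(&past, src, chunk, stride);
    for (pos = chunk; pos + chunk <= size; pos += chunk) {
        unsigned long long limit;
        fp_record(&next, src + pos, chunk, stride);
        limit = (unsigned long long)past.nbEvents * next.nbEvents;
        if (fp_distance(&past, &next) > limit) return pos;
        /* no split here: merge the new chunk into the running stats */
        for (n = 0; n < FP_BUCKETS; n++) past.events[n] += next.events[n];
        past.nbEvents += next.nbEvents;
    }
    return size;
}
```

Calling `find_split(buf, size, 4096, 5)` on a buffer whose byte distribution changes abruptly at a chunk boundary returns that boundary; on homogeneous data it returns `size`, meaning the block is left whole. A larger stride trades detection accuracy for speed, which is the trade-off the sampling commit describes.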

Commits on Oct 24, 2024

  1. apply limit conditions for all splitting strategies

    instead of just for blind split.
    
    This is in anticipation of adversarial input
    that would intentionally target the sampling pattern of the split detector.
    
    Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
    because this upper limit uses a 1KB block size worst case scenario,
    and splitting never creates blocks that small.
    
    The protection is more to ensure that data is not expanded by more than 3 bytes per 128 KB full block,
    which is a much stricter limit.
    Cyan4973 committed Oct 24, 2024
    90095f0
  2. update regression results

    first block is no longer split since adding the @Savings over-split protection
    Cyan4973 committed Oct 24, 2024
    70c77d2
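
The arithmetic behind the "3 bytes per 128 KB full block" limit can be sketched as follows. Every zstd block carries a 3-byte header, so incompressible data stored as raw blocks grows by 3 bytes per block, and each additional split adds another header. The helpers below are hypothetical (`raw_cost()` and `split_allowed()` do not appear in this PR), and the guard is only a guess at the kind of check the savings-based protection performs: allow a split once earlier savings already cover the extra header.

```c
#include <stddef.h>

#define BLOCK_HEADER_SIZE 3  /* size of a zstd block header, in bytes */

/* Worst-case stored size when `srcSize` incompressible bytes are
 * emitted as raw blocks of at most `blockSize` bytes each. */
static size_t raw_cost(size_t srcSize, size_t blockSize)
{
    size_t nbBlocks = (srcSize + blockSize - 1) / blockSize;
    return srcSize + nbBlocks * BLOCK_HEADER_SIZE;
}

/* Hypothetical guard: only split when the bytes saved so far
 * already pay for one extra block header. */
static int split_allowed(long long savings)
{
    return savings >= BLOCK_HEADER_SIZE;
}
```

With these definitions, one unsplit 128 KB raw block costs 131072 + 3 bytes, while the same data cut into 1 KB raw blocks costs 131072 + 128 * 3 bytes. The second figure illustrates why unrestricted splitting of incompressible data would exceed the stricter 3-byte limit even though it stays well within the looser bound.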