zstd: Shorter and faster asm for decSymbol.newState (#896)
* zstd: Shorter asm for decSymbol.newState

  The asm needs to compute decSymbol.newState, which is
  uint16(state >> 16), or, equivalently (except for types),
  uint32(state) >> 16. This can be accomplished by a MOVL+SHRL, the
  former of which is elided by avo, so we get a single instruction for
  both the BMI2 and non-BMI2 cases.

  Benchmarks show no difference on a new BMI2-supporting machine, but on
  an older i7, decompression throughput is a tiny bit faster:

    goos: linux
    goarch: amd64
    pkg: github.com/klauspost/compress/zstd
    cpu: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
                                            │     old      │               shift                 │
                                            │     B/s      │     B/s       vs base               │
    Decoder_DecodeAll/kppkn.gtb.zst-8         441.4Mi ± 2%   450.4Mi ± 0%  +2.03% (p=0.000 n=10)
    Decoder_DecodeAll/geo.protodata.zst-8     1.148Gi ± 1%   1.152Gi ± 0%  +0.34% (p=0.009 n=10)
    Decoder_DecodeAll/plrabn12.txt.zst-8      347.9Mi ± 0%   356.6Mi ± 1%  +2.48% (p=0.000 n=10)
    Decoder_DecodeAll/lcet10.txt.zst-8        417.4Mi ± 0%   427.3Mi ± 0%  +2.37% (p=0.000 n=10)
    Decoder_DecodeAll/asyoulik.txt.zst-8      347.1Mi ± 0%   352.7Mi ± 1%  +1.62% (p=0.003 n=10)
    Decoder_DecodeAll/alice29.txt.zst-8       346.3Mi ± 1%   352.6Mi ± 0%  +1.83% (p=0.000 n=10)
    Decoder_DecodeAll/html_x_4.zst-8          1.440Gi ± 0%   1.445Gi ± 0%  +0.29% (p=0.019 n=10)
    Decoder_DecodeAll/paper-100k.pdf.zst-8    4.191Gi ± 0%   4.210Gi ± 0%  +0.45% (p=0.007 n=10)
    Decoder_DecodeAll/fireworks.jpeg.zst-8    8.891Gi ± 0%   8.849Gi ± 0%  -0.47% (p=0.000 n=10)
    Decoder_DecodeAll/urls.10K.zst-8          589.6Mi ± 0%   600.2Mi ± 0%  +1.80% (p=0.001 n=10)
    Decoder_DecodeAll/html.zst-8              926.1Mi ± 1%   937.9Mi ± 0%  +1.27% (p=0.000 n=10)
    Decoder_DecodeAll/comp-data.bin.zst-8     389.6Mi ± 0%   395.1Mi ± 0%  +1.40% (p=0.000 n=10)
    geomean                                   832.6Mi        843.3Mi       +1.28%

* zstd: Remove unused parameter in asm generator
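
The equivalence the first commit relies on can be checked in plain Go. The sketch below is illustrative only and is not code from this change: it assumes the packed state is a 64-bit value with newState in bits 16..31, and the names `newStateWide` and `newStateNarrow` are hypothetical helpers invented for the comparison.

```go
package main

import "fmt"

// newStateWide follows the expression quoted in the commit message:
// shift the full value right by 16, then keep the low 16 bits.
// Assumption: the packed state is a uint64.
func newStateWide(state uint64) uint16 {
	return uint16(state >> 16)
}

// newStateNarrow truncates to 32 bits first and then shifts, which is
// the MOVL+SHRL shape described in the commit message. The shifted
// result already fits in 16 bits, so the final conversion only changes
// the type, not the value.
func newStateNarrow(state uint64) uint16 {
	return uint16(uint32(state) >> 16)
}

func main() {
	// Sample values chosen to exercise the low, middle, and high bits.
	samples := []uint64{0, 0xFFFF, 0x10000, 0x12345678, 0xDEADBEEFCAFEF00D, ^uint64(0)}
	for _, s := range samples {
		wide, narrow := newStateWide(s), newStateNarrow(s)
		fmt.Printf("state=%#016x wide=%#04x narrow=%#04x equal=%v\n", s, wide, narrow, wide == narrow)
	}
}
```

Both forms select bits 16..31 of the input. Truncating to 32 bits first discards only bits 32..63, which the 16-bit result never sees, so the two helpers agree for every input.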
Showing 2 changed files with 68 additions and 107 deletions.