Releases: klauspost/reedsolomon

v1.9.11

22 Jan 11:59
ab26eb4
Add WithInversionCache and use pointer methods (#160)

There appear to be writes to value receivers, so the affected methods now use pointer receivers.

Add the `WithInversionCache(bool)` option so the inversion cache can be disabled.

Fixes #159
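
The value-receiver problem mentioned above is a general Go pitfall, and a small sketch may make it concrete. The `matrixCache` type and its methods are hypothetical names chosen for illustration and are not taken from the package. The sketch only shows why a write through a value receiver is lost while a write through a pointer receiver is kept.

```go
package main

import "fmt"

// matrixCache is a hypothetical stand-in for a type that keeps internal state.
type matrixCache struct {
	hits int
}

// recordByValue has a value receiver: it increments a copy of the struct,
// so the caller's value is left unchanged.
func (c matrixCache) recordByValue() { c.hits++ }

// recordByPointer has a pointer receiver: the increment is visible to the caller.
func (c *matrixCache) recordByPointer() { c.hits++ }

func main() {
	var c matrixCache
	c.recordByValue()
	fmt.Println(c.hits) // prints 0: the write went to a copy
	c.recordByPointer()
	fmt.Println(c.hits) // prints 1: the write reached the original
}
```

Any state that a method is expected to update, such as a cache, therefore needs a pointer receiver.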

v1.9.10

20 Dec 20:42
7c86824

  • Update cpuid package (#154)
  • Faster AVX2 encoding (#153)

v1.9.9

20 May 10:53
7daa20b
Generate AVX2 code (#141)

Replaces the AVX2 code for configurations up to 10x8 with specifically generated functions.

If code size is a concern, build with `-tags=nogen`.

The biggest speedups occur when encoding is not memory constrained:
```
benchmark                                old MB/s      new MB/s      speedup
BenchmarkEncode_8x5x8M                   5895.75       9648.18       1.64x
BenchmarkEncode_8x5x8M-4                 16773.41      17220.67      1.03x
BenchmarkEncode_8x5x8M-16                18263.12      17176.28      0.94x
BenchmarkEncode_8x6x8M                   5075.89       8548.39       1.68x
BenchmarkEncode_8x6x8M-4                 14559.83      15370.95      1.06x
BenchmarkEncode_8x6x8M-16                16183.37      15291.98      0.94x
BenchmarkEncode_8x7x8M                   4481.18       7015.60       1.57x
BenchmarkEncode_8x7x8M-4                 12835.35      13695.90      1.07x
BenchmarkEncode_8x7x8M-16                14246.94      13737.36      0.96x
BenchmarkEncode_8x8x05M                  5569.95       7947.70       1.43x
BenchmarkEncode_8x8x05M-4                17334.91      25271.37      1.46x
BenchmarkEncode_8x8x05M-16               29349.42      35043.36      1.19x
BenchmarkEncode_8x8x1M                   4830.58       7891.32       1.63x
BenchmarkEncode_8x8x1M-4                 17531.36      27371.42      1.56x
BenchmarkEncode_8x8x1M-16                29593.98      39241.09      1.33x
BenchmarkEncode_8x8x8M                   3953.66       6584.26       1.67x
BenchmarkEncode_8x8x8M-4                 11527.34      12331.23      1.07x
BenchmarkEncode_8x8x8M-16                12718.89      12173.08      0.96x
BenchmarkEncode_8x8x32M                  3927.51       6195.91       1.58x
BenchmarkEncode_8x8x32M-4                11490.85      11424.39      0.99x
BenchmarkEncode_8x8x32M-16               12506.09      11888.55      0.95x

benchmark                          old MB/s     new MB/s     speedup
BenchmarkParallel_8x8x64K          5490.24      6959.57      1.27x
BenchmarkParallel_8x8x64K-4        21078.94     29557.51     1.40x
BenchmarkParallel_8x8x64K-16       57508.45     73672.54     1.28x
BenchmarkParallel_8x8x1M           4755.49      7667.84      1.61x
BenchmarkParallel_8x8x1M-4         11818.66     12013.49     1.02x
BenchmarkParallel_8x8x1M-16        12923.12     12109.42     0.94x
BenchmarkParallel_8x8x8M           3973.94      6525.85      1.64x
BenchmarkParallel_8x8x8M-4         11725.68     11312.46     0.96x
BenchmarkParallel_8x8x8M-16        12608.20     11484.98     0.91x
BenchmarkParallel_8x3x1M           14139.71     17993.04     1.27x
BenchmarkParallel_8x3x1M-4         21805.97     23053.92     1.06x
BenchmarkParallel_8x3x1M-16        24673.05     23596.71     0.96x
BenchmarkParallel_8x4x1M           10617.88     14474.54     1.36x
BenchmarkParallel_8x4x1M-4         18635.82     18965.65     1.02x
BenchmarkParallel_8x4x1M-16        21518.12     20171.47     0.94x
BenchmarkParallel_8x5x1M           8669.88      11833.96     1.36x
BenchmarkParallel_8x5x1M-4         16321.00     17500.30     1.07x
BenchmarkParallel_8x5x1M-16        17267.16     17191.04     1.00x
```

v1.9.8

14 May 12:30
e8fdfd6
Update readme and re-allow s390x failure.

v1.9.7

09 May 09:02
2df03bd
  • Reduce stream allocations.
  • Add fast multiply by row 1 without lookups.
  • AVX2: Add 64 bytes per loop processing.
  • Add AVX512 8 data -> 1 parity.

v1.9.6

05 May 09:54
dccac35

Fix compilation issue on ARM64 and PPC64LE.

v1.9.5

04 May 07:49
de70cc1
  • Made non-assembly up to 40% faster.
  • AVX512 can use multiple goroutines for lower latency and higher individual throughput.
  • AVX512 5-9% faster.
  • All code is faster with user-defined goroutines and high concurrency. Up to 8x faster due to fewer cache evictions.
  • CPUID detects AMD CPUs with hyperthreading (multiple threads per core).
  • CPUID detects the per-CCX L3 cache size on AMD CPUs.
  • Use L1 cache size to set minimum split size.
  • Tests/benchmarks can disable specific assembly types.
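
The idea of a cache-derived minimum split size can be sketched as follows. This is not the package's heuristic: `splitRanges`, `xorInto`, and the 32 KiB L1 size are assumptions made for illustration. The sketch only shows how a lower bound on chunk size keeps small inputs from being spread over more goroutines than is useful.

```go
package main

import (
	"fmt"
	"sync"
)

// splitRanges divides n bytes into at most maxWorkers contiguous ranges.
// Every range except possibly the last is at least minSplit bytes long.
func splitRanges(n, maxWorkers, minSplit int) [][2]int {
	if n <= 0 {
		return nil
	}
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	if minSplit < 1 {
		minSplit = 1
	}
	per := (n + maxWorkers - 1) / maxWorkers // ceiling division
	if per < minSplit {
		per = minSplit
	}
	var out [][2]int
	for start := 0; start < n; start += per {
		end := start + per
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

// xorInto XORs src into dst, handling each range in its own goroutine.
// dst must be at least as long as src.
func xorInto(dst, src []byte, maxWorkers, minSplit int) {
	var wg sync.WaitGroup
	for _, r := range splitRanges(len(src), maxWorkers, minSplit) {
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				dst[i] ^= src[i]
			}
		}(r[0], r[1])
	}
	wg.Wait()
}

func main() {
	const l1 = 32 << 10 // assumed L1 data cache size of 32 KiB
	fmt.Println(len(splitRanges(1<<20, 16, l1)))  // prints 16
	fmt.Println(len(splitRanges(64<<10, 16, l1))) // prints 2
}
```

With a 1 MiB input all 16 workers get a chunk, while a 64 KiB input is handled by only two, since smaller chunks would fall below the assumed minimum.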

v1.9.4

22 Apr 15:23
17098a4
Use stream test options (#118)

v1.9.3

27 Sep 23:36
Update travis script

v1.9.2

26 May 11:18
v1.9.2