
Allocate pinned buffer for vectorized code #601

Merged: 13 commits merged on Nov 21, 2024

bitfaster (Owner) commented May 29, 2024

The current AVX2 vectorized code doesn't have much of an advantage on .NET 8 and .NET 9. We can gain some speed by allocating the buffer on the pinned object heap (introduced in .NET 5) and eliminating the fixed statement. With a fixed address, we can also align the start of the table to 64 bytes, so that each increment/frequency call is guaranteed to touch a single cache line.
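As a rough illustration of the idea above, here is a minimal sketch of allocating on the pinned object heap and finding the first 64-byte-aligned element. The names `PinnedBuffer`, `AllocateAligned` and `AddressOf` are hypothetical and are not taken from this PR; the sketch assumes a 64-bit process and a project compiled with `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, not the code merged in this PR.
public static unsafe class PinnedBuffer
{
    public const int CacheLineBytes = 64;

    // Allocates a zeroed long[] on the pinned object heap, over-allocated by one
    // cache line, and reports the index of the first 64-byte-aligned element.
    public static long[] AllocateAligned(int length, out int alignedStart)
    {
        int pad = CacheLineBytes / sizeof(long); // 8 extra elements

        // GC.AllocateArray with pinned: true is available from .NET 5.
        long[] buffer = GC.AllocateArray<long>(length + pad, pinned: true);

        // The array never moves, so its address can be taken once.
        ulong address = (ulong)Unsafe.AsPointer(ref buffer[0]);

        // Round up to the next multiple of 64.
        ulong aligned = (address + (CacheLineBytes - 1)) & ~(ulong)(CacheLineBytes - 1);

        // Assumes the array data is 8-byte aligned (64-bit process),
        // so the byte offset is a whole number of elements.
        alignedStart = (int)((aligned - address) / sizeof(long));
        return buffer;
    }

    // Address of an element; only meaningful for arrays that are pinned.
    public static ulong AddressOf(long[] buffer, int index)
        => (ulong)Unsafe.AsPointer(ref buffer[index]);
}
```

A caller would index from `alignedStart` onward, so that every block of eight counters starts on a cache-line boundary. The padding costs at most 64 bytes per buffer.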

Use of fixed results in a pinned local variable in IL; the runtime overhead comes from the JITted code. Explanation here.
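To make the difference concrete, the sketch below contrasts the two access patterns: pinning with `fixed` on every call versus reusing a pointer captured once from an array on the pinned object heap. `CounterTable` and its members are hypothetical names for illustration and do not reflect the actual sketch implementation; there are deliberately no bounds checks on the pointer paths, and the project needs `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type for illustration only.
public sealed unsafe class CounterTable
{
    private readonly long[] table;   // keeps the pinned array alive
    private readonly long* tablePtr; // stable: the array is on the pinned object heap

    public CounterTable(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        table = GC.AllocateArray<long>(length, pinned: true);
        tablePtr = (long*)Unsafe.AsPointer(ref table[0]);
    }

    // Baseline pattern: the array is pinned for the duration of each call.
    public long IncrementWithFixed(int index)
    {
        fixed (long* p = table)
        {
            return ++p[index];
        }
    }

    // Pinned-heap pattern: no fixed statement, the pointer was captured once.
    public long IncrementWithCachedPointer(int index)
    {
        return ++tablePtr[index];
    }
}
```

Both methods update the same memory; the second simply avoids setting up a pinned local on each call.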

This has little effect at smaller sizes, but is noticeable when the sketch is larger. Data was captured on current-generation Azure v5 VMs with both Intel and AMD CPUs.

AMD (Zen3 - Milan)

BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

Intel (Sunny Cove - Ice Lake)

BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

Alternative

It is possible to allocate aligned native memory instead, but the Sketch/LFU would then need to be disposable so that the memory can be freed.

https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/vectorization-guidelines.md#enforcing-memory-alignment

void* _pointer = NativeMemory.AlignedAlloc(byteCount: Size * sizeof(int), alignment: 32);
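For comparison, a minimal sketch of what the native-memory alternative might look like, including the disposal it forces on the owner. `NativeAlignedTable` is a hypothetical type, not part of this PR; it uses 64-byte alignment to match the cache-line goal described earlier (the snippet above uses 32), requires .NET 6 or later for `NativeMemory`, and needs `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper for illustration only.
public sealed unsafe class NativeAlignedTable : IDisposable
{
    private long* pointer;

    public int Length { get; }

    public NativeAlignedTable(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;

        nuint byteCount = (nuint)length * (nuint)sizeof(long);
        pointer = (long*)NativeMemory.AlignedAlloc(byteCount, (nuint)64);

        // AlignedAlloc does not zero the memory, so clear it explicitly.
        new Span<long>(pointer, length).Clear();
    }

    // Bounds-checked access to a counter.
    public ref long this[int index]
    {
        get
        {
            if (pointer == null) throw new ObjectDisposedException(nameof(NativeAlignedTable));
            if ((uint)index >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(index));
            return ref pointer[index];
        }
    }

    public void Dispose()
    {
        Free();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeAlignedTable() => Free();

    private void Free()
    {
        if (pointer != null)
        {
            // Memory from AlignedAlloc must be released with AlignedFree.
            NativeMemory.AlignedFree(pointer);
            pointer = null;
        }
    }
}
```

The need for `Dispose` (or a finalizer) is the trade-off mentioned above: every type that owns such a table, including the cache itself, would have to expose a way to release it.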

bitfaster changed the title from "Allocate aligned buffer for vectorized code" to "Allocate pinned buffer for vectorized code" on Jun 1, 2024

coveralls commented Jun 1, 2024

Coverage Status

coverage: 99.221% (+0.003%) from 99.218%
when pulling a7b2a70 on users/alexpeck/align
into 25ea2bd on main.

bitfaster (Owner, Author) commented Jun 1, 2024

The improvement on a newer AMD CPU is much smaller, almost within noise.

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4412/22H2/2022Update)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8.1 (4.8.9241.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Inc Baseline

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

Tabular

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| IncFlat | .NET 6.0 | 32768 | 14.651 ns | 0.0153 ns | 0.0136 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 32768 | 10.291 ns | 0.0067 ns | 0.0059 ns | 0.70 | - |
| IncFlat | .NET 8.0 | 32768 | 7.597 ns | 0.0047 ns | 0.0042 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 32768 | 9.656 ns | 0.0046 ns | 0.0041 ns | 1.27 | - |
| IncFlat | .NET 6.0 | 524288 | 21.848 ns | 0.0982 ns | 0.0918 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 524288 | 17.109 ns | 0.0310 ns | 0.0290 ns | 0.78 | - |
| IncFlat | .NET 8.0 | 524288 | 13.142 ns | 0.0089 ns | 0.0084 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 524288 | 15.699 ns | 0.0267 ns | 0.0236 ns | 1.19 | - |
| IncFlat | .NET 6.0 | 8388608 | 94.629 ns | 0.5928 ns | 0.5545 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 8388608 | 50.040 ns | 0.8664 ns | 0.7235 ns | 0.53 | - |
| IncFlat | .NET 8.0 | 8388608 | 50.009 ns | 0.1871 ns | 0.1563 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 8388608 | 44.393 ns | 0.6317 ns | 0.5909 ns | 0.89 | - |
| IncFlat | .NET 6.0 | 134217728 | 113.630 ns | 0.8264 ns | 0.7326 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 134217728 | 57.872 ns | 0.1868 ns | 0.1656 ns | 0.51 | - |
| IncFlat | .NET 8.0 | 134217728 | 59.166 ns | 0.2901 ns | 0.2713 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 134217728 | 55.850 ns | 0.1741 ns | 0.1629 ns | 0.94 | - |

Inc Pinned

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

Tabular

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| IncFlat | .NET 6.0 | 32768 | 13.492 ns | 0.0247 ns | 0.0219 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 32768 | 9.755 ns | 0.0033 ns | 0.0029 ns | 0.72 | - |
| IncFlat | .NET 8.0 | 32768 | 7.736 ns | 0.0026 ns | 0.0020 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 32768 | 9.218 ns | 0.0088 ns | 0.0078 ns | 1.19 | - |
| IncFlat | .NET 6.0 | 524288 | 21.858 ns | 0.0245 ns | 0.0217 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 524288 | 16.511 ns | 0.0810 ns | 0.0718 ns | 0.76 | - |
| IncFlat | .NET 8.0 | 524288 | 13.199 ns | 0.0107 ns | 0.0090 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 524288 | 15.414 ns | 0.0331 ns | 0.0310 ns | 1.17 | - |
| IncFlat | .NET 6.0 | 8388608 | 95.009 ns | 0.3450 ns | 0.3228 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 8388608 | 45.086 ns | 0.7894 ns | 0.7384 ns | 0.47 | - |
| IncFlat | .NET 8.0 | 8388608 | 50.017 ns | 0.3173 ns | 0.2968 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 8388608 | 43.059 ns | 0.8532 ns | 0.9129 ns | 0.86 | - |
| IncFlat | .NET 6.0 | 134217728 | 114.944 ns | 2.0130 ns | 1.8830 ns | 1.00 | - |
| IncBlockAvx | .NET 6.0 | 134217728 | 58.349 ns | 1.0282 ns | 0.9618 ns | 0.51 | - |
| IncFlat | .NET 8.0 | 134217728 | 60.193 ns | 0.2612 ns | 0.2316 ns | 1.00 | - |
| IncBlockAvx | .NET 8.0 | 134217728 | 54.887 ns | 0.3250 ns | 0.3040 ns | 0.91 | - |

Freq baseline

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

Tabular

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| FrequencyFlat | .NET 6.0 | 32768 | 21.92 ns | 0.057 ns | 0.054 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 32768 | 15.56 ns | 0.013 ns | 0.011 ns | 0.71 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 11.69 ns | 0.011 ns | 0.010 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 32768 | 13.63 ns | 0.009 ns | 0.008 ns | 1.17 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 29.68 ns | 0.025 ns | 0.021 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 524288 | 24.00 ns | 0.016 ns | 0.015 ns | 0.81 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 19.65 ns | 0.010 ns | 0.009 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 524288 | 19.52 ns | 0.009 ns | 0.008 ns | 0.99 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 107.02 ns | 0.466 ns | 0.436 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 8388608 | 77.00 ns | 1.092 ns | 0.968 ns | 0.72 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 93.68 ns | 0.581 ns | 0.515 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 8388608 | 73.08 ns | 1.449 ns | 2.756 ns | 0.78 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 125.13 ns | 0.505 ns | 0.473 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 134217728 | 101.10 ns | 0.656 ns | 0.614 ns | 0.81 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 112.72 ns | 0.371 ns | 0.347 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 134217728 | 96.57 ns | 0.927 ns | 0.867 ns | 0.86 | - |

Freq pinned

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

Tabular

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| FrequencyFlat | .NET 6.0 | 32768 | 22.16 ns | 0.019 ns | 0.018 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 32768 | 13.99 ns | 0.011 ns | 0.009 ns | 0.63 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 11.83 ns | 0.006 ns | 0.006 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 32768 | 13.24 ns | 0.007 ns | 0.007 ns | 1.12 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 29.58 ns | 0.043 ns | 0.038 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 524288 | 20.39 ns | 0.035 ns | 0.033 ns | 0.69 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 19.50 ns | 0.010 ns | 0.009 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 524288 | 20.20 ns | 0.011 ns | 0.009 ns | 1.04 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 105.48 ns | 0.376 ns | 0.352 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 8388608 | 72.93 ns | 1.420 ns | 1.636 ns | 0.69 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 91.64 ns | 0.509 ns | 0.451 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 8388608 | 71.15 ns | 1.233 ns | 1.153 ns | 0.78 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 124.44 ns | 0.623 ns | 0.583 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 6.0 | 134217728 | 96.96 ns | 1.083 ns | 0.960 ns | 0.78 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 111.34 ns | 0.774 ns | 0.724 ns | 1.00 | - |
| FrequencyBlockAvx | .NET 8.0 | 134217728 | 94.06 ns | 0.862 ns | 0.806 ns | 0.84 | - |

bitfaster (Owner, Author) commented Nov 21, 2024

Pinned block is faster except for size=524288.

BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| IncFlat | .NET 6.0 | 32768 | 21.87 ns | 0.015 ns | 0.013 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 32768 | 35.68 ns | 0.306 ns | 0.287 ns | 1.63 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 32768 | 13.80 ns | 0.070 ns | 0.066 ns | 0.63 | - |
| IncBlockAvxPinned | .NET 6.0 | 32768 | 13.05 ns | 0.013 ns | 0.012 ns | 0.60 | - |
| IncFlat | .NET 8.0 | 32768 | 10.94 ns | 0.005 ns | 0.005 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 32768 | 16.79 ns | 0.248 ns | 0.232 ns | 1.54 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 32768 | 12.34 ns | 0.028 ns | 0.026 ns | 1.13 | - |
| IncBlockAvxPinned | .NET 8.0 | 32768 | 11.94 ns | 0.020 ns | 0.017 ns | 1.09 | - |
| IncFlat | .NET 9.0 | 32768 | 10.94 ns | 0.003 ns | 0.003 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 32768 | 16.73 ns | 0.026 ns | 0.024 ns | 1.53 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 32768 | 12.30 ns | 0.016 ns | 0.015 ns | 1.12 | - |
| IncBlockAvxPinned | .NET 9.0 | 32768 | 11.70 ns | 0.027 ns | 0.025 ns | 1.07 | - |
| IncFlat | .NET 6.0 | 524288 | 28.65 ns | 0.036 ns | 0.030 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 524288 | 29.38 ns | 0.248 ns | 0.207 ns | 1.03 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 524288 | 23.26 ns | 0.106 ns | 0.099 ns | 0.81 | - |
| IncBlockAvxPinned | .NET 6.0 | 524288 | 27.67 ns | 0.549 ns | 1.183 ns | 0.97 | - |
| IncFlat | .NET 8.0 | 524288 | 16.85 ns | 0.062 ns | 0.058 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 524288 | 23.45 ns | 0.385 ns | 0.360 ns | 1.39 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 524288 | 21.92 ns | 0.156 ns | 0.146 ns | 1.30 | - |
| IncBlockAvxPinned | .NET 8.0 | 524288 | 20.46 ns | 0.116 ns | 0.109 ns | 1.21 | - |
| IncFlat | .NET 9.0 | 524288 | 16.80 ns | 0.049 ns | 0.046 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 524288 | 23.32 ns | 0.147 ns | 0.137 ns | 1.39 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 524288 | 21.21 ns | 0.032 ns | 0.027 ns | 1.26 | - |
| IncBlockAvxPinned | .NET 9.0 | 524288 | 20.86 ns | 0.115 ns | 0.107 ns | 1.24 | - |
| IncFlat | .NET 6.0 | 8388608 | 113.91 ns | 1.257 ns | 1.050 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 8388608 | 60.35 ns | 1.185 ns | 1.317 ns | 0.53 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 8388608 | 63.96 ns | 1.275 ns | 1.829 ns | 0.56 | - |
| IncBlockAvxPinned | .NET 6.0 | 8388608 | 67.14 ns | 0.887 ns | 0.830 ns | 0.59 | - |
| IncFlat | .NET 8.0 | 8388608 | 64.47 ns | 0.452 ns | 0.401 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 8388608 | 55.46 ns | 0.896 ns | 1.313 ns | 0.86 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 8388608 | 57.89 ns | 0.564 ns | 0.500 ns | 0.90 | - |
| IncBlockAvxPinned | .NET 8.0 | 8388608 | 46.41 ns | 0.756 ns | 0.707 ns | 0.72 | - |
| IncFlat | .NET 9.0 | 8388608 | 64.47 ns | 0.423 ns | 0.375 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 8388608 | 55.24 ns | 1.099 ns | 1.541 ns | 0.86 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 8388608 | 57.58 ns | 0.905 ns | 0.802 ns | 0.89 | - |
| IncBlockAvxPinned | .NET 9.0 | 8388608 | 46.22 ns | 0.599 ns | 0.561 ns | 0.72 | - |
| IncFlat | .NET 6.0 | 134217728 | 139.36 ns | 1.174 ns | 0.981 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 134217728 | 77.90 ns | 0.532 ns | 0.444 ns | 0.56 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 134217728 | 76.08 ns | 0.845 ns | 0.790 ns | 0.55 | - |
| IncBlockAvxPinned | .NET 6.0 | 134217728 | 77.90 ns | 0.509 ns | 0.451 ns | 0.56 | - |
| IncFlat | .NET 8.0 | 134217728 | 76.72 ns | 0.685 ns | 0.641 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 134217728 | 72.26 ns | 0.544 ns | 0.509 ns | 0.94 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 134217728 | 73.89 ns | 1.196 ns | 1.119 ns | 0.96 | - |
| IncBlockAvxPinned | .NET 8.0 | 134217728 | 55.57 ns | 0.579 ns | 0.542 ns | 0.72 | - |
| IncFlat | .NET 9.0 | 134217728 | 75.78 ns | 0.619 ns | 0.579 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 134217728 | 71.59 ns | 0.647 ns | 0.606 ns | 0.94 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 134217728 | 72.41 ns | 1.264 ns | 1.183 ns | 0.96 | - |
| IncBlockAvxPinned | .NET 9.0 | 134217728 | 55.75 ns | 0.815 ns | 0.637 ns | 0.74 | - |


BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| FrequencyFlat | .NET 6.0 | 32768 | 30.91 ns | 0.114 ns | 0.095 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 32768 | 22.08 ns | 0.023 ns | 0.022 ns | 0.71 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 32768 | 24.07 ns | 0.017 ns | 0.013 ns | 0.78 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 32768 | 18.76 ns | 0.021 ns | 0.019 ns | 0.61 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 16.05 ns | 0.010 ns | 0.009 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 32768 | 20.91 ns | 0.017 ns | 0.016 ns | 1.30 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 32768 | 20.96 ns | 0.009 ns | 0.008 ns | 1.31 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 32768 | 17.58 ns | 0.040 ns | 0.038 ns | 1.10 | - |
| FrequencyFlat | .NET 9.0 | 32768 | 15.69 ns | 0.011 ns | 0.010 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 32768 | 20.74 ns | 0.021 ns | 0.020 ns | 1.32 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 32768 | 20.40 ns | 0.019 ns | 0.017 ns | 1.30 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 32768 | 18.25 ns | 0.011 ns | 0.009 ns | 1.16 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 40.34 ns | 0.045 ns | 0.042 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 524288 | 31.63 ns | 0.056 ns | 0.044 ns | 0.78 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 524288 | 29.59 ns | 0.143 ns | 0.134 ns | 0.73 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 524288 | 25.33 ns | 0.115 ns | 0.102 ns | 0.63 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 23.74 ns | 0.059 ns | 0.055 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 524288 | 27.55 ns | 0.041 ns | 0.039 ns | 1.16 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 524288 | 26.47 ns | 0.043 ns | 0.040 ns | 1.12 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 524288 | 23.70 ns | 0.055 ns | 0.049 ns | 1.00 | - |
| FrequencyFlat | .NET 9.0 | 524288 | 23.37 ns | 0.027 ns | 0.022 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 524288 | 28.05 ns | 0.134 ns | 0.119 ns | 1.20 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 524288 | 26.23 ns | 0.206 ns | 0.192 ns | 1.12 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 524288 | 24.30 ns | 0.063 ns | 0.059 ns | 1.04 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 145.96 ns | 2.528 ns | 2.365 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 8388608 | 109.88 ns | 2.097 ns | 2.652 ns | 0.75 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 8388608 | 101.45 ns | 1.180 ns | 1.103 ns | 0.70 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 8388608 | 68.74 ns | 1.323 ns | 1.359 ns | 0.47 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 98.09 ns | 0.551 ns | 0.515 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 8388608 | 102.25 ns | 2.043 ns | 5.201 ns | 1.04 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 8388608 | 98.52 ns | 1.095 ns | 1.025 ns | 1.00 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 8388608 | 67.13 ns | 1.221 ns | 1.142 ns | 0.68 | - |
| FrequencyFlat | .NET 9.0 | 8388608 | 89.43 ns | 0.652 ns | 0.610 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 8388608 | 93.23 ns | 1.655 ns | 1.382 ns | 1.04 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 8388608 | 97.74 ns | 1.281 ns | 1.199 ns | 1.09 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 8388608 | 67.73 ns | 1.296 ns | 1.149 ns | 0.76 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 166.68 ns | 1.719 ns | 1.608 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 134217728 | 142.01 ns | 1.509 ns | 1.260 ns | 0.85 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 134217728 | 110.62 ns | 0.930 ns | 0.870 ns | 0.66 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 134217728 | 81.62 ns | 0.442 ns | 0.345 ns | 0.49 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 116.18 ns | 1.316 ns | 1.231 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 134217728 | 137.24 ns | 1.311 ns | 1.227 ns | 1.18 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 134217728 | 108.63 ns | 0.656 ns | 0.614 ns | 0.94 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 134217728 | 80.61 ns | 0.949 ns | 0.887 ns | 0.69 | - |
| FrequencyFlat | .NET 9.0 | 134217728 | 105.23 ns | 0.589 ns | 0.551 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 134217728 | 134.88 ns | 0.984 ns | 0.920 ns | 1.28 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 134217728 | 109.37 ns | 0.926 ns | 0.866 ns | 1.04 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 134217728 | 80.76 ns | 0.438 ns | 0.409 ns | 0.77 | - |

bitfaster (Owner, Author) commented Nov 21, 2024

BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

[Chart: BitFaster Caching Benchmarks, Lfu SketchIncrement column chart]

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| IncFlat | .NET 6.0 | 32768 | 21.20 ns | 0.119 ns | 0.099 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 32768 | 17.13 ns | 0.065 ns | 0.051 ns | 0.81 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 32768 | 15.95 ns | 0.092 ns | 0.082 ns | 0.75 | - |
| IncBlockAvxPinned | .NET 6.0 | 32768 | 15.26 ns | 0.291 ns | 0.311 ns | 0.72 | - |
| IncFlat | .NET 8.0 | 32768 | 11.38 ns | 0.143 ns | 0.127 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 32768 | 15.85 ns | 0.249 ns | 0.208 ns | 1.39 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 32768 | 14.60 ns | 0.023 ns | 0.020 ns | 1.28 | - |
| IncBlockAvxPinned | .NET 8.0 | 32768 | 13.92 ns | 0.275 ns | 0.467 ns | 1.22 | - |
| IncFlat | .NET 9.0 | 32768 | 11.10 ns | 0.022 ns | 0.019 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 32768 | 17.58 ns | 0.122 ns | 0.102 ns | 1.58 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 32768 | 14.12 ns | 0.248 ns | 0.232 ns | 1.27 | - |
| IncBlockAvxPinned | .NET 9.0 | 32768 | 13.31 ns | 0.165 ns | 0.146 ns | 1.20 | - |
| IncFlat | .NET 6.0 | 524288 | 27.98 ns | 0.191 ns | 0.179 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 524288 | 26.57 ns | 0.417 ns | 0.390 ns | 0.95 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 524288 | 25.19 ns | 0.290 ns | 0.257 ns | 0.90 | - |
| IncBlockAvxPinned | .NET 6.0 | 524288 | 24.72 ns | 0.315 ns | 0.263 ns | 0.88 | - |
| IncFlat | .NET 8.0 | 524288 | 18.17 ns | 0.361 ns | 0.337 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 524288 | 23.76 ns | 0.453 ns | 0.444 ns | 1.31 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 524288 | 40.41 ns | 0.309 ns | 0.274 ns | 2.22 | - |
| IncBlockAvxPinned | .NET 8.0 | 524288 | 22.09 ns | 0.174 ns | 0.163 ns | 1.22 | - |
| IncFlat | .NET 9.0 | 524288 | 18.35 ns | 0.346 ns | 0.340 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 524288 | 24.03 ns | 0.253 ns | 0.211 ns | 1.31 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 524288 | 22.57 ns | 0.158 ns | 0.132 ns | 1.23 | - |
| IncBlockAvxPinned | .NET 9.0 | 524288 | 22.25 ns | 0.097 ns | 0.090 ns | 1.21 | - |
| IncFlat | .NET 6.0 | 8388608 | 84.76 ns | 0.388 ns | 0.363 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 8388608 | 67.50 ns | 1.041 ns | 0.923 ns | 0.80 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 8388608 | 52.75 ns | 0.393 ns | 0.349 ns | 0.62 | - |
| IncBlockAvxPinned | .NET 6.0 | 8388608 | 51.78 ns | 0.333 ns | 0.296 ns | 0.61 | - |
| IncFlat | .NET 8.0 | 8388608 | 58.35 ns | 0.350 ns | 0.292 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 8388608 | 60.92 ns | 0.462 ns | 0.410 ns | 1.04 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 8388608 | 51.15 ns | 0.221 ns | 0.184 ns | 0.88 | - |
| IncBlockAvxPinned | .NET 8.0 | 8388608 | 49.63 ns | 0.343 ns | 0.287 ns | 0.85 | - |
| IncFlat | .NET 9.0 | 8388608 | 58.48 ns | 0.576 ns | 0.539 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 8388608 | 60.11 ns | 0.319 ns | 0.266 ns | 1.03 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 8388608 | 49.91 ns | 0.247 ns | 0.231 ns | 0.85 | - |
| IncBlockAvxPinned | .NET 9.0 | 8388608 | 48.62 ns | 0.243 ns | 0.227 ns | 0.83 | - |
| IncFlat | .NET 6.0 | 134217728 | 113.34 ns | 1.098 ns | 1.027 ns | 1.00 | - |
| IncBlock | .NET 6.0 | 134217728 | 88.70 ns | 1.109 ns | 1.037 ns | 0.78 | - |
| IncBlockAvxNotPinned | .NET 6.0 | 134217728 | 65.24 ns | 0.679 ns | 0.602 ns | 0.58 | - |
| IncBlockAvxPinned | .NET 6.0 | 134217728 | 66.43 ns | 1.301 ns | 1.548 ns | 0.59 | - |
| IncFlat | .NET 8.0 | 134217728 | 87.11 ns | 1.650 ns | 1.621 ns | 1.00 | - |
| IncBlock | .NET 8.0 | 134217728 | 85.18 ns | 1.702 ns | 3.700 ns | 0.98 | - |
| IncBlockAvxNotPinned | .NET 8.0 | 134217728 | 65.26 ns | 1.233 ns | 1.211 ns | 0.75 | - |
| IncBlockAvxPinned | .NET 8.0 | 134217728 | 63.13 ns | 1.108 ns | 0.925 ns | 0.73 | - |
| IncFlat | .NET 9.0 | 134217728 | 85.21 ns | 1.011 ns | 0.896 ns | 1.00 | - |
| IncBlock | .NET 9.0 | 134217728 | 83.15 ns | 1.649 ns | 1.462 ns | 0.98 | - |
| IncBlockAvxNotPinned | .NET 9.0 | 134217728 | 63.64 ns | 1.194 ns | 1.058 ns | 0.75 | - |
| IncBlockAvxPinned | .NET 9.0 | 134217728 | 62.52 ns | 0.645 ns | 0.603 ns | 0.73 | - |


BenchmarkDotNet v0.14.0, Windows 10 (10.0.20348.2849) (Hyper-V)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  [Host]   : .NET Framework 4.8 (4.8.4762.0), X64 RyuJIT VectorSize=256
  .NET 6.0 : .NET 6.0.36 (6.0.3624.51421), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

[Chart: BitFaster Caching Benchmarks, Lfu SketchFrequency column chart]

| Method | Runtime | Size | Mean | Error | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|---|---|
| FrequencyFlat | .NET 6.0 | 32768 | 26.20 ns | 0.148 ns | 0.131 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 32768 | 22.52 ns | 0.171 ns | 0.143 ns | 0.86 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 32768 | 27.78 ns | 0.072 ns | 0.060 ns | 1.06 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 32768 | 27.67 ns | 0.041 ns | 0.037 ns | 1.06 | - |
| FrequencyFlat | .NET 8.0 | 32768 | 16.06 ns | 0.181 ns | 0.151 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 32768 | 18.02 ns | 0.050 ns | 0.039 ns | 1.12 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 32768 | 25.63 ns | 0.055 ns | 0.046 ns | 1.60 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 32768 | 26.10 ns | 0.128 ns | 0.107 ns | 1.62 | - |
| FrequencyFlat | .NET 9.0 | 32768 | 15.35 ns | 0.026 ns | 0.020 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 32768 | 18.35 ns | 0.031 ns | 0.026 ns | 1.20 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 32768 | 25.23 ns | 0.078 ns | 0.061 ns | 1.64 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 32768 | 25.04 ns | 0.030 ns | 0.027 ns | 1.63 | - |
| FrequencyFlat | .NET 6.0 | 524288 | 42.08 ns | 0.818 ns | 0.765 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 524288 | 31.15 ns | 0.384 ns | 0.340 ns | 0.74 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 524288 | 32.13 ns | 0.612 ns | 0.654 ns | 0.76 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 524288 | 28.84 ns | 0.262 ns | 0.232 ns | 0.69 | - |
| FrequencyFlat | .NET 8.0 | 524288 | 25.05 ns | 0.354 ns | 0.331 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 524288 | 24.85 ns | 0.307 ns | 0.272 ns | 0.99 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 524288 | 30.44 ns | 0.462 ns | 0.432 ns | 1.22 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 524288 | 26.29 ns | 0.065 ns | 0.055 ns | 1.05 | - |
| FrequencyFlat | .NET 9.0 | 524288 | 24.54 ns | 0.275 ns | 0.257 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 524288 | 24.83 ns | 0.367 ns | 0.325 ns | 1.01 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 524288 | 28.56 ns | 0.560 ns | 0.599 ns | 1.16 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 524288 | 25.22 ns | 0.074 ns | 0.058 ns | 1.03 | - |
| FrequencyFlat | .NET 6.0 | 8388608 | 145.02 ns | 1.532 ns | 1.358 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 8388608 | 105.86 ns | 0.838 ns | 0.699 ns | 0.73 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 8388608 | 86.38 ns | 0.475 ns | 0.371 ns | 0.60 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 8388608 | 72.39 ns | 0.424 ns | 0.396 ns | 0.50 | - |
| FrequencyFlat | .NET 8.0 | 8388608 | 85.24 ns | 0.550 ns | 0.459 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 8388608 | 69.58 ns | 0.324 ns | 0.253 ns | 0.82 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 8388608 | 84.32 ns | 1.235 ns | 1.156 ns | 0.99 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 8388608 | 68.44 ns | 0.324 ns | 0.287 ns | 0.80 | - |
| FrequencyFlat | .NET 9.0 | 8388608 | 85.32 ns | 0.676 ns | 0.528 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 8388608 | 69.69 ns | 0.483 ns | 0.404 ns | 0.82 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 8388608 | 79.30 ns | 0.434 ns | 0.362 ns | 0.93 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 8388608 | 66.19 ns | 0.619 ns | 0.517 ns | 0.78 | - |
| FrequencyFlat | .NET 6.0 | 134217728 | 195.54 ns | 2.236 ns | 1.868 ns | 1.00 | - |
| FrequencyBlock | .NET 6.0 | 134217728 | 158.63 ns | 1.970 ns | 1.843 ns | 0.81 | - |
| FrequencyBlockAvxNotPinned | .NET 6.0 | 134217728 | 115.10 ns | 2.215 ns | 3.315 ns | 0.59 | - |
| FrequencyBlockAvxPinned | .NET 6.0 | 134217728 | 97.79 ns | 1.719 ns | 1.608 ns | 0.50 | - |
| FrequencyFlat | .NET 8.0 | 134217728 | 118.06 ns | 1.438 ns | 1.275 ns | 1.00 | - |
| FrequencyBlock | .NET 8.0 | 134217728 | 94.07 ns | 1.413 ns | 1.252 ns | 0.80 | - |
| FrequencyBlockAvxNotPinned | .NET 8.0 | 134217728 | 111.46 ns | 2.177 ns | 2.235 ns | 0.94 | - |
| FrequencyBlockAvxPinned | .NET 8.0 | 134217728 | 91.55 ns | 1.210 ns | 1.072 ns | 0.78 | - |
| FrequencyFlat | .NET 9.0 | 134217728 | 117.45 ns | 1.600 ns | 1.497 ns | 1.00 | - |
| FrequencyBlock | .NET 9.0 | 134217728 | 93.63 ns | 1.586 ns | 1.484 ns | 0.80 | - |
| FrequencyBlockAvxNotPinned | .NET 9.0 | 134217728 | 105.99 ns | 1.587 ns | 1.407 ns | 0.90 | - |
| FrequencyBlockAvxPinned | .NET 9.0 | 134217728 | 87.39 ns | 1.371 ns | 1.282 ns | 0.74 | - |

bitfaster marked this pull request as ready for review on November 21, 2024 06:51
bitfaster merged commit 5ef26a3 into main on Nov 21, 2024
13 checks passed
bitfaster deleted the users/alexpeck/align branch on November 21, 2024 06:56