Migrate remaining benchmarks to Criterion #1490

Merged (19 commits) on Sep 6, 2024

dhardy (Member) commented Aug 29, 2024

  • Added a CHANGELOG.md entry

Summary

Translate everything still using the old test harness to Criterion.

Motivation

Remove dependency on the (mostly deprecated) standard test harness. Enable usage of Criterion command-line arguments (which are otherwise intercepted by the Rust test harness).

Closes #1039.

Details

Mostly these are straightforward translations or simplifications (Criterion can benchmark ~1ns functions well enough, so we don't need 1000 repetitions inside our benchmark functions).

I have added some benchmarks to benchmark groups. For some I have used shorter warm-up/measurement durations, since the defaults are quite conservative.
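
As a rough illustration of what warm-up and measurement durations control, here is a std-only sketch of a timing loop. It is not Criterion's implementation or API: `time_it`, the batch size of 1024 and the LCG workload are hypothetical names and values chosen for this example. It does show why a hand-written repetition loop inside the benchmarked function is unnecessary once the harness itself repeats the call.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` for about `warm_up` (results discarded), then time it for about
/// `measurement`. Returns (iterations measured, mean nanoseconds per iteration).
/// Hypothetical helper for illustration only; not part of Criterion.
fn time_it<F: FnMut() -> u64>(mut f: F, warm_up: Duration, measurement: Duration) -> (u64, f64) {
    // Warm-up phase: nothing is recorded here.
    let start = Instant::now();
    while start.elapsed() < warm_up {
        black_box(f());
    }

    // Measurement phase: call `f` in batches so the clock is read rarely
    // compared with the (possibly sub-nanosecond) function under test.
    let mut iters: u64 = 0;
    let start = Instant::now();
    while start.elapsed() < measurement {
        for _ in 0..1024 {
            black_box(f());
        }
        iters += 1024;
    }
    let nanos = start.elapsed().as_nanos() as f64;
    (iters, nanos / iters as f64)
}

fn main() {
    // Stand-in workload (an assumption, not a rand generator): one LCG step.
    let mut state: u64 = 1;
    let step = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        state
    };

    // Shorter durations finish sooner but average over fewer iterations.
    let (iters, mean_ns) = time_it(step, Duration::from_millis(100), Duration::from_millis(500));
    println!("{iters} iterations, {mean_ns:.3} ns/iter");
}
```

Shortening either duration trades run time against the number of iterations averaged, which is the trade-off behind using non-default durations for some benchmarks.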

Many of the new results match up very well with those from the old test framework. Others don't: in particular, the generators benchmarks (especially the byte benches) are notably slower, though the relative performance between generators is roughly the same.

Old generators benchmark results

test gen_bytes_chacha12      ... bench:     217,351.58 ns/iter (+/- 1,433.82) = 4711 MB/s
test gen_bytes_chacha20      ... bench:     357,241.45 ns/iter (+/- 3,830.55) = 2866 MB/s
test gen_bytes_chacha8       ... bench:     147,663.57 ns/iter (+/- 872.74) = 6934 MB/s
test gen_bytes_os            ... bench:   2,039,790.30 ns/iter (+/- 9,026.71) = 502 MB/s
test gen_bytes_pcg32         ... bench:     297,029.40 ns/iter (+/- 2,519.55) = 3447 MB/s
test gen_bytes_pcg64         ... bench:     168,586.07 ns/iter (+/- 575.08) = 6074 MB/s
test gen_bytes_pcg64dxsm     ... bench:     157,215.12 ns/iter (+/- 439.56) = 6513 MB/s
test gen_bytes_pcg64mcg      ... bench:     121,489.14 ns/iter (+/- 109.97) = 8428 MB/s
test gen_bytes_small         ... bench:      90,309.88 ns/iter (+/- 878.92) = 11338 MB/s
test gen_bytes_std           ... bench:     217,169.73 ns/iter (+/- 1,440.60) = 4715 MB/s
test gen_bytes_step          ... bench:      20,989.58 ns/iter (+/- 156.67) = 48787 MB/s
test gen_bytes_thread        ... bench:     226,905.33 ns/iter (+/- 1,608.46) = 4512 MB/s
test gen_u32_chacha12        ... bench:       1,103.66 ns/iter (+/- 8.22) = 3626 MB/s
test gen_u32_chacha20        ... bench:       1,778.53 ns/iter (+/- 13.89) = 2249 MB/s
test gen_u32_chacha8         ... bench:         840.26 ns/iter (+/- 9.14) = 4761 MB/s
test gen_u32_os              ... bench:     287,570.50 ns/iter (+/- 3,060.80) = 13 MB/s
test gen_u32_pcg32           ... bench:       1,037.70 ns/iter (+/- 4.48) = 3857 MB/s
test gen_u32_pcg64           ... bench:       1,326.92 ns/iter (+/- 8.72) = 3016 MB/s
test gen_u32_pcg64dxsm       ... bench:       1,349.86 ns/iter (+/- 6.69) = 2965 MB/s
test gen_u32_pcg64mcg        ... bench:         934.49 ns/iter (+/- 5.20) = 4282 MB/s
test gen_u32_small           ... bench:         752.43 ns/iter (+/- 6.05) = 5319 MB/s
test gen_u32_std             ... bench:       1,104.51 ns/iter (+/- 17.14) = 3623 MB/s
test gen_u32_step            ... bench:           0.41 ns/iter (+/- 0.00) = 4000000 MB/s
test gen_u32_thread          ... bench:       1,244.38 ns/iter (+/- 23.10) = 3215 MB/s
test gen_u64_chacha12        ... bench:       1,847.26 ns/iter (+/- 4.20) = 4331 MB/s
test gen_u64_chacha20        ... bench:       2,938.12 ns/iter (+/- 11.34) = 2722 MB/s
test gen_u64_chacha8         ... bench:       1,312.82 ns/iter (+/- 9.21) = 6097 MB/s
test gen_u64_os              ... bench:     287,363.80 ns/iter (+/- 1,950.13) = 27 MB/s
test gen_u64_pcg32           ... bench:       1,698.28 ns/iter (+/- 6.02) = 4711 MB/s
test gen_u64_pcg64           ... bench:       1,325.13 ns/iter (+/- 6.38) = 6037 MB/s
test gen_u64_pcg64dxsm       ... bench:       1,345.90 ns/iter (+/- 5.35) = 5947 MB/s
test gen_u64_pcg64mcg        ... bench:         931.76 ns/iter (+/- 1.62) = 8592 MB/s
test gen_u64_small           ... bench:         673.24 ns/iter (+/- 4.31) = 11887 MB/s
test gen_u64_std             ... bench:       1,850.87 ns/iter (+/- 3.75) = 4324 MB/s
test gen_u64_step            ... bench:           0.41 ns/iter (+/- 0.00) = 8000000 MB/s
test gen_u64_thread          ... bench:       2,042.33 ns/iter (+/- 12.88) = 3917 MB/s
test init_chacha             ... bench:          17.61 ns/iter (+/- 0.04)
test init_pcg32              ... bench:           4.15 ns/iter (+/- 0.03)
test init_pcg64              ... bench:           7.67 ns/iter (+/- 0.01)
test init_pcg64dxsm          ... bench:           7.46 ns/iter (+/- 0.03)
test init_pcg64mcg           ... bench:           4.03 ns/iter (+/- 0.07)
test reseeding_chacha20_16k  ... bench:   6,100,544.10 ns/iter (+/- 16,085.01) = 2750 MB/s
test reseeding_chacha20_1M   ... bench:   5,779,843.60 ns/iter (+/- 18,430.08) = 2902 MB/s
test reseeding_chacha20_256k ... bench:   5,797,746.90 ns/iter (+/- 11,876.55) = 2893 MB/s
test reseeding_chacha20_32k  ... bench:   5,940,838.90 ns/iter (+/- 15,901.71) = 2824 MB/s
test reseeding_chacha20_4k   ... bench:   7,049,439.30 ns/iter (+/- 18,340.65) = 2379 MB/s
test reseeding_chacha20_64k  ... bench:   5,859,552.60 ns/iter (+/- 6,867.71) = 2863 MB/s

New generators benchmark results

gen_bytes/step          time:   [96.118 ns 96.319 ns 96.550 ns]
                        thrpt:  [9.8775 GiB/s 9.9012 GiB/s 9.9219 GiB/s]
                 change:
                        time:   [-0.2388% -0.0153% +0.2193%] (p = 0.90 > 0.05)
                        thrpt:  [-0.2188% +0.0153% +0.2394%]
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe
gen_bytes/pcg32         time:   [356.01 ns 356.91 ns 357.88 ns]
                        thrpt:  [2.6648 GiB/s 2.6720 GiB/s 2.6788 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
gen_bytes/pcg64         time:   [251.88 ns 252.09 ns 252.37 ns]
                        thrpt:  [3.7789 GiB/s 3.7830 GiB/s 3.7863 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
gen_bytes/pcg64mcg      time:   [219.68 ns 220.16 ns 220.70 ns]
                        thrpt:  [4.3211 GiB/s 4.3318 GiB/s 4.3413 GiB/s]
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild
gen_bytes/pcg64dxsm     time:   [253.80 ns 254.00 ns 254.21 ns]
                        thrpt:  [3.7515 GiB/s 3.7546 GiB/s 3.7575 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
gen_bytes/chacha8       time:   [246.81 ns 246.99 ns 247.20 ns]
                        thrpt:  [3.8579 GiB/s 3.8611 GiB/s 3.8640 GiB/s]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
gen_bytes/chacha12      time:   [320.27 ns 320.54 ns 320.82 ns]
                        thrpt:  [2.9726 GiB/s 2.9752 GiB/s 2.9777 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
gen_bytes/chacha20      time:   [451.98 ns 452.27 ns 452.57 ns]
                        thrpt:  [2.1072 GiB/s 2.1087 GiB/s 2.1100 GiB/s]
gen_bytes/std           time:   [305.45 ns 306.22 ns 307.03 ns]
                        thrpt:  [3.1062 GiB/s 3.1144 GiB/s 3.1222 GiB/s]
gen_bytes/small         time:   [170.36 ns 170.61 ns 170.89 ns]
                        thrpt:  [5.5807 GiB/s 5.5898 GiB/s 5.5981 GiB/s]
gen_bytes/os            time:   [2.1675 µs 2.1714 µs 2.1757 µs]
                        thrpt:  [448.84 MiB/s 449.75 MiB/s 450.55 MiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
gen_bytes/thread        time:   [331.20 ns 331.67 ns 332.10 ns]
                        thrpt:  [2.8716 GiB/s 2.8754 GiB/s 2.8794 GiB/s]

gen_u32/step time: [208.71 ps 208.84 ps 208.97 ps]
thrpt: [17.827 GiB/s 17.838 GiB/s 17.849 GiB/s]
change:
time: [-1.8456% -1.7380% -1.5912%] (p = 0.00 < 0.05)
thrpt: [+1.6170% +1.7688% +1.8803%]
Performance has improved.
Found 30 outliers among 1000 measurements (3.00%)
19 (1.90%) high mild
11 (1.10%) high severe
gen_u32/pcg32 time: [1.0358 ns 1.0360 ns 1.0362 ns]
thrpt: [3.5950 GiB/s 3.5959 GiB/s 3.5966 GiB/s]
change:
time: [+4.0822% +4.1792% +4.2713%] (p = 0.00 < 0.05)
thrpt: [-4.0963% -4.0115% -3.9220%]
Performance has regressed.
Found 93 outliers among 1000 measurements (9.30%)
1 (0.10%) low severe
19 (1.90%) high mild
73 (7.30%) high severe
gen_u32/pcg64 time: [1.2821 ns 1.2868 ns 1.2916 ns]
thrpt: [2.8843 GiB/s 2.8949 GiB/s 2.9056 GiB/s]
change:
time: [+1.1689% +1.5157% +1.8449%] (p = 0.00 < 0.05)
thrpt: [-1.8115% -1.4931% -1.1554%]
Performance has regressed.
gen_u32/pcg64mcg time: [952.60 ps 953.13 ps 953.71 ps]
thrpt: [3.9061 GiB/s 3.9085 GiB/s 3.9107 GiB/s]
Found 8 outliers among 1000 measurements (0.80%)
1 (0.10%) high mild
7 (0.70%) high severe
gen_u32/pcg64dxsm time: [1.4181 ns 1.4188 ns 1.4196 ns]
thrpt: [2.6243 GiB/s 2.6257 GiB/s 2.6270 GiB/s]
Found 163 outliers among 1000 measurements (16.30%)
86 (8.60%) high mild
77 (7.70%) high severe
gen_u32/chacha8 time: [965.13 ps 965.69 ps 966.35 ps]
thrpt: [3.8550 GiB/s 3.8577 GiB/s 3.8599 GiB/s]
Found 193 outliers among 1000 measurements (19.30%)
31 (3.10%) low severe
121 (12.10%) low mild
26 (2.60%) high mild
15 (1.50%) high severe
gen_u32/chacha12 time: [1.2628 ns 1.2637 ns 1.2646 ns]
thrpt: [2.9457 GiB/s 2.9480 GiB/s 2.9501 GiB/s]
Found 7 outliers among 1000 measurements (0.70%)
3 (0.30%) high mild
4 (0.40%) high severe
gen_u32/chacha20 time: [1.7690 ns 1.7701 ns 1.7711 ns]
thrpt: [2.1033 GiB/s 2.1046 GiB/s 2.1059 GiB/s]
Found 12 outliers among 1000 measurements (1.20%)
7 (0.70%) high mild
5 (0.50%) high severe
gen_u32/std time: [1.2071 ns 1.2075 ns 1.2079 ns]
thrpt: [3.0842 GiB/s 3.0852 GiB/s 3.0860 GiB/s]
Found 14 outliers among 1000 measurements (1.40%)
7 (0.70%) high mild
7 (0.70%) high severe
gen_u32/small time: [676.73 ps 678.39 ps 680.13 ps]
thrpt: [5.4773 GiB/s 5.4914 GiB/s 5.5048 GiB/s]
gen_u32/os time: [293.12 ns 293.35 ns 293.58 ns]
thrpt: [12.994 MiB/s 13.004 MiB/s 13.014 MiB/s]
Found 3 outliers among 1000 measurements (0.30%)
3 (0.30%) high mild
gen_u32/thread time: [1.2436 ns 1.2449 ns 1.2462 ns]
thrpt: [2.9893 GiB/s 2.9925 GiB/s 2.9955 GiB/s]
Found 87 outliers among 1000 measurements (8.70%)
53 (5.30%) high mild
34 (3.40%) high severe

gen_u64/step time: [211.35 ps 211.44 ps 211.55 ps]
thrpt: [35.218 GiB/s 35.237 GiB/s 35.252 GiB/s]
Found 135 outliers among 1000 measurements (13.50%)
20 (2.00%) low mild
64 (6.40%) high mild
51 (5.10%) high severe
gen_u64/pcg32 time: [2.0725 ns 2.0731 ns 2.0740 ns]
thrpt: [3.5924 GiB/s 3.5939 GiB/s 3.5951 GiB/s]
Found 146 outliers among 1000 measurements (14.60%)
89 (8.90%) high mild
57 (5.70%) high severe
gen_u64/pcg64 time: [1.2850 ns 1.2887 ns 1.2924 ns]
thrpt: [5.7649 GiB/s 5.7814 GiB/s 5.7980 GiB/s]
Found 1 outliers among 1000 measurements (0.10%)
1 (0.10%) high mild
gen_u64/pcg64mcg time: [947.60 ps 947.97 ps 948.38 ps]
thrpt: [7.8561 GiB/s 7.8595 GiB/s 7.8626 GiB/s]
Found 94 outliers among 1000 measurements (9.40%)
77 (7.70%) high mild
17 (1.70%) high severe
gen_u64/pcg64dxsm time: [1.2180 ns 1.2185 ns 1.2193 ns]
thrpt: [6.1107 GiB/s 6.1143 GiB/s 6.1171 GiB/s]
Found 77 outliers among 1000 measurements (7.70%)
32 (3.20%) high mild
45 (4.50%) high severe
gen_u64/chacha8 time: [1.4459 ns 1.4471 ns 1.4485 ns]
thrpt: [5.1438 GiB/s 5.1486 GiB/s 5.1528 GiB/s]
Found 113 outliers among 1000 measurements (11.30%)
1 (0.10%) low severe
65 (6.50%) low mild
29 (2.90%) high mild
18 (1.80%) high severe
gen_u64/chacha12 time: [1.9792 ns 1.9809 ns 1.9827 ns]
thrpt: [3.7578 GiB/s 3.7612 GiB/s 3.7644 GiB/s]
Found 22 outliers among 1000 measurements (2.20%)
12 (1.20%) high mild
10 (1.00%) high severe
gen_u64/chacha20 time: [3.0280 ns 3.0291 ns 3.0303 ns]
thrpt: [2.4587 GiB/s 2.4596 GiB/s 2.4605 GiB/s]
Found 60 outliers among 1000 measurements (6.00%)
27 (2.70%) low mild
15 (1.50%) high mild
18 (1.80%) high severe
gen_u64/std time: [1.9742 ns 1.9752 ns 1.9763 ns]
thrpt: [3.7699 GiB/s 3.7722 GiB/s 3.7740 GiB/s]
Found 75 outliers among 1000 measurements (7.50%)
6 (0.60%) low mild
40 (4.00%) high mild
29 (2.90%) high severe
gen_u64/small time: [654.74 ps 655.21 ps 655.69 ps]
thrpt: [11.363 GiB/s 11.371 GiB/s 11.379 GiB/s]
Found 5 outliers among 1000 measurements (0.50%)
4 (0.40%) high mild
1 (0.10%) high severe
gen_u64/os time: [290.50 ns 290.72 ns 290.94 ns]
thrpt: [26.223 MiB/s 26.243 MiB/s 26.263 MiB/s]
Found 22 outliers among 1000 measurements (2.20%)
1 (0.10%) low mild
12 (1.20%) high mild
9 (0.90%) high severe
gen_u64/thread time: [2.0391 ns 2.0402 ns 2.0412 ns]
thrpt: [3.6500 GiB/s 3.6519 GiB/s 3.6538 GiB/s]
Found 13 outliers among 1000 measurements (1.30%)
8 (0.80%) high mild
5 (0.50%) high severe

init_gen/pcg32 time: [8.5311 ns 8.5418 ns 8.5523 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
init_gen/pcg64 time: [16.844 ns 16.874 ns 16.904 ns]
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
init_gen/pcg64mcg time: [7.5557 ns 7.5669 ns 7.5803 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
init_gen/pcg64dxsm time: [16.452 ns 16.472 ns 16.495 ns]
Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high mild
init_gen/chacha8 time: [26.466 ns 26.549 ns 26.639 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
init_gen/chacha12 time: [26.301 ns 26.419 ns 26.547 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
init_gen/chacha20 time: [26.389 ns 26.471 ns 26.564 ns]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
init_gen/std time: [26.426 ns 26.467 ns 26.508 ns]

reseeding_bytes/chacha20_4k
time: [440.00 µs 440.37 µs 440.81 µs]
thrpt: [2.2154 GiB/s 2.2176 GiB/s 2.2195 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
reseeding_bytes/chacha20_16k
time: [381.24 µs 381.56 µs 381.93 µs]
thrpt: [2.5569 GiB/s 2.5594 GiB/s 2.5616 GiB/s]
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
reseeding_bytes/chacha20_32k
time: [371.49 µs 371.60 µs 371.72 µs]
thrpt: [2.6271 GiB/s 2.6280 GiB/s 2.6288 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
reseeding_bytes/chacha20_64k
time: [367.69 µs 367.94 µs 368.21 µs]
thrpt: [2.6522 GiB/s 2.6541 GiB/s 2.6559 GiB/s]
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
reseeding_bytes/chacha20_256k
time: [364.63 µs 365.37 µs 366.10 µs]
thrpt: [2.6675 GiB/s 2.6728 GiB/s 2.6783 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
reseeding_bytes/chacha20_1024k
time: [365.37 µs 365.79 µs 366.25 µs]
thrpt: [2.6664 GiB/s 2.6697 GiB/s 2.6728 GiB/s]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
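
The old and new byte-bench figures above are printed on different scales (ns per iteration with MB/s versus ns per call with GiB/s), so a small conversion helps when comparing them. The sketch below makes two assumptions that are inferred from the reported numbers rather than stated in the PR: each old `gen_bytes_*` iteration filled 1000 × 1024 bytes, and the old harness reports MB as 10^6 bytes. The helper names `implied_bytes` and `gib_per_sec` are hypothetical.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Bytes per iteration implied by an old-style result line:
/// (ns/iter) x (MB/s), assuming MB means 10^6 bytes.
fn implied_bytes(nanos: f64, mb_per_sec: f64) -> f64 {
    nanos * 1e-9 * mb_per_sec * 1e6
}

/// Throughput in GiB/s for `bytes` processed in `nanos` nanoseconds.
fn gib_per_sec(bytes: f64, nanos: f64) -> f64 {
    bytes / (nanos * 1e-9) / GIB
}

fn main() {
    // Old: test gen_bytes_chacha12 ... 217,351.58 ns/iter = 4711 MB/s
    let old_bytes = implied_bytes(217_351.58, 4711.0);
    println!("old bytes per iteration (implied): {old_bytes:.0}");

    // Assumption: the old iteration filled 1000 x 1024 bytes.
    let old = gib_per_sec(1000.0 * 1024.0, 217_351.58);

    // New: gen_bytes/chacha12 time 320.54 ns; assumption: 1024 bytes per call.
    let new = gib_per_sec(1024.0, 320.54);

    println!("old: {old:.2} GiB/s, new: {new:.2} GiB/s, new/old: {:.2}", new / old);
}
```

Under these assumptions, chacha12 comes out at roughly 4.39 GiB/s in the old results against 2.98 GiB/s in the new ones, which is consistent with the byte benches being notably slower under Criterion.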

dhardy requested review from vks and TheIronBorn on August 29, 2024, 12:08
dhardy marked this pull request as ready for review on August 29, 2024, 14:18
Review thread on benches/src/generators.rs (outdated, resolved)

MichaelOwenDyer (Member) left a comment

I can't say I know enough about benchmarking to be able to critically review this, but I took a look over it and it all seems sensible and correct to my untrained eyes.

dhardy merged commit 9e030aa into rust-random:master on Sep 6, 2024
15 checks passed
dhardy deleted the benches branch on September 6, 2024, 08:09
dhardy added a commit to dhardy/rand that referenced this pull request Sep 10, 2024
Looks like a mistake in rust-random#1490.
dhardy added a commit to dhardy/rand that referenced this pull request Sep 24, 2024
Looks like a mistake in rust-random#1490.
Successfully merging this pull request may close these issues.

Migrate benchmarks to criterion-cycles-per-byte
3 participants