use Nat64 for PRNG #69

Merged: chenyan-dfinity merged 8 commits into main from fix-random on Jul 24, 2023

@chenyan-dfinity commented Jul 22, 2023

Reduce overhead in the PRNG:

  • Avoid converting Nat32 to Nat, which allocates
  • Remove debug_show
  • Switch to Nat64 to avoid wrapping
  • Reuse rand for removal
  • Scale collection benchmarks to 1M
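The idea behind "Switch to Nat64 to avoid wrapping" can be sketched with a Park-Miller style generator. This is a hypothetical Rust stand-in, not the PR's Motoko code: the multiplier 48271 and modulus 2^31 - 1 are the textbook MINSTD constants, and `u64` plays the role of Motoko's `Nat64`. Because the state stays below 2^31 and the multiplier below 2^16, the product stays below 2^47, so plain 64-bit arithmetic never overflows and nothing has to be widened to an unbounded `Nat`.

```rust
/// Hypothetical Park-Miller (MINSTD) generator; not the benchmark's actual PRNG.
pub struct Minstd {
    state: u64,
}

impl Minstd {
    const MULTIPLIER: u64 = 48_271;
    const MODULUS: u64 = 2_147_483_647; // 2^31 - 1, a prime

    /// The state must lie in 1..MODULUS; a zero state would stay zero forever.
    pub fn new(seed: u64) -> Self {
        let s = seed % Self::MODULUS;
        Minstd { state: if s == 0 { 1 } else { s } }
    }

    /// state < 2^31 and MULTIPLIER < 2^16, so the product is < 2^47:
    /// ordinary u64 multiplication cannot overflow (or wrap) here.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state * Self::MULTIPLIER % Self::MODULUS;
        self.state
    }
}
```

Seeded with 1, these constants match C++'s `std::minstd_rand`, whose 10000th output is 399268537, which makes a convenient sanity check for a port.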

github-actions bot commented Jul 22, 2023

Note
Diffing the performance results against the published results from the main branch.
Unchanged benchmarks are omitted.

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 133_828 ($\textcolor{green}{-12.29\%}$) | 344_919_719 ($\textcolor{green}{-71.15\%}$) | 3_099_888 ($\textcolor{green}{-65.94\%}$) | 283_600 ($\textcolor{green}{-48.02\%}$) | 273_010_658 ($\textcolor{green}{-25.32\%}$) | 304_884 ($\textcolor{green}{-41.47\%}$) |
| triemap | 135_316 ($\textcolor{green}{-13.49\%}$) | 476_664_789 ($\textcolor{green}{-64.40\%}$) | 3_713_736 ($\textcolor{green}{-61.78\%}$) | 187_068 ($\textcolor{green}{-59.31\%}$) | 447_552 ($\textcolor{green}{-62.49\%}$) | 431_270 ($\textcolor{green}{-37.18\%}$) |
| rbtree | 136_114 ($\textcolor{green}{-11.19\%}$) | 247_816_636 ($\textcolor{green}{-77.78\%}$) | 2_899_996 ($\textcolor{green}{-67.42\%}$) | 72_364 ($\textcolor{green}{-79.60\%}$) | 214_369 ($\textcolor{green}{-77.77\%}$) | 227_128 ($\textcolor{green}{-54.13\%}$) |
| splay | 131_868 ($\textcolor{green}{-13.64\%}$) | 454_525_330 ($\textcolor{green}{-65.66\%}$) | 2_699_932 ($\textcolor{green}{-68.97\%}$) | 437_199 ($\textcolor{green}{-39.20\%}$) | 463_233 ($\textcolor{green}{-61.85\%}$) | 639_212 ($\textcolor{green}{-10.87\%}$) |
| btree | 176_459 ($\textcolor{green}{-2.09\%}$) | 351_887_997 ($\textcolor{green}{-71.22\%}$) | 1_554_008 ($\textcolor{green}{-79.43\%}$) | 219_323 ($\textcolor{green}{-56.39\%}$) | 337_458 ($\textcolor{green}{-69.05\%}$) | 368_138 ($\textcolor{green}{-31.88\%}$) |
| zhenya_hashmap | 141_855 ($\textcolor{green}{-4.46\%}$) | 131_227_951 ($\textcolor{green}{-86.74\%}$) | 3_299_636 ($\textcolor{green}{-64.53\%}$) | 62_085 ($\textcolor{green}{-81.46\%}$) | 76_830 ($\textcolor{green}{-90.61\%}$) | 89_171 ($\textcolor{green}{-73.40\%}$) |
| btreemap_rs | 413_478 ($\textcolor{green}{-4.63\%}$) | 70_011_959 ($\textcolor{green}{-37.04\%}$) | 1_245_184 ($\textcolor{green}{-24.00\%}$) | 57_133 ($\textcolor{green}{-1.18\%}$) | 86_370 ($\textcolor{green}{-34.05\%}$) | 79_811 ($\textcolor{red}{31.03\%}$) |
| hashmap_rs | 406_096 ($\textcolor{green}{-4.55\%}$) | 15_433_930 ($\textcolor{green}{-67.51\%}$) | 1_835_008 | 17_154 ($\textcolor{green}{-2.99\%}$) | 21_926 ($\textcolor{green}{-59.99\%}$) | 19_973 ($\textcolor{red}{9.16\%}$) |

Priority queue

| | binary_size | heapify 50k | max mem | pop_min 50 | put 50 |
|---|---|---|---|---|---|
| heap | 129_512 ($\textcolor{green}{-7.46\%}$) | 84_138_307 ($\textcolor{green}{-77.23\%}$) | 1_499_928 ($\textcolor{red}{7.14\%}$) | 336_956 ($\textcolor{red}{0.77\%}$) | 115_668 ($\textcolor{green}{-70.90\%}$) |
| heap_rs | 403_925 ($\textcolor{green}{-0.09\%}$) | 6_215_095 ($\textcolor{red}{19.00\%}$) | 1_114_112 ($\textcolor{red}{36.00\%}$) | 45_601 ($\textcolor{green}{-0.60\%}$) | 18_370 ($\textcolor{green}{-1.46\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 133_828 ($\textcolor{green}{-12.29\%}$) | 68_512_644 ($\textcolor{green}{-71.33\%}$) | 620_540 ($\textcolor{green}{-65.92\%}$) | 281_892 ($\textcolor{green}{-48.18\%}$) | 54_360_484 ($\textcolor{green}{-26.07\%}$) | 303_285 ($\textcolor{green}{-41.52\%}$) |
| hashmap_rs | 406_096 ($\textcolor{green}{-4.55\%}$) | 3_429_927 ($\textcolor{green}{-65.00\%}$) | 950_272 | 16_522 ($\textcolor{green}{-2.99\%}$) | 20_832 ($\textcolor{green}{-61.52\%}$) | 19_973 ($\textcolor{red}{15.86\%}$) |
| imrc_hashmap_rs | 413_588 ($\textcolor{green}{-4.68\%}$) | 18_594_807 ($\textcolor{green}{-27.29\%}$) | 1_507_328 ($\textcolor{green}{-4.17\%}$) | 29_484 ($\textcolor{red}{3.70\%}$) | 107_838 ($\textcolor{green}{-27.81\%}$) | 80_154 ($\textcolor{red}{120.91\%}$) |
| movm_rs | 1_712_108 ($\textcolor{green}{-0.01\%}$) | 1_089_653_746 | 2_654_208 | 2_497_600 | 6_977_084 | 5_505_446 |
| movm_dynamic_rs | 1_830_967 ($\textcolor{green}{-0.00\%}$) | 509_925_551 | 2_129_920 | 2_045_671 | 2_754_013 | 2_043_229 |

Statistics

  • binary_size: -6.36% [-8.65%, -4.08%]
  • max_mem: -41.73% [-62.95%, -20.51%]
  • cycles: -39.22% [-48.56%, -29.88%]

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 225_805 | 37_469 ($\textcolor{green}{-0.06\%}$) | 16_228 ($\textcolor{green}{-0.26\%}$) | 12_658 ($\textcolor{red}{0.02\%}$) | 14_128 ($\textcolor{red}{0.16\%}$) |
| Rust | 704_886 | 471_865 | 86_470 | 104_617 | 115_765 |

DIP721 NFT

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: -0.04% [-0.24%, 0.17%]

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 118_909 | 7_392 |
| Rust | 23_699 | 474 ($\textcolor{green}{-40.08\%}$) |

Timer

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: no change

Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 101_238_593 ($\textcolor{green}{-59.81\%}$) | 5_940_184 ($\textcolor{green}{-61.77\%}$) | 50 | 50 | 50 |
| copying | 101_238_543 ($\textcolor{green}{-59.81\%}$) | 5_940_184 ($\textcolor{green}{-61.77\%}$) | 101_236_505 ($\textcolor{green}{-59.81\%}$) | 101_303_512 ($\textcolor{green}{-59.81\%}$) | 101_238_661 ($\textcolor{green}{-59.84\%}$) |
| compacting | 165_791_998 ($\textcolor{green}{-56.98\%}$) | 5_940_184 ($\textcolor{green}{-61.77\%}$) | 129_325_699 ($\textcolor{green}{-56.28\%}$) | 153_305_309 ($\textcolor{green}{-56.78\%}$) | 155_891_703 ($\textcolor{green}{-54.03\%}$) |
| generational | 248_396_502 ($\textcolor{green}{-57.91\%}$) | 5_940_448 ($\textcolor{green}{-61.77\%}$) | 47_649 ($\textcolor{green}{-6.94\%}$) | 850_111 ($\textcolor{green}{-19.14\%}$) | 762_565 ($\textcolor{red}{28.28\%}$) |
| incremental | 253_634_373 ($\textcolor{red}{31.65\%}$) | 4_624 ($\textcolor{green}{-0.09\%}$) | 17_416_620 ($\textcolor{green}{-96.65\%}$) | 290_834_822 ($\textcolor{red}{124.03\%}$) | 27_313_007 ($\textcolor{green}{-91.52\%}$) |

Actor class

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: -49.44% [-75.74%, -23.13%]
  • cycles: -35.96% [-58.94%, -12.98%]

Overall Statistics

  • binary_size: -6.36% [-8.65%, -4.08%]
  • max_mem: -44.14% [-59.45%, -28.82%]
  • cycles: -36.23% [-44.72%, -27.75%]
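The bot output does not say how the bracketed ranges are computed. Assuming each colored percentage is a relative change against main, and each bracket is a normal-approximation 95% confidence interval of the mean change (an assumption; the bot's actual method may differ), the arithmetic looks like this in a small Rust sketch with hypothetical function names:

```rust
/// Relative change of a benchmark number against main, in percent.
fn percent_change(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

/// Mean of `xs` plus an ASSUMED normal-approximation 95% confidence interval:
/// mean +/- 1.96 * (sample standard deviation / sqrt(n)). Needs at least two values.
fn mean_ci95(xs: &[f64]) -> (f64, f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction (divide by n - 1).
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let half = 1.96 * (var / n).sqrt();
    (mean, mean - half, mean + half)
}
```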

github-actions bot commented Jul 22, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed, so all collections contain the same elements and receive exactly the same queries. Below we explain what each column in the table measures:

  • generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the Wasm memory page count * 32Kb.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements into the collection.
  • batch_remove 50. Remove 50 elements from the collection.
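A fixed seed makes every run, and every collection, see the same key sequence; replaying the seed is one way to make later queries hit keys that `generate` inserted. The sketch below is hypothetical: `generate`, `batch_remove` and `next_rand` are illustrative names, `std::collections::HashMap` stands in for any collection, and the generator repeats the MINSTD step only to stay self-contained. It is not the benchmark repository's driver code.

```rust
use std::collections::HashMap;

/// Hypothetical seeded generator (MINSTD step). The seed must be in 1..2^31 - 1.
fn next_rand(state: &mut u64) -> u64 {
    *state = *state * 48_271 % 2_147_483_647;
    *state
}

/// Insert `n` keys drawn from the sequence that starts at `seed`.
fn generate(map: &mut HashMap<u64, u64>, seed: u64, n: usize) {
    let mut state = seed;
    for _ in 0..n {
        let k = next_rand(&mut state);
        map.insert(k, k);
    }
}

/// Replay the same seed, so every key we try to remove was inserted by `generate`.
/// Returns how many keys were actually removed.
fn batch_remove(map: &mut HashMap<u64, u64>, seed: u64, n: usize) -> usize {
    let mut state = seed;
    let mut removed = 0;
    for _ in 0..n {
        if map.remove(&next_rand(&mut state)).is_some() {
            removed += 1;
        }
    }
    removed
}
```

Because MINSTD has full period, its first 2^31 - 2 outputs are distinct, so replaying the first 50 keys removes exactly 50 entries.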

💎 Takeaways

  • The platform only charges for instruction count, so data structures that exploit caching and locality gain no cost advantage.
  • There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
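To make the O(10000 n log n) versus O(n^2) example concrete, the sketch below compares the two cost models literally (taking log base 2 and ignoring lower-order terms, both assumptions) and searches for the point where the quadratic one becomes more expensive. The actual per-round cycle limit is not given here, so this only shows how large N must be before the asymptotically better algorithm wins.

```rust
/// Cost models from the takeaway, in abstract "instructions".
fn cost_nlogn(n: f64) -> f64 {
    10_000.0 * n * n.log2()
}

fn cost_quadratic(n: f64) -> f64 {
    n * n
}

/// Smallest n >= 2 at which n^2 exceeds 10000 * n * log2(n).
fn crossover() -> u64 {
    let mut n: u64 = 2;
    while cost_quadratic(n as f64) <= cost_nlogn(n as f64) {
        n += 1;
    }
    n
}
```

Under these cost models the crossover lands a little above n = 174,000, so a workload that never grows past that size favors the quadratic algorithm.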

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations on large collections. Once deterministic time slicing is ready, we hope to measure performance with larger memory footprints.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so batch_put 50 costs much more than it does for the other data structures.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • hashmap_rs uses the fxhash crate, which provides std::collections::HashMap with a deterministic hasher, ensuring reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap for Rust.
  • The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.
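The batch_put 50 spike for hashmap can be reproduced with a toy model. The sketch below assumes a simple capacity-doubling policy and counts element copies as a stand-in for instructions; the Motoko library's actual growth policy is not described here, and `CountingVec` and `batch_cost` are hypothetical names.

```rust
/// Hypothetical growable array that doubles its capacity when full.
/// It stores no data; it only counts the element copies a real array would make.
struct CountingVec {
    len: usize,
    cap: usize,
}

impl CountingVec {
    fn new() -> Self {
        CountingVec { len: 0, cap: 1 }
    }

    /// Appends one element and returns how many existing elements had to be copied.
    fn push(&mut self) -> u64 {
        let mut copies = 0;
        if self.len == self.cap {
            // Full: allocate twice the space and copy every existing element over.
            copies = self.len as u64;
            self.cap *= 2;
        }
        self.len += 1;
        copies
    }
}

/// Copies caused by `k` pushes into an array that already holds `n` elements.
fn batch_cost(n: usize, k: usize) -> u64 {
    let mut v = CountingVec::new();
    for _ in 0..n {
        v.push();
    }
    (0..k).map(|_| v.push()).sum()
}
```

A batch that crosses a capacity boundary pays for copying the whole array, while the same batch slightly later copies nothing, even though the amortized cost per insert is constant. That per-call spike is what makes amortized structures risky under a per-round cycle limit.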

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|---|
| hashmap | 133_828 | 6_960_077_358 | 61_987_708 | 287_469 | 5_515_887_135 | 308_972 |
| triemap | 135_316 | 11_431_084_368 | 74_215_992 | 222_768 | 547_650 | 538_998 |
| rbtree | 136_114 | 5_979_229_531 | 57_995_880 | 88_900 | 268_568 | 278_334 |
| splay | 131_868 | 11_568_250_397 | 53_995_816 | 551_921 | 581_659 | 810_215 |
| btree | 176_459 | 8_224_241_532 | 31_103_868 | 277_537 | 384_166 | 429_036 |
| zhenya_hashmap | 141_855 | 2_756_209_728 | 65_987_456 | 68_392 | 83_100 | 93_270 |
| btreemap_rs | 413_478 | 1_649_709_879 | 13_762_560 | 66_814 | 112_263 | 81_263 |
| imrc_hashmap_rs | 413_588 | 2_385_702_121 | 122_454_016 | 32_846 | 162_715 | 98_494 |
| hashmap_rs | 406_096 | 392_593_368 | 36_536_320 | 16_498 | 20_863 | 19_973 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 |
|---|---|---|---|---|---|
| heap | 127_748 | 4_684_517_789 | 29_995_812 | 511_494 | 186_460 |
| heap_rs | 403_925 | 123_102_482 | 9_109_504 | 53_320 | 18_138 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to deserialize data dynamically, based on what arrives on the wire.
  • We could improve the performance on the Rust side by using parser combinators, but it is challenging to keep the ergonomics that serde provides.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
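The contrast between specialized and dynamic decoding can be illustrated with a toy format. The sketch below is not Candid and does not use serde: the wire layout (a little-endian u32 followed by a one-byte bool) and the `Ty`/`Value` types are invented for illustration. The static decoder bakes the schema into the code, while the dynamic one interprets a type description at run time and dispatches on every field.

```rust
/// Field types a sender could describe on the wire (toy format, not Candid).
#[derive(Debug)]
enum Ty {
    U32,
    Bool,
}

/// A dynamically decoded field value.
#[derive(Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

/// Reads a little-endian u32 at `pos`, or returns None if the input is too short.
fn read_u32(bytes: &[u8], pos: usize) -> Option<u32> {
    let s = bytes.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Static decoder: the schema (u32, bool) is fixed at compile time,
/// so there is no type dispatch and no intermediate allocation.
fn decode_static(bytes: &[u8]) -> Option<(u32, bool)> {
    let n = read_u32(bytes, 0)?;
    let b = *bytes.get(4)? != 0;
    Some((n, b))
}

/// Dynamic decoder: walks a type description that arrives with the message
/// and decides how to read each field at run time.
fn decode_dynamic(types: &[Ty], bytes: &[u8]) -> Option<Vec<Value>> {
    let mut pos = 0;
    let mut out = Vec::with_capacity(types.len());
    for ty in types {
        match ty {
            Ty::U32 => {
                out.push(Value::U32(read_u32(bytes, pos)?));
                pos += 4;
            }
            Ty::Bool => {
                out.push(Value::Bool(*bytes.get(pos)? != 0));
                pos += 1;
            }
        }
    }
    Some(out)
}
```

Both decoders recover the same data, but the dynamic path pays for a type dispatch per field and an allocation per message, which hints at why dynamic deserialization costs more per call.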

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 225_805 | 37_517 | 16_312 | 12_696 | 14_153 |
| Rust | 704_886 | 471_865 | 86_525 | 104_617 | 115_765 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---|---|---|---|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 766_710 | 125_034 | 324_482 | 77_116 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) method and the execution of an empty job.
  • It is hard to reliably capture both events in one flamegraph, because implementation details
    of the replica can affect the measurement. A correct flamegraph typically contains both the setTimer and canister_global_timer functions. If either is missing, the script may need adjusting.

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 118_909 | 7_392 |
| Rust | 23_699 | 474 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---|---|---|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 434_848 | 43_540 | 7_683 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle cost numbers reported here are garbage collection cost only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because the optimizer affects function names, which makes profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
    • incremental. Compile with --force-gc --incremental-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

| | generate 800k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---|---|---|---|---|
| default | 1_012_258_524 | 59_396_752 | 50 | 50 | 50 |
| copying | 1_012_258_474 | 59_396_752 | 1_012_236_033 | 1_012_303_043 | 1_012_240_270 |
| compacting | 1_675_009_912 | 59_396_752 | 1_292_955_487 | 1_532_273_628 | 1_558_502_973 |
| generational | 2_517_025_054 | 59_397_016 | 977_578_942 | 1_052_786 | 967_410 |
| incremental | 32_320_741 | 4_624 | 290_257_785 | 292_951_006 | 292_977_552 |

Actor class

| | binary_size | put new bucket | put existing bucket | get |
|---|---|---|---|---|
| Map | 254_076 | 638_613 | 4_449 | 4_909 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 139_886 | 126_827 | 14_641 | 8_451 | 10_530 | 3_662 |
| Rust | 472_135 | 519_916 | 51_591 | 34_661 | 74_169 | 41_615 |

@chenyan-dfinity changed the title from "use Nat32 for PRNG" to "use Nat64 for PRNG" on Jul 23, 2023
* scale collection to 1M

* 0.8M

* fix and back to 1M

* fix

* disable heap

* fix

* add back heap

* don't use heapify

* uninstall canister to avoid CI OOM
github-actions bot commented Jul 24, 2023

Note
Diffing the performance results against the published results from the main branch.
Unchanged benchmarks are omitted.

Warning
Skip _out/collections/README.md because its number of tables differs from the main branch.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---|---|---|---|---|
| Motoko | 225_805 | 37_517 ($\textcolor{red}{0.06\%}$) | 16_312 ($\textcolor{red}{0.26\%}$) | 12_696 ($\textcolor{red}{0.32\%}$) | 14_153 ($\textcolor{red}{0.34\%}$) |
| Rust | 704_886 | 471_865 | 86_525 ($\textcolor{red}{0.06\%}$) | 104_617 | 115_765 |

DIP721 NFT

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: 0.21% [0.08%, 0.34%]

Heartbeat

| | binary_size | heartbeat |
|---|---|---|
| Motoko | 118_909 | 7_392 |
| Rust | 23_699 | 474 ($\textcolor{green}{-40.08\%}$) |

Timer

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: no change

Warning
Skip table 0 (## Garbage Collection) from _out/motoko/README.md because its table shape differs from the main branch.

Actor class

Note
Same as main branch, skipping.

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: no change

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---|---|---|---|---|---|
| Motoko | 139_886 | 126_827 | 14_641 ($\textcolor{red}{0.06\%}$) | 8_451 | 10_530 | 3_662 |
| Rust | 472_135 | 519_916 | 51_591 | 34_661 | 74_169 | 41_615 |

Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: 0.06%

Overall Statistics

  • binary_size: no change
  • max_mem: no change
  • cycles: 0.18% [0.07%, 0.30%]

chenyan-dfinity added a commit that referenced this pull request Jul 24, 2023
* use Nat32 for PRNG

* fix

* fix

* avoid debug_show overhead

* use Nat64

* reuse rand for remove

* scale collection to 1M (#70)

* scale collection to 1M

* 0.8M

* fix and back to 1M

* fix

* disable heap

* fix

* add back heap

* don't use heapify

* uninstall canister to avoid CI OOM
@chenyan-dfinity merged commit c37c1df into main on Jul 24, 2023 (1 check passed)
@chenyan-dfinity deleted the fix-random branch on July 24, 2023 21:01

2 participants