bump moc to 0.9.7 #58

Merged
merged 10 commits into from
Jul 19, 2023


@ggreif ggreif commented Jul 18, 2023

  • bump moc to 0.9.7
  • add incremental GC
  • bump Rust CDK to 0.10.0
  • enable LTO and O3 for Rust canisters


github-actions bot commented Jul 18, 2023

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.
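
The percentages in the tables below are presumably relative changes against the main-branch value. A minimal sketch, assuming the formula (new - old) / old * 100 (the bot's exact computation is not shown in this thread) and using hypothetical function names:

```python
def percent_change(old: float, new: float) -> float:
    """Relative change of `new` against `old`, in percent (assumed formula)."""
    return (new - old) / old * 100.0


def implied_baseline(new: float, pct: float) -> float:
    """Invert percent_change: estimate the old value from the new value and the percentage."""
    return new / (1.0 + pct / 100.0)


# hashmap "generate 50k" is reported as 1_195_632_150 with a change of -43.07%.
baseline = implied_baseline(1_195_632_150, -43.07)
print(f"implied main-branch value: about {baseline:,.0f} instructions")
```

Because the percentages are rounded to two decimals, the implied baseline is only an estimate of the main-branch number.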

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 152_580 | 1_195_632_150 ($\textcolor{green}{-43.07\%}$) | 9_102_052 | 545_645 ($\textcolor{green}{-51.01\%}$) | 365_569_669 ($\textcolor{green}{-40.31\%}$) | 520_876 ($\textcolor{green}{-50.57\%}$) |
| triemap | 156_424 | 1_338_995_779 ($\textcolor{green}{-32.97\%}$) | 9_715_900 | 459_710 ($\textcolor{green}{-40.47\%}$) | 1_193_026 ($\textcolor{green}{-34.57\%}$) | 686_569 ($\textcolor{green}{-31.56\%}$) |
| rbtree | 153_258 ($\textcolor{green}{-0.08\%}$) | 1_115_533_975 ($\textcolor{green}{-37.12\%}$) | 8_902_160 | 354_721 ($\textcolor{green}{-47.17\%}$) | 964_237 ($\textcolor{green}{-39.53\%}$) | 495_133 ($\textcolor{green}{-39.30\%}$) |
| splay | 152_693 ($\textcolor{green}{-0.10\%}$) | 1_323_550_652 ($\textcolor{green}{-33.39\%}$) | 8_702_096 | 719_103 ($\textcolor{green}{-30.67\%}$) | 1_214_198 ($\textcolor{green}{-34.37\%}$) | 717_146 ($\textcolor{green}{-30.99\%}$) |
| btree | 180_227 ($\textcolor{green}{-0.52\%}$) | 1_222_588_229 ($\textcolor{green}{-35.05\%}$) | 7_556_172 | 502_876 ($\textcolor{green}{-38.39\%}$) | 1_090_262 ($\textcolor{green}{-36.68\%}$) | 540_393 ($\textcolor{green}{-37.06\%}$) |
| zhenya_hashmap | 148_470 | 989_558_312 ($\textcolor{green}{-39.96\%}$) | 9_301_800 | 334_927 ($\textcolor{green}{-48.27\%}$) | 818_203 ($\textcolor{green}{-43.51\%}$) | 335_264 ($\textcolor{green}{-48.57\%}$) |
| btreemap_rs | 463_506 ($\textcolor{red}{5.59\%}$) | 111_411_886 ($\textcolor{green}{-1.12\%}$) | 1_638_400 | 57_790 ($\textcolor{green}{-2.82\%}$) | 131_160 ($\textcolor{green}{-1.44\%}$) | 60_886 ($\textcolor{red}{0.62\%}$) |
| hashmap_rs | 455_890 ($\textcolor{red}{6.40\%}$) | 47_917_909 ($\textcolor{green}{-2.93\%}$) | 1_835_008 | 17_679 ($\textcolor{green}{-9.67\%}$) | 55_195 ($\textcolor{green}{-5.22\%}$) | 18_200 ($\textcolor{green}{-12.52\%}$) |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---:|---:|---:|---:|---:|
| heap | 139_951 | 369_466_193 ($\textcolor{green}{-46.25\%}$) | 1_400_024 | 334_365 | 397_474 ($\textcolor{green}{-44.02\%}$) |
| heap_rs | 432_455 ($\textcolor{red}{6.46\%}$) | 5_222_959 ($\textcolor{red}{4.97\%}$) | 819_200 | 45_955 ($\textcolor{green}{-6.03\%}$) | 18_614 ($\textcolor{green}{-9.54\%}$) |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 152_580 | 238_966_334 ($\textcolor{green}{-43.12\%}$) | 1_820_844 | 543_937 ($\textcolor{green}{-51.09\%}$) | 73_525_914 ($\textcolor{green}{-40.43\%}$) | 518_626 ($\textcolor{green}{-50.68\%}$) |
| hashmap_rs | 455_890 ($\textcolor{red}{6.40\%}$) | 9_883_915 ($\textcolor{green}{-2.89\%}$) | 950_272 | 17_010 ($\textcolor{green}{-10.01\%}$) | 54_512 ($\textcolor{green}{-5.30\%}$) | 17_117 ($\textcolor{green}{-13.32\%}$) |
| imrc_hashmap_rs | 463_309 ($\textcolor{red}{6.44\%}$) | 25_635_286 ($\textcolor{red}{34.48\%}$) | 1_572_864 | 28_503 ($\textcolor{green}{-4.24\%}$) | 149_652 ($\textcolor{red}{31.50\%}$) | 36_357 ($\textcolor{green}{-1.18\%}$) |
| movm_rs | 1_790_949 ($\textcolor{red}{1.71\%}$) | 1_095_033_258 ($\textcolor{red}{9.54\%}$) | 2_654_208 | 2_514_539 ($\textcolor{red}{3.70\%}$) | 7_008_740 ($\textcolor{red}{10.24\%}$) | 5_528_731 ($\textcolor{red}{10.27\%}$) |
| movm_dynamic_rs | 1_907_020 ($\textcolor{green}{-1.89\%}$) | 514_686_442 ($\textcolor{red}{5.95\%}$) | 2_129_920 | 2_063_144 ($\textcolor{red}{8.05\%}$) | 2_779_676 ($\textcolor{red}{5.20\%}$) | 2_061_927 ($\textcolor{red}{8.12\%}$) |

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 225_805 ($\textcolor{green}{-0.05\%}$) | 37_469 ($\textcolor{green}{-6.54\%}$) | 16_320 ($\textcolor{green}{-10.19\%}$) | 12_656 ($\textcolor{green}{-4.03\%}$) | 14_105 ($\textcolor{green}{-0.01\%}$) |
| Rust | 759_134 ($\textcolor{red}{1.19\%}$) | 471_365 ($\textcolor{green}{-5.82\%}$) | 86_522 ($\textcolor{green}{-7.31\%}$) | 104_060 ($\textcolor{green}{-9.50\%}$) | 115_772 ($\textcolor{green}{-7.18\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 833_383 ($\textcolor{red}{4.06\%}$) | 124_852 ($\textcolor{green}{-7.29\%}$) | 323_718 ($\textcolor{green}{-7.18\%}$) | 77_282 ($\textcolor{green}{-10.97\%}$) |

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 118_909 | 7_392 |
| Rust | 26_624 ($\textcolor{green}{-6.94\%}$) | 797 ($\textcolor{green}{-3.98\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 125_168 ($\textcolor{green}{-0.05\%}$) | 15_208 | 1_679 |
| Rust | 462_086 ($\textcolor{red}{3.27\%}$) | 43_483 ($\textcolor{green}{-12.31\%}$) | 7_663 ($\textcolor{green}{-19.46\%}$) |

Warning
Skip table 0 ## Garbage Collection from _out/motoko/README.md, due to table shape mismatches from main branch.

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---:|---:|---:|---:|
| Map | 254_076 ($\textcolor{green}{-0.15\%}$) | 638_613 ($\textcolor{green}{-1.08\%}$) | 4_449 | 4_909 |

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 139_886 ($\textcolor{green}{-0.05\%}$) | 126_827 ($\textcolor{green}{-0.05\%}$) | 14_632 | 8_451 | 10_530 | 3_662 |
| Rust | 510_614 ($\textcolor{red}{6.74\%}$) | 560_223 ($\textcolor{red}{6.28\%}$) | 52_071 ($\textcolor{green}{-9.67\%}$) | 34_588 ($\textcolor{green}{-10.21\%}$) | 74_157 ($\textcolor{green}{-8.52\%}$) | 41_500 ($\textcolor{green}{-9.17\%}$) |


github-actions bot commented Jul 18, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
Library names with the _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with a fixed seed to ensure that all collections contain
the same elements and that the queries are exactly the same. Below we explain what each column in the table measures:

  • generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_live_size after the generate call; for Rust, it reports the number of Wasm memory pages * 32Kb.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements to the collection.
  • batch_remove 50. Remove 50 elements from the collection.
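
As a sanity check on the Rust max mem column, the reported values can be divided by the unit named above. A small sketch, assuming "32Kb" means 32_768 bytes (the helper name `units` is hypothetical); the sample values are copied from the Map and Priority queue tables:

```python
PAGE_UNIT = 32 * 1024  # assumption: "32Kb" is read as 32_768 bytes

# Reported memory for the Rust rows (bytes), taken from the tables in this report.
rust_max_mem = {
    "btreemap_rs": 1_638_400,
    "hashmap_rs": 1_835_008,
    "heap_rs": 819_200,
}


def units(bytes_reported: int) -> int:
    """Return how many 32_768-byte units make up the reported value."""
    quotient, remainder = divmod(bytes_reported, PAGE_UNIT)
    if remainder != 0:
        raise ValueError(f"{bytes_reported} is not a multiple of {PAGE_UNIT}")
    return quotient


for name, size in rust_max_mem.items():
    print(name, units(size))  # 50, 56 and 25 units respectively
```

All three values divide evenly, which is consistent with the Rust figure being a unit count rather than a live-heap measurement like the Motoko one.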

💎 Takeaways

  • The platform only charges for instruction count. Data structures that exploit caching and locality gain no cost advantage from doing so.
  • There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, an O(10000 n log n) algorithm may hit the limit while an O(n^2) algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
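
The point about constants versus asymptotics can be made concrete with a toy calculation. This is only an illustration: the budget below is a made-up number, not the platform's real per-round limit, and the two cost functions are the idealized ones from the takeaway above.

```python
import math

HYPOTHETICAL_BUDGET = 5_000_000_000  # invented per-round limit, for illustration only


def cost_nlogn(n: int) -> float:
    """Idealized cost of an algorithm doing 10000 * n * log2(n) steps."""
    return 10_000 * n * math.log2(n)


def cost_quadratic(n: int) -> float:
    """Idealized cost of an algorithm doing n^2 steps."""
    return float(n) * n


def fits(cost: float) -> bool:
    """True if the cost stays within the hypothetical budget."""
    return cost <= HYPOTHETICAL_BUDGET


n = 50_000
print("n^2 fits:", fits(cost_quadratic(n)))          # 2.5e9 steps, within budget
print("10000 n log n fits:", fits(cost_nlogn(n)))    # roughly 7.8e9 steps, over budget
```

For much larger n the ordering flips again, but under a fixed budget neither algorithm would get that far in a single round.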

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and the cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure performance with a larger memory footprint.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of a Motoko interpreter. External developers can ignore this table for now.
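
The hashmap note above describes a classic amortized-resize spike. The following toy model is not the Motoko HashMap implementation; it only counts element copies in a hypothetical doubling array, with an invented initial capacity chosen so that the batch after generate triggers the copy:

```python
class GrowableTable:
    """Toy doubling array that only tracks sizes and copy counts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.size = 0
        self.copied = 0  # total elements copied during resizes

    def put(self) -> None:
        if self.size == self.capacity:
            # Full: copy every existing element into an array twice as large.
            self.copied += self.size
            self.capacity *= 2
        self.size += 1


def batch_cost(table: GrowableTable, count: int) -> int:
    """Cost of `count` inserts: one unit per insert plus one per copied element."""
    copied_before = table.copied
    for _ in range(count):
        table.put()
    return count + (table.copied - copied_before)


table = GrowableTable(capacity=50_000)  # hypothetical initial capacity
batch_cost(table, 50_000)               # "generate": fills the table exactly
first_batch = batch_cost(table, 50)     # hits the capacity, copies 50_000 elements
second_batch = batch_cost(table, 50)    # no resize this time
print(first_batch, second_batch)        # 50050 50
```

Averaged over many inserts the copy cost is small, but the single batch that crosses the capacity boundary pays for all of it at once.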

Map

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 152_580 | 1_195_632_150 | 9_102_052 | 545_645 | 365_569_669 | 520_876 |
| triemap | 156_424 | 1_338_995_779 | 9_715_900 | 459_710 | 1_193_026 | 686_569 |
| rbtree | 153_258 | 1_115_533_975 | 8_902_160 | 354_721 | 964_237 | 495_133 |
| splay | 152_693 | 1_323_550_652 | 8_702_096 | 719_103 | 1_214_198 | 717_146 |
| btree | 180_227 | 1_222_588_229 | 7_556_172 | 502_876 | 1_090_262 | 540_393 |
| zhenya_hashmap | 148_470 | 989_558_312 | 9_301_800 | 334_927 | 818_203 | 335_264 |
| btreemap_rs | 463_506 | 111_411_886 | 1_638_400 | 57_790 | 131_160 | 60_886 |
| hashmap_rs | 455_890 | 47_917_909 | 1_835_008 | 17_679 | 55_195 | 18_200 |

Priority queue

| | binary_size | heapify 50k | mem | pop_min 50 | put 50 |
|---|---:|---:|---:|---:|---:|
| heap | 139_951 | 369_466_193 | 1_400_024 | 334_365 | 397_474 |
| heap_rs | 432_455 | 5_222_959 | 819_200 | 45_955 | 18_614 |

MoVM

| | binary_size | generate 10k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|---:|
| hashmap | 152_580 | 238_966_334 | 1_820_844 | 543_937 | 73_525_914 | 518_626 |
| hashmap_rs | 455_890 | 9_883_915 | 950_272 | 17_010 | 54_512 | 17_117 |
| imrc_hashmap_rs | 463_309 | 25_635_286 | 1_572_864 | 28_503 | 149_652 | 36_357 |
| movm_rs | 1_790_949 | 1_095_033_258 | 2_654_208 | 2_514_539 | 7_008_740 | 5_528_731 |
| movm_dynamic_rs | 1_907_020 | 514_686_442 | 2_129_920 | 2_063_144 | 2_779_676 | 2_061_927 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
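
To put a number on the cost difference discussed above, the Rust-to-Motoko ratio can be computed per method. A small sketch using the Basic DAO figures from this report (the function name `rust_over_motoko` is hypothetical); note that the ratio reflects the whole method cost, not serialization alone:

```python
# (Motoko, Rust) instruction counts per method, copied from the Basic DAO table.
basic_dao = {
    "init": (37_469, 471_365),
    "transfer_token": (16_320, 86_522),
    "submit_proposal": (12_656, 104_060),
    "vote_proposal": (14_105, 115_772),
}


def rust_over_motoko(table: dict) -> dict:
    """Map each method to the ratio of the Rust cost over the Motoko cost."""
    return {method: rust / motoko for method, (motoko, rust) in table.items()}


for method, ratio in rust_over_motoko(basic_dao).items():
    print(f"{method}: Rust costs {ratio:.1f}x the Motoko count")
```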

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal |
|---|---:|---:|---:|---:|---:|
| Motoko | 225_805 | 37_469 | 16_320 | 12_656 | 14_105 |
| Rust | 759_134 | 471_365 | 86_522 | 104_060 | 115_772 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token |
|---|---:|---:|---:|---:|
| Motoko | 183_882 | 12_181 | 22_319 | 4_710 |
| Rust | 833_383 | 124_852 | 323_718 | 77_282 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) method and the execution of the empty job.
  • It is not easy to reliably capture both events in one flamegraph, as implementation details
    of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
|---|---:|---:|
| Motoko | 118_909 | 7_392 |
| Rust | 26_624 | 797 |

Timer

| | binary_size | setTimer | cancelTimer |
|---|---:|---:|---:|
| Motoko | 125_168 | 15_208 | 1_679 |
| Rust | 462_086 | 43_483 | 7_663 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_live_size after the generate call. The cycle cost numbers reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because it changes function names, which makes profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.
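
The GC variants listed above differ only in compiler flags. A sketch of how the corresponding moc command lines could be assembled, using only the flags named in the list; the source file name `Triemap.mo` and the helper `moc_command` are hypothetical, and the actual benchmark may pass these flags through its build configuration instead of calling moc directly:

```python
# Flags per GC variant, as listed in the benchmark description above.
GC_FLAGS = {
    "default": [],
    "copying": ["--force-gc", "--copying-gc"],
    "compacting": ["--force-gc", "--compacting-gc"],
    "generational": ["--force-gc", "--generational-gc"],
}


def moc_command(variant: str, source: str = "Triemap.mo") -> list:
    """Build the argument list for one GC variant (source file name is a placeholder)."""
    return ["moc", *GC_FLAGS[variant], source]


for variant in GC_FLAGS:
    print(" ".join(moc_command(variant)))
```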

Garbage Collection

| | generate 80k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|---|---:|---:|---:|---:|---:|
| default | 251_928_001 | 15_539_816 | 50 | 50 | 50 |
| copying | 251_927_951 | 15_539_816 | 251_922_212 | 252_077_283 | 252_077_615 |
| compacting | 385_346_090 | 15_539_816 | 295_775_337 | 354_723_987 | 339_091_086 |
| generational | 590_168_177 | 15_540_080 | 51_200 | 1_051_273 | 594_436 |
| incremental | 192_660_140 | 4_628 | 519_293_666 | 129_819_032 | 321_970_457 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|---|---:|---:|---:|---:|
| Map | 254_076 | 638_613 | 4_449 | 4_909 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|---|---:|---:|---:|---:|---:|---:|
| Motoko | 139_886 | 126_827 | 14_632 | 8_451 | 10_530 | 3_662 |
| Rust | 510_614 | 560_223 | 52_071 | 34_588 | 74_157 | 41_500 |


@chenyan-dfinity chenyan-dfinity left a comment


Awesome numbers!

@ggreif ggreif merged commit 00128b6 into main Jul 19, 2023
1 check passed
@ggreif ggreif deleted the gabor/bump branch July 19, 2023 12:15