Add taskflow simple for benchmark #38

Closed


andre-nguyen
Contributor

@andre-nguyen andre-nguyen commented Oct 26, 2024

PR Details

Added taskflow to the simple_for benchmark as a starting point to compare dispenso to taskflow.

Description

This adds only the simple for loop. It does not dig into static chunking, which taskflow does support.

Related Issue

#37

Motivation and Context

This makes it possible to compare dispenso and taskflow directly.

Test Plan

I ran the benchmark on my laptop (without TBB), although I have some doubts about the results. For taskflow, the wall time is sometimes many times larger than the CPU time. I'm not sure how that's possible or why taskflow stands out like that.

Run on (16 X 5300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 1.37, 2.04, 1.89
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------
Benchmark                                                Time             CPU   Iterations
------------------------------------------------------------------------------------------
BM_serial<kSmallSize>                                  299 ns          298 ns      2345258
BM_serial<kMediumSize>                              306672 ns       306668 ns         2283
BM_serial<kLargeSize>                             50751161 ns     50747614 ns           14
BM_omp/1/1000/real_time                                625 ns          625 ns      1121720
BM_omp/2/1000/real_time                                693 ns          692 ns      1004955
BM_omp/3/1000/real_time                                737 ns          736 ns       960587
BM_omp/4/1000/real_time                                803 ns          803 ns       865631
BM_omp/6/1000/real_time                                925 ns          924 ns       748214
BM_omp/8/1000/real_time                               1111 ns         1111 ns       606568
BM_omp/12/1000/real_time                              1454 ns         1454 ns       483545
BM_omp/16/1000/real_time                              2004 ns         2003 ns       416154
BM_omp/1/1000000/real_time                          306667 ns       306656 ns         2290
BM_omp/2/1000000/real_time                          150845 ns       150845 ns         4607
BM_omp/3/1000000/real_time                          110260 ns       110250 ns         6435
BM_omp/4/1000000/real_time                           77304 ns        77303 ns         8906
BM_omp/6/1000000/real_time                           67282 ns        67280 ns        11342
BM_omp/8/1000000/real_time                           50377 ns        50375 ns        10000
BM_omp/12/1000000/real_time                          52451 ns        52446 ns        13500
BM_omp/16/1000000/real_time                          53647 ns        53430 ns        13567
BM_omp/1/100000000/real_time                      48317821 ns     48316100 ns           14
BM_omp/2/100000000/real_time                      39451660 ns     39450544 ns           18
BM_omp/3/100000000/real_time                      39330200 ns     39325490 ns           18
BM_omp/4/100000000/real_time                      38954663 ns     38954478 ns           18
BM_omp/6/100000000/real_time                      38727164 ns     38726614 ns           18
BM_omp/8/100000000/real_time                      39024963 ns     39018519 ns           18
BM_omp/12/100000000/real_time                     40647437 ns     40645516 ns           17
BM_omp/16/100000000/real_time                     44954963 ns     44738945 ns           16
BM_taskflow/1/1000/real_time                        690331 ns        40025 ns         3228
BM_taskflow/2/1000/real_time                        562607 ns        20827 ns         1000
BM_taskflow/3/1000/real_time                        702859 ns        26625 ns         1482
BM_taskflow/4/1000/real_time                        697008 ns        26508 ns         1567
BM_taskflow/6/1000/real_time                        706136 ns        30990 ns         1639
BM_taskflow/8/1000/real_time                        703104 ns        30894 ns         1467
BM_taskflow/12/1000/real_time                       542679 ns        22294 ns         1000
BM_taskflow/16/1000/real_time                       584175 ns        21946 ns         1000
BM_taskflow/1/1000000/real_time                   15727915 ns        12647 ns          100
BM_taskflow/2/1000000/real_time                   22169120 ns        13370 ns          100
BM_taskflow/3/1000000/real_time                   14379920 ns        12034 ns          100
BM_taskflow/4/1000000/real_time                   10774650 ns        12825 ns          100
BM_taskflow/6/1000000/real_time                    7211330 ns        14700 ns          100
BM_taskflow/8/1000000/real_time                    5428211 ns         8932 ns          100
BM_taskflow/12/1000000/real_time                   5305966 ns         9360 ns          100
BM_taskflow/16/1000000/real_time                   5293688 ns        12203 ns          100
BM_taskflow/1/100000000/real_time                357571202 ns        12825 ns           13
BM_taskflow/2/100000000/real_time                263788288 ns        13637 ns           10
BM_taskflow/3/100000000/real_time                190066384 ns        14261 ns           10
BM_taskflow/4/100000000/real_time                172124922 ns        15297 ns           10
BM_taskflow/6/100000000/real_time                152062930 ns        15786 ns           10
BM_taskflow/8/100000000/real_time                122660533 ns        13034 ns           10
BM_taskflow/12/100000000/real_time               167873809 ns        13919 ns           10
BM_taskflow/16/100000000/real_time               166985367 ns        12519 ns           10
BM_dispenso/1/1000/real_time                           318 ns          318 ns      2206473
BM_dispenso/2/1000/real_time                           316 ns          316 ns      2211333
BM_dispenso/3/1000/real_time                           320 ns          319 ns      2142115
BM_dispenso/4/1000/real_time                           315 ns          315 ns      2245900
BM_dispenso/6/1000/real_time                           328 ns          328 ns      2216790
BM_dispenso/8/1000/real_time                           316 ns          316 ns      2212754
BM_dispenso/12/1000/real_time                          320 ns          320 ns      2183386
BM_dispenso/16/1000/real_time                          319 ns          319 ns      2214883
BM_dispenso/1/1000000/real_time                     303939 ns       303934 ns         2304
BM_dispenso/2/1000000/real_time                     157537 ns       157533 ns         4566
BM_dispenso/3/1000000/real_time                     112913 ns       112910 ns         6162
BM_dispenso/4/1000000/real_time                      83554 ns        83552 ns         8280
BM_dispenso/6/1000000/real_time                      68707 ns        68706 ns        11231
BM_dispenso/8/1000000/real_time                      70007 ns        70005 ns         9152
BM_dispenso/12/1000000/real_time                     59267 ns        59247 ns        11914
BM_dispenso/16/1000000/real_time                     56938 ns        56513 ns        10627
BM_dispenso/1/100000000/real_time                 49513503 ns     49509393 ns           14
BM_dispenso/2/100000000/real_time                 40229395 ns     40227214 ns           17
BM_dispenso/3/100000000/real_time                 39830898 ns     39829463 ns           18
BM_dispenso/4/100000000/real_time                 43033021 ns     42995019 ns           17
BM_dispenso/6/100000000/real_time                 40570386 ns     40567394 ns           17
BM_dispenso/8/100000000/real_time                 40126609 ns     40109500 ns           18
BM_dispenso/12/100000000/real_time                41332431 ns     41331276 ns           17
BM_dispenso/16/100000000/real_time                44230711 ns     44221647 ns           16
BM_dispenso_static_chunk/1/1000/real_time              317 ns          317 ns      2210145
BM_dispenso_static_chunk/2/1000/real_time              321 ns          321 ns      2185177
BM_dispenso_static_chunk/3/1000/real_time              317 ns          317 ns      2210167
BM_dispenso_static_chunk/4/1000/real_time              317 ns          317 ns      2180981
BM_dispenso_static_chunk/6/1000/real_time              320 ns          320 ns      2189496
BM_dispenso_static_chunk/8/1000/real_time              318 ns          318 ns      2185313
BM_dispenso_static_chunk/12/1000/real_time             317 ns          317 ns      2207560
BM_dispenso_static_chunk/16/1000/real_time             339 ns          339 ns      2184940
BM_dispenso_static_chunk/1/1000000/real_time        168916 ns       168915 ns         3532
BM_dispenso_static_chunk/2/1000000/real_time        120111 ns       120100 ns         6067
BM_dispenso_static_chunk/3/1000000/real_time         82673 ns        82671 ns         8135
BM_dispenso_static_chunk/4/1000000/real_time         82241 ns        82237 ns         9997
BM_dispenso_static_chunk/6/1000000/real_time         63839 ns        63835 ns         8577
BM_dispenso_static_chunk/8/1000000/real_time         77895 ns        77647 ns         9068
BM_dispenso_static_chunk/12/1000000/real_time        57716 ns        57691 ns        12285
BM_dispenso_static_chunk/16/1000000/real_time        71068 ns        69264 ns        10441
BM_dispenso_static_chunk/1/100000000/real_time    46261474 ns     46260838 ns           17
BM_dispenso_static_chunk/2/100000000/real_time    40506077 ns     40505466 ns           17
BM_dispenso_static_chunk/3/100000000/real_time    40088773 ns     40088273 ns           17
BM_dispenso_static_chunk/4/100000000/real_time    39852701 ns     39852498 ns           17
BM_dispenso_static_chunk/6/100000000/real_time    40631868 ns     40631674 ns           18
BM_dispenso_static_chunk/8/100000000/real_time    40360283 ns     40359194 ns           17
BM_dispenso_static_chunk/12/100000000/real_time   41753740 ns     41741086 ns           17
BM_dispenso_static_chunk/16/100000000/real_time   42931928 ns     42916115 ns           16
BM_dispenso_auto_chunk/1/1000/real_time                320 ns          320 ns      2186150
BM_dispenso_auto_chunk/2/1000/real_time                318 ns          318 ns      2192645
BM_dispenso_auto_chunk/3/1000/real_time                317 ns          317 ns      2211177
BM_dispenso_auto_chunk/4/1000/real_time                317 ns          317 ns      2187290
BM_dispenso_auto_chunk/6/1000/real_time                317 ns          317 ns      2207199
BM_dispenso_auto_chunk/8/1000/real_time                319 ns          319 ns      2194044
BM_dispenso_auto_chunk/12/1000/real_time               317 ns          317 ns      2193505
BM_dispenso_auto_chunk/16/1000/real_time               319 ns          319 ns      2199002
BM_dispenso_auto_chunk/1/1000000/real_time          303674 ns       303664 ns         2233
BM_dispenso_auto_chunk/2/1000000/real_time          150620 ns       150618 ns         4501
BM_dispenso_auto_chunk/3/1000000/real_time          117971 ns       117966 ns         6058
BM_dispenso_auto_chunk/4/1000000/real_time           83675 ns        83546 ns         8366
BM_dispenso_auto_chunk/6/1000000/real_time           73060 ns        73057 ns        10901
BM_dispenso_auto_chunk/8/1000000/real_time           75587 ns        75585 ns         9842
BM_dispenso_auto_chunk/12/1000000/real_time          62967 ns        62935 ns        11286
BM_dispenso_auto_chunk/16/1000000/real_time          58583 ns        58006 ns         9722
BM_dispenso_auto_chunk/1/100000000/real_time      49875526 ns     49800454 ns           14
BM_dispenso_auto_chunk/2/100000000/real_time      40005353 ns     39918262 ns           18
BM_dispenso_auto_chunk/3/100000000/real_time      40785224 ns     40783565 ns           17
BM_dispenso_auto_chunk/4/100000000/real_time      40189175 ns     40188483 ns           17
BM_dispenso_auto_chunk/6/100000000/real_time      40171003 ns     40170374 ns           16
BM_dispenso_auto_chunk/8/100000000/real_time      39819649 ns     39818894 ns           17
BM_dispenso_auto_chunk/12/100000000/real_time     42484898 ns     42480208 ns           17
BM_dispenso_auto_chunk/16/100000000/real_time     43274232 ns     42079591 ns           16

Process finished with exit code 0

Types of changes

  • Docs change
  • Refactoring
  • Dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • I have run clang-format.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed, including in ASAN and TSAN modes (if available on your platform).

@facebook-github-bot
Contributor

Hi @andre-nguyen!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@graphicsMan
Contributor

Hi @andre-nguyen. Thanks very much for contributing! As stated above, please complete the CLA, and I'll make sure to get this reviewed and merged.

Re: why taskflow's CPU time seems so low while its wall-clock time is high: the CPU column reports main-thread CPU time only. It seems that taskflow mostly just has that thread wait while the work is done in the thread pool.
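To illustrate the effect described above, here is a minimal sketch. It is not taken from the benchmark code in this PR and does not use taskflow. It assumes a POSIX system where `clock_gettime` supports `CLOCK_THREAD_CPUTIME_ID`, and the names `thread_cpu_ns`, `Timing`, and `measure_offloaded_work` are hypothetical. The caller hands a busy loop to a worker thread and blocks in `join()`, so the caller's own CPU time stays small while wall time keeps running.

```cpp
#include <time.h>  // clock_gettime, CLOCK_THREAD_CPUTIME_ID (POSIX, assumed available)

#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: CPU time consumed so far by the calling thread, in ns.
inline int64_t thread_cpu_ns() {
  timespec ts{};
  const int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

struct Timing {
  int64_t wall_ns;        // elapsed wall-clock time seen by the caller
  int64_t caller_cpu_ns;  // CPU time charged to the calling thread only
};

// Runs a busy loop on a worker thread while the caller blocks in join().
inline Timing measure_offloaded_work() {
  const auto wall_start = std::chrono::steady_clock::now();
  const int64_t cpu_start = thread_cpu_ns();

  std::thread worker([] {
    // Spin for about 50 ms so the worker, not the caller, burns CPU.
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(50);
    volatile uint64_t sink = 0;
    while (std::chrono::steady_clock::now() < deadline) {
      sink = sink + 1;
    }
  });
  worker.join();  // the caller waits here and uses almost no CPU

  const int64_t cpu_end = thread_cpu_ns();
  const auto wall_end = std::chrono::steady_clock::now();
  const int64_t wall_ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(wall_end - wall_start)
          .count();
  return {wall_ns, cpu_end - cpu_start};
}
```

Calling `measure_offloaded_work()` from a small `main` and printing both fields should show a wall time of at least 50 ms next to a much smaller caller CPU time, which is the same shape as the BM_taskflow rows in the output above. On POSIX toolchains this typically needs to be built with `-pthread`.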

@facebook-github-bot added the CLA Signed label Oct 28, 2024
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot
Contributor

@graphicsMan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@graphicsMan merged this pull request in bdef795.
