perf: improve performance, reduce allocations, and avoid promises #102

Open: wants to merge 17 commits into main

@negezor commented Jul 5, 2024

Hi there! I love micro-optimizations. While profiling my application on low-end devices, I noticed that hookable took up a significant share of the flame graph. After reviewing the code, I identified several possible improvements, which I've included in this PR along with benchmarks to support my claims. Here are the improvements:

  1. Instead of prepending name to the arguments with unshift, we can pass name directly to callHookWith, since all arguments there are collected with the spread operator. This removes unnecessary allocations and improves performance.
  2. Promises should be avoided where possible: every Promise is an allocation and every .then defers work to a microtask, which adds up on hot paths. I replaced the Promise chain in serialTaskCaller with a middleware pattern.
  3. Early returns in serialTaskCaller and parallelTaskCaller let us skip unnecessary work when there are no hooks. However, this breaks the createDebugger test, because it expects the hook name to be passed in the arguments. I need advice here, since skipping this step significantly boosts performance. How should we proceed?
  4. Using delete hurts performance because it de-optimizes the object. Setting the property to undefined is a better option. The only downside is that printing the _hooks object shows entries like { "hello": undefined }.
  5. Remove Object.assign from deprecateHooks. Since deprecateHook is called anyway, nothing is lost.
  6. In addHooks, the release function can reset removeFns to an empty array instead of calling splice each time. This approach also takes fewer characters.
  7. In removeAllHooks, we can simply assign a new empty object and let the GC reclaim the old one. The only justification for the current behavior would be keeping a reference to the original object, but I don't see any objective reason for that.
  8. In removeHook, storing the hook list in a local alias slightly reduces the final bundle size and improves performance, because the property lookup doesn't have to be repeated.
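Points 2 and 3 can be illustrated with a minimal sketch. This is not hookable's actual code: HookCallback, serialTaskCallerChain, and this two-argument serialTaskCaller are hypothetical simplifications that ignore task naming and the hook name argument. The chained version creates a Promise per hook; the middleware-style version runs synchronous hooks back to back and only chains when a hook actually returns a Promise.

```typescript
type HookCallback = (...args: any[]) => unknown

// Hypothetical baseline: one Promise.then per hook, even for synchronous hooks.
function serialTaskCallerChain(hooks: HookCallback[], args: unknown[]): Promise<unknown> {
  return hooks.reduce<Promise<unknown>>(
    (promise, fn) => promise.then(() => fn(...args)),
    Promise.resolve(),
  )
}

// Hypothetical middleware-style sketch: walk the hooks in order and only
// chain on a Promise when a hook returns one.
function serialTaskCaller(hooks: HookCallback[], args: unknown[]): Promise<void> {
  // Early return (point 3): nothing registered, nothing to do.
  if (hooks.length === 0) return Promise.resolve()

  const next = (start: number): Promise<void> | void => {
    for (let i = start; i < hooks.length; i++) {
      const result = hooks[i](...args)
      // Assumption: only native Promises are awaited; other thenables are treated as plain results.
      if (result instanceof Promise) return result.then(() => next(i + 1))
    }
    // All remaining hooks were synchronous.
    return undefined
  }

  try {
    // Wrap once at the end so callers always receive a Promise.
    return Promise.resolve(next(0))
  } catch (error) {
    // A synchronous throw becomes a rejection, as in the chained version.
    return Promise.reject(error)
  }
}
```

One behavioral difference worth checking: in this sketch the leading synchronous hooks run inside the call itself, whereas the chained version defers every hook to a microtask.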
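Points 4, 7, and 8 all concern how the hook map is mutated. The sketch below uses a hypothetical HookRegistry class, not hookable's implementation, to show the local alias in removeHook, assigning undefined instead of using delete, and replacing the whole map in removeAllHooks.

```typescript
type HookCallback = (...args: any[]) => unknown

// Hypothetical minimal registry illustrating points 4, 7 and 8.
class HookRegistry {
  _hooks: Record<string, HookCallback[] | undefined> = {}

  hook(name: string, fn: HookCallback): () => void {
    let hooks = this._hooks[name]
    if (!hooks) {
      hooks = []
      this._hooks[name] = hooks
    }
    hooks.push(fn)
    return () => this.removeHook(name, fn)
  }

  removeHook(name: string, fn: HookCallback): void {
    // Point 8: read this._hooks[name] once into a local alias.
    const hooks = this._hooks[name]
    if (!hooks) return
    const index = hooks.indexOf(fn)
    if (index !== -1) hooks.splice(index, 1)
    // Point 4: clear the slot with undefined instead of `delete this._hooks[name]`.
    // The key stays visible when printed, e.g. { hello: undefined }.
    if (hooks.length === 0) this._hooks[name] = undefined
  }

  removeAllHooks(): void {
    // Point 7: swap in a fresh object and leave the old one to the GC.
    this._hooks = {}
  }
}
```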

There are also further potential performance improvements:

  • In callHookWith, we could skip invoking the caller when there are no hooks to call, but this is a major behavior change and I'm unsure about it. It would also make the third optimization unnecessary.
  • Another option is to avoid args.shift() in the utility functions, since it mutates the arguments array on every call.
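The two ideas above can be sketched together. In this hypothetical callHookWith, the hook name is passed as its own parameter, so args never carries it and no args.shift() is needed, and the caller is skipped when nothing is registered. Caller, this callHookWith signature, and runSerial are assumptions for illustration. Skipping the caller would change behavior for anything that expects it to run even without hooks, such as the createDebugger test from point 3.

```typescript
type HookCallback = (...args: any[]) => unknown
type Caller = (hooks: HookCallback[], args: unknown[]) => Promise<unknown>

// Hypothetical: name is a separate parameter, so callers never need
// args.shift() to strip it back off the arguments.
function callHookWith(
  hooks: Record<string, HookCallback[] | undefined>,
  caller: Caller,
  name: string,
  args: unknown[],
): Promise<unknown> {
  const list = hooks[name]
  // Skip the caller entirely when no hooks are registered for this name.
  if (!list || list.length === 0) return Promise.resolve()
  return caller(list, args)
}

// A simple serial caller used for illustration only.
const runSerial: Caller = async (list, args) => {
  for (const fn of list) await fn(...args)
}
```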

I hope the entire Vue ecosystem will benefit greatly from these changes. If I'm wrong about anything, please correct me 😅

Performance was tested on the following setup:
CPU: AMD 7950X3D
System: Arch Linux under WSL 2 on Windows 11
Node.js: v22.4.0
In the tables below, hz is operations per second and timings are in milliseconds.

Before
 ✓ test/hookable.bench.ts (30) 40627ms
   ✓ empty serialTaskCaller (3) 39618ms
     name                                              hz     min      max    mean     p75     p99    p995    p999     rme  samples
   · empty serialTaskCaller                 10,182,469.12  0.0001   0.5928  0.0001  0.0001  0.0002  0.0004  0.0005  ±1.10%  5091235   fastest
   · empty serialTaskCaller with argument    9,965,174.09  0.0001   0.5196  0.0001  0.0001  0.0002  0.0003  0.0005  ±0.27%  4982588  
   · empty serialTaskCaller with arguments   8,588,290.33  0.0001  19.4391  0.0001  0.0001  0.0004  0.0004  0.0012  ±7.62%  4294146   slowest
   ✓ serialTaskCaller (3) 33950ms
     name                                       hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · serialTaskCaller                 2,354,705.43  0.0003  0.3582  0.0004  0.0004  0.0008  0.0009  0.0013  ±0.82%  1177353  
   · serialTaskCaller with argument   2,467,112.19  0.0003  0.7581  0.0004  0.0004  0.0008  0.0009  0.0012  ±0.62%  1233557   fastest
   · serialTaskCaller with arguments  1,799,070.06  0.0003  1.1227  0.0006  0.0004  0.0009  0.0010  0.0013  ±3.48%   899539   slowest
   ✓ empty parallelTaskCaller (3) 40624ms
     name                                               hz     min      max    mean     p75     p99    p995    p999      rme  samples
   · empty parallelTaskCaller                 4,975,588.58  0.0001   0.5751  0.0002  0.0002  0.0005  0.0006  0.0008   ±1.18%  2487795   fastest
   · empty parallelTaskCaller with argument   4,067,468.90  0.0001   7.5485  0.0002  0.0002  0.0005  0.0006  0.0011   ±6.10%  2033879  
   · empty parallelTaskCaller with arguments  3,260,399.92  0.0001  10.4565  0.0003  0.0002  0.0006  0.0009  0.0022  ±10.64%  1630200   slowest
   ✓ parallelTaskCaller (3) 37518ms
     name                                         hz     min      max    mean     p75     p99    p995    p999      rme  samples
   · parallelTaskCaller                 1,208,877.28  0.0006   0.4159  0.0008  0.0009  0.0014  0.0015  0.0025   ±0.80%   604440   fastest
   · parallelTaskCaller with argument     851,025.12  0.0006  31.1798  0.0012  0.0010  0.0018  0.0029  0.0055  ±13.90%   425513  
   · parallelTaskCaller with arguments    848,773.69  0.0007   3.5288  0.0012  0.0009  0.0015  0.0017  0.0035   ±5.44%   424387   slowest
   ✓ empty callHook (3) 35453ms
     name                                          hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · empty callHook                      3,324,678.32  0.0002  0.3023  0.0003  0.0003  0.0006  0.0007  0.0010  ±0.89%  1662340   fastest
   · empty callHook with argument        2,558,739.92  0.0002  2.0208  0.0004  0.0003  0.0006  0.0007  0.0011  ±4.25%  1280209  
   · empty callHook with five arguments  2,414,266.77  0.0002  2.3941  0.0004  0.0003  0.0006  0.0007  0.0011  ±4.88%  1207134   slowest
   ✓ empty callHookParallel (3) 36265ms
     name                                                  hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · empty callHookParallel                      2,155,449.37  0.0003  0.4095  0.0005  0.0004  0.0008  0.0009  0.0014  ±0.95%  1077725   fastest
   · empty callHookParallel with argument        1,756,949.99  0.0004  1.2595  0.0006  0.0004  0.0009  0.0009  0.0016  ±3.44%   878475  
   · empty callHookParallel with five arguments  1,719,582.79  0.0004  2.8272  0.0006  0.0004  0.0008  0.0009  0.0015  ±4.62%   859793   slowest
   ✓ callHook (3) 34625ms
     name                                    hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · callHook                      1,346,819.92  0.0005  0.3296  0.0007  0.0007  0.0012  0.0014  0.0024  ±0.81%   673410   fastest
   · callHook with argument        1,076,396.49  0.0005  1.2023  0.0009  0.0008  0.0013  0.0014  0.0029  ±3.40%   538199  
   · callHook with five arguments  1,016,019.73  0.0006  1.5434  0.0010  0.0008  0.0013  0.0014  0.0024  ±4.28%   508010   slowest
   ✓ callHookParallel (3) 36871ms
     name                                          hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · callHookParallel                      831,923.92  0.0009  0.4366  0.0012  0.0012  0.0018  0.0020  0.0046  ±1.13%   415963   fastest
   · callHookParallel with argument        681,418.62  0.0009  1.9961  0.0015  0.0013  0.0020  0.0023  0.0048  ±3.88%   340710  
   · callHookParallel with five arguments  605,805.38  0.0009  2.5282  0.0017  0.0013  0.0019  0.0021  0.0055  ±5.52%   302903   slowest
   ✓ hook (2) 29333ms
     name                           hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · hook                 1,809,695.67  0.0003  5.3687  0.0006  0.0004  0.0007  0.0014  0.0025  ±8.52%   904848  
   · hook with deprecate  1,910,707.06  0.0003  5.7345  0.0005  0.0004  0.0007  0.0010  0.0021  ±8.02%   955354   fastest
   ✓ addHooks (1) 10517ms
     name                hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · addHooks  1,802,287.39  0.0004  8.8805  0.0006  0.0005  0.0008  0.0009  0.0015  ±5.31%   901144   fastest
   ✓ empty removeHook (1) 19763ms
     name                         hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · empty removeHook  27,060,455.89  0.0000  0.9647  0.0000  0.0000  0.0000  0.0001  0.0002  ±0.41%  13530228   fastest
   ✓ removeHook (2) 33179ms
     name                              hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · removeHook             24,541,013.31  0.0000  0.0635  0.0000  0.0000  0.0001  0.0001  0.0003  ±0.13%  12270507   fastest
   · removeHook with extra  24,464,448.73  0.0000  1.0412  0.0000  0.0000  0.0001  0.0001  0.0002  ±0.43%  12232225  
After
 ✓ test/hookable.bench.ts (30) 67998ms
   ✓ empty serialTaskCaller (3) 67996ms
     name                                              hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · empty serialTaskCaller                 21,718,024.22  0.0000  0.0998  0.0000  0.0000  0.0001  0.0001  0.0003  ±0.14%  10859013
   · empty serialTaskCaller with argument   21,886,593.12  0.0000  0.0895  0.0000  0.0001  0.0001  0.0001  0.0002  ±0.11%  10943297   fastest
   · empty serialTaskCaller with arguments  21,578,860.84  0.0000  0.0727  0.0000  0.0001  0.0001  0.0001  0.0002  ±0.11%  10789431   slowest
   ✓ serialTaskCaller (3) 55928ms
     name                                       hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · serialTaskCaller                 5,263,373.05  0.0001  0.6182  0.0002  0.0002  0.0004  0.0005  0.0008  ±1.82%  2631687   fastest
   · serialTaskCaller with argument   4,899,317.81  0.0001  0.9551  0.0002  0.0002  0.0004  0.0005  0.0007  ±2.32%  2449659
   · serialTaskCaller with arguments  4,829,209.59  0.0002  1.4443  0.0002  0.0002  0.0004  0.0005  0.0007  ±2.07%  2414605   slowest
   ✓ empty parallelTaskCaller (3) 64743ms
     name                                                hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · empty parallelTaskCaller                 21,074,124.31  0.0000  0.0936  0.0000  0.0001  0.0001  0.0001  0.0002  ±0.15%  10537063
   · empty parallelTaskCaller with argument   21,614,792.23  0.0000  0.0801  0.0000  0.0001  0.0001  0.0001  0.0003  ±0.12%  10807397   fastest
   · empty parallelTaskCaller with arguments  20,499,101.75  0.0000  1.5945  0.0000  0.0001  0.0001  0.0001  0.0002  ±0.64%  10249552   slowest
   ✓ parallelTaskCaller (3) 53768ms
     name                                         hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · parallelTaskCaller                 1,248,716.38  0.0006  0.2534  0.0008  0.0009  0.0012  0.0013  0.0020  ±0.50%   624359   fastest
   · parallelTaskCaller with argument   1,142,040.47  0.0006  0.7628  0.0009  0.0008  0.0012  0.0013  0.0026  ±2.21%   571021
   · parallelTaskCaller with arguments  1,045,581.42  0.0006  2.1857  0.0010  0.0008  0.0012  0.0014  0.0028  ±3.35%   522791   slowest
   ✓ empty callHook (3) 58239ms
     name                                           hz     min      max    mean     p75     p99    p995    p999      rme  samples
   · empty callHook                      13,264,257.60  0.0000   1.4190  0.0001  0.0001  0.0001  0.0001  0.0004   ±3.64%  6632129
   · empty callHook with argument        18,871,312.11  0.0000   0.0529  0.0001  0.0001  0.0001  0.0001  0.0003   ±0.11%  9435657   fastest
   · empty callHook with five arguments  12,472,483.13  0.0000  11.9884  0.0001  0.0001  0.0001  0.0002  0.0005  ±10.83%  6236242   slowest
   ✓ empty callHookParallel (3) 61137ms
     name                                                   hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · empty callHookParallel                      17,993,167.78  0.0000  0.5364  0.0001  0.0001  0.0001  0.0001  0.0004  ±0.32%  8996584   fastest
   · empty callHookParallel with argument        14,582,408.82  0.0000  2.2001  0.0001  0.0001  0.0001  0.0001  0.0004  ±3.57%  7298139   slowest
   · empty callHookParallel with five arguments  17,755,090.97  0.0000  0.8180  0.0001  0.0001  0.0001  0.0001  0.0004  ±0.35%  8877546
   ✓ callHook (3) 54742ms
     name                                    hz     min      max    mean     p75     p99    p995    p999      rme  samples
   · callHook                      4,098,577.78  0.0002   2.8716  0.0002  0.0002  0.0005  0.0006  0.0009   ±2.59%  2049289   fastest
   · callHook with argument        3,925,879.71  0.0002   1.7039  0.0003  0.0002  0.0005  0.0005  0.0007   ±2.96%  1962940
   · callHook with five arguments  2,615,867.56  0.0002  11.8876  0.0004  0.0002  0.0006  0.0009  0.0026  ±13.11%  1307934   slowest
   ✓ callHookParallel (3) 53109ms
     name                                          hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · callHookParallel                      984,106.82  0.0008  0.3530  0.0010  0.0010  0.0015  0.0017  0.0042  ±0.75%   492092   fastest
   · callHookParallel with argument        914,175.54  0.0008  0.7117  0.0011  0.0010  0.0015  0.0016  0.0042  ±2.13%   457088
   · callHookParallel with five arguments  822,263.16  0.0008  1.9657  0.0012  0.0010  0.0015  0.0017  0.0046  ±3.48%   411132   slowest
   ✓ hook (2) 32000ms
     name                           hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · hook                 1,789,657.75  0.0003  5.2691  0.0006  0.0004  0.0009  0.0016  0.0026  ±7.30%   894829   fastest
   · hook with deprecate  1,772,630.95  0.0003  9.5252  0.0006  0.0004  0.0007  0.0011  0.0024  ±9.79%   886316
   ✓ addHooks (1) 4528ms
     name                hz     min      max    mean     p75     p99    p995    p999     rme  samples
   · addHooks  1,879,948.50  0.0004  10.4049  0.0005  0.0005  0.0008  0.0009  0.0013  ±5.74%   939975   fastest
   ✓ empty removeHook (1) 29513ms
     name                         hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · empty removeHook  27,107,740.59  0.0000  0.1260  0.0000  0.0000  0.0000  0.0001  0.0002  ±0.15%  13553871   fastest
   ✓ removeHook (2) 52477ms
     name                              hz     min     max    mean     p75     p99    p995    p999     rme   samples
   · removeHook             25,901,773.12  0.0000  0.0860  0.0000  0.0000  0.0001  0.0001  0.0002  ±0.13%  12950887
   · removeHook with extra  26,255,315.21  0.0000  1.1736  0.0000  0.0000  0.0001  0.0001  0.0002  ±0.47%  13127658   fastest

@negezor (Author) commented Jul 8, 2024

@pi0 can you review this?

@pi0 (Member) commented Jul 8, 2024

Thanks for your efforts on this, dear @negezor! These all look like good improvements ❤️ I will carefully review once I have time (please read this for some context).

Review comment on src/hookable.ts (outdated):

// Splice will ensure that all fns are called once, and free all
// unreg functions from memory.
removeFns = [];

@kurtextrem commented Jul 27, 2024

Setting removeFns.length = 0 is probably enough to make the old entries eligible for GC, and it avoids another allocation.
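The suggestion can be sketched as follows. hook, addHooks, and the Map-based registry here are hypothetical simplifications, not hookable's actual API. Truncating with removeFns.length = 0 empties the existing array in place, so the unregister closures become unreachable without allocating a replacement array, and it also works when removeFns is declared with const.

```typescript
type HookCallback = (...args: any[]) => unknown

// Hypothetical single-hook registration that returns an unregister function.
function hook(registry: Map<string, HookCallback[]>, name: string, fn: HookCallback): () => void {
  const list: HookCallback[] = registry.get(name) ?? []
  registry.set(name, list)
  list.push(fn)
  return () => {
    const index = list.indexOf(fn)
    if (index !== -1) list.splice(index, 1)
  }
}

// Hypothetical addHooks: registers several hooks and returns one release function.
function addHooks(
  registry: Map<string, HookCallback[]>,
  configHooks: Record<string, HookCallback>,
): () => void {
  const removeFns = Object.entries(configHooks).map(([name, fn]) => hook(registry, name, fn))
  return () => {
    for (const unreg of removeFns) unreg()
    // Truncate in place instead of splice() or `removeFns = []`:
    // no new array is allocated, and calling the release function again does nothing.
    removeFns.length = 0
  }
}
```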

@negezor (Author) replied:

Done

3 participants