Reduce TaskExecutor overhead #13861

original-brownbear · 2024-10-05T08:09:02Z

The TaskGroup class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). Also, the tasks can be made a little more lightweight too by using a var handle instead of an atomic and being explicit about the fact that they need a reference to the list of other tasks.

Similar to #13860 a small but not insignificant improvement here + reducing the noise in this code now sets up cleaner benchmarking for potential solutions to the small task overhead problems we ran into here and there:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets       12.82     (23.1%)       12.65     (29.1%)   -1.3% ( -43% -   66%) 0.819
                         LowTerm      846.91      (5.6%)      846.92      (5.9%)    0.0% ( -10% -   12%) 0.999
               HighTermMonthSort     1525.55      (5.2%)     1529.85      (6.6%)    0.3% ( -10% -   12%) 0.832
                      OrHighHigh      193.59      (9.9%)      194.63      (5.3%)    0.5% ( -13% -   17%) 0.762
                     AndHighHigh      135.09      (9.6%)      135.88      (7.1%)    0.6% ( -14% -   19%) 0.756
                    OrNotHighLow     1787.14      (5.4%)     1798.09      (4.5%)    0.6% (  -8% -   11%) 0.581
                       OrHighLow      714.69      (3.6%)      720.59      (3.9%)    0.8% (  -6% -    8%) 0.325
                        HighTerm      439.29      (9.9%)      443.35      (9.2%)    0.9% ( -16% -   22%) 0.665
            BrowseDateSSDVFacets        1.24      (6.9%)        1.26      (9.2%)    1.0% ( -14% -   18%) 0.585
                         MedTerm      582.73      (7.8%)      588.53      (8.6%)    1.0% ( -14% -   18%) 0.586
                    OrHighNotLow      517.51      (7.3%)      524.34      (5.7%)    1.3% ( -10% -   15%) 0.366
                         Prefix3     1737.96      (5.1%)     1761.79      (5.8%)    1.4% (  -9% -   12%) 0.260
                       MedPhrase      121.93      (8.9%)      123.89      (7.2%)    1.6% ( -13% -   19%) 0.374
                   OrHighNotHigh      601.29      (6.1%)      611.36      (5.1%)    1.7% (  -8% -   13%) 0.180
                       OrHighMed      222.29      (3.2%)      226.23      (3.8%)    1.8% (  -5% -    9%) 0.024
                          IntNRQ      141.25      (4.9%)      143.76      (4.4%)    1.8% (  -7% -   11%) 0.089
                      AndHighLow     1629.97      (6.3%)     1659.01      (6.8%)    1.8% ( -10% -   15%) 0.226
                          Fuzzy1       78.77      (2.9%)       80.21      (3.4%)    1.8% (  -4% -    8%) 0.011
            HighTermTitleBDVSort       31.60      (4.0%)       32.18      (5.3%)    1.8% (  -7% -   11%) 0.080
         AndHighMedDayTaxoFacets       92.22      (2.6%)       93.92      (3.1%)    1.8% (  -3% -    7%) 0.004
     BrowseRandomLabelTaxoFacets        4.34      (4.2%)        4.42      (4.5%)    1.9% (  -6% -   11%) 0.054
        AndHighHighDayTaxoFacets       27.25      (3.4%)       27.80      (3.1%)    2.0% (  -4% -    8%) 0.006
                    HighSpanNear       11.45      (4.9%)       11.68      (7.3%)    2.0% (  -9% -   14%) 0.145
                        PKLookup      240.33      (2.7%)      245.34      (2.3%)    2.1% (  -2% -    7%) 0.000
                    OrHighNotMed      654.03      (6.4%)      668.14      (6.2%)    2.2% (  -9% -   15%) 0.125
                    OrNotHighMed      359.60      (4.8%)      367.38      (3.6%)    2.2% (  -5% -   11%) 0.022
               HighTermTitleSort       67.09      (4.2%)       68.63      (4.8%)    2.3% (  -6% -   11%) 0.023
            MedTermDayTaxoFacets       22.07      (3.4%)       22.63      (4.1%)    2.5% (  -4% -   10%) 0.003
          OrHighMedDayTaxoFacets        6.41      (5.0%)        6.59      (5.4%)    2.7% (  -7% -   13%) 0.020
                          Fuzzy2       83.73      (2.6%)       86.05      (2.2%)    2.8% (  -1% -    7%) 0.000
            HighIntervalsOrdered       20.50      (4.9%)       21.10      (5.5%)    2.9% (  -7% -   14%) 0.012
       BrowseDayOfYearTaxoFacets        5.13      (5.7%)        5.29      (9.3%)    3.0% ( -11% -   19%) 0.084
                HighSloppyPhrase        8.16      (5.3%)        8.41      (4.8%)    3.0% (  -6% -   13%) 0.007
                         Respell       54.78      (1.5%)       56.47      (1.5%)    3.1% (   0% -    6%) 0.000
            BrowseDateTaxoFacets        5.05      (5.9%)        5.21      (9.4%)    3.1% ( -11% -   19%) 0.079
                   OrNotHighHigh      181.76      (7.3%)      187.49      (5.6%)    3.2% (  -9% -   17%) 0.030
             LowIntervalsOrdered       17.54      (4.6%)       18.11      (4.5%)    3.3% (  -5% -   12%) 0.001
                      TermDTSort      153.46      (6.0%)      158.55      (4.9%)    3.3% (  -7% -   15%) 0.007
                     MedSpanNear       27.38      (2.1%)       28.30      (2.2%)    3.4% (   0% -    7%) 0.000
                       LowPhrase       23.26      (5.1%)       24.05      (5.4%)    3.4% (  -6% -   14%) 0.004
                      AndHighMed      176.55      (6.1%)      182.49      (5.3%)    3.4% (  -7% -   15%) 0.008
                 LowSloppyPhrase       90.89      (4.5%)       94.02      (5.0%)    3.4% (  -5% -   13%) 0.001
                     LowSpanNear       31.44      (2.2%)       32.54      (2.4%)    3.5% (  -1% -    8%) 0.000
             MedIntervalsOrdered       44.12      (3.9%)       45.69      (3.8%)    3.6% (  -4% -   11%) 0.000
                      HighPhrase       84.47      (5.7%)       87.81      (5.3%)    4.0% (  -6% -   15%) 0.001
     BrowseRandomLabelSSDVFacets        3.28      (2.7%)        3.41      (4.9%)    4.1% (  -3% -   12%) 0.000
       BrowseDayOfYearSSDVFacets        4.49     (10.1%)        4.68     (10.4%)    4.1% ( -14% -   27%) 0.073
           BrowseMonthSSDVFacets        4.39      (7.3%)        4.59      (8.0%)    4.4% ( -10% -   21%) 0.009
                 MedSloppyPhrase       28.27      (4.7%)       29.52      (7.1%)    4.4% (  -6% -   17%) 0.001
                        Wildcard      214.34      (6.1%)      224.66      (6.0%)    4.8% (  -6% -   17%) 0.000
           HighTermDayOfYearSort      323.67      (8.6%)      339.53      (7.7%)    4.9% ( -10% -   23%) 0.007

The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). The tasks can be made a little more lightweight too by using a var handle instead of an atomic.

javanna

left one comment about the var handle, the rest looks good to me.

javanna · 2024-10-07T08:48:04Z

lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java

+    static {
+      try {
+        STARTED_OR_CANCELLED =
+            MethodHandles.lookup().findVarHandle(Task.class, "startedOrCancelled", boolean.class);


this is the one bit that makes me wonder if it's necessary and what the right trade-off is. We had an atomic boolean before. Perhaps it deserves a separate PR.

The way I created this PR was to basically:

shave some overhead

benchmark

shave some more

benchmark more

...

Like benchmark results and open PR

I'm not sure this one in isolation will have a lot of effect and it's hard to justify in isolation. What it does is save some bytes of heap and more importantly CPU cache. If you do enough of these changes in aggregate you get a couple % improvement in the benchmark. But each individual one obviously is trivial :)

That said, the JDK itself uses this pattern in e.g. the FutureTask and to me the tradeoff seems worth it. It simply makes it more predictable what the code will compile to and at least we some guarantee that the byte for the boolean will be located right near the other mutable state fields in the object which is a good thing when they are so closely related logically and saves some cache thrashing :)
For the basic building blocks of our concurrency logic like this one I'd always err on the side of performance. Effectively it's 5 extra lines of code traded for the guarantee that the flag is flattened into the task instance and won't be moved away from it by the GC even if the task waits in the queue for 5s ever :) Seems like a good deal to me.

Happy to put this in a separate PR though, but maybe then it'd be worthwhile to make this kind of adjustment in multiple places at once? (I haven't checked though, maybe there aren't that many actually)

Your flow makes sense from a benchmarking perspective but there's also the code style perspective, what makes sense in a single PR, what parts of the code it touches. I find that this particular change could be highlighted and get more attention. That also helps because the rest of changes in this PR are low hanging fruit mechanical changes that are easily reviewed.

Sure thing backed this one out for now :)

javanna

LGTM, please remember to backport to 10x

original-brownbear · 2024-10-08T09:04:25Z

Thanks Luca! And sure thing will backport :)!

The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only).

Reduce TaskExecutor overhead

e951066

The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). The tasks can be made a little more lightweight too by using a var handle instead of an atomic.

javanna reviewed Oct 7, 2024

View reviewed changes

original-brownbear requested a review from javanna October 7, 2024 13:56

original-brownbear added 2 commits October 7, 2024 16:58

Merge remote-tracking branch 'apache/main' into cheaper-taskexecutor

d2ba0a7

back out varhandle change

24e271e

javanna approved these changes Oct 8, 2024

View reviewed changes

javanna added this to the 10.1.0 milestone Oct 8, 2024

original-brownbear merged commit 7757d43 into apache:main Oct 8, 2024
3 checks passed

original-brownbear deleted the cheaper-taskexecutor branch October 8, 2024 09:04

original-brownbear added a commit that referenced this pull request Oct 8, 2024

Reduce TaskExecutor overhead (#13861)

22638ec

The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce TaskExecutor overhead #13861

Reduce TaskExecutor overhead #13861

original-brownbear commented Oct 5, 2024

javanna left a comment

javanna Oct 7, 2024

original-brownbear Oct 7, 2024

original-brownbear Oct 7, 2024

javanna Oct 7, 2024

original-brownbear Oct 7, 2024

javanna left a comment

original-brownbear commented Oct 8, 2024

Reduce TaskExecutor overhead #13861

Reduce TaskExecutor overhead #13861

Conversation

original-brownbear commented Oct 5, 2024

javanna left a comment

Choose a reason for hiding this comment

javanna Oct 7, 2024

Choose a reason for hiding this comment

original-brownbear Oct 7, 2024

Choose a reason for hiding this comment

original-brownbear Oct 7, 2024

Choose a reason for hiding this comment

javanna Oct 7, 2024

Choose a reason for hiding this comment

original-brownbear Oct 7, 2024

Choose a reason for hiding this comment

javanna left a comment

Choose a reason for hiding this comment

original-brownbear commented Oct 8, 2024