Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce TaskExecutor overhead #13861

Merged

Conversation

original-brownbear
Copy link
Member

The TaskGroup class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). Also, the tasks can be made a little more lightweight too by using a var handle instead of an atomic and being explicit about the fact that they need a reference to the list of other tasks.

Similar to #13860 a small but not insignificant improvement here + reducing the noise in this code now sets up cleaner benchmarking for potential solutions to the small task overhead problems we ran into here and there:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthTaxoFacets       12.82     (23.1%)       12.65     (29.1%)   -1.3% ( -43% -   66%) 0.819
                         LowTerm      846.91      (5.6%)      846.92      (5.9%)    0.0% ( -10% -   12%) 0.999
               HighTermMonthSort     1525.55      (5.2%)     1529.85      (6.6%)    0.3% ( -10% -   12%) 0.832
                      OrHighHigh      193.59      (9.9%)      194.63      (5.3%)    0.5% ( -13% -   17%) 0.762
                     AndHighHigh      135.09      (9.6%)      135.88      (7.1%)    0.6% ( -14% -   19%) 0.756
                    OrNotHighLow     1787.14      (5.4%)     1798.09      (4.5%)    0.6% (  -8% -   11%) 0.581
                       OrHighLow      714.69      (3.6%)      720.59      (3.9%)    0.8% (  -6% -    8%) 0.325
                        HighTerm      439.29      (9.9%)      443.35      (9.2%)    0.9% ( -16% -   22%) 0.665
            BrowseDateSSDVFacets        1.24      (6.9%)        1.26      (9.2%)    1.0% ( -14% -   18%) 0.585
                         MedTerm      582.73      (7.8%)      588.53      (8.6%)    1.0% ( -14% -   18%) 0.586
                    OrHighNotLow      517.51      (7.3%)      524.34      (5.7%)    1.3% ( -10% -   15%) 0.366
                         Prefix3     1737.96      (5.1%)     1761.79      (5.8%)    1.4% (  -9% -   12%) 0.260
                       MedPhrase      121.93      (8.9%)      123.89      (7.2%)    1.6% ( -13% -   19%) 0.374
                   OrHighNotHigh      601.29      (6.1%)      611.36      (5.1%)    1.7% (  -8% -   13%) 0.180
                       OrHighMed      222.29      (3.2%)      226.23      (3.8%)    1.8% (  -5% -    9%) 0.024
                          IntNRQ      141.25      (4.9%)      143.76      (4.4%)    1.8% (  -7% -   11%) 0.089
                      AndHighLow     1629.97      (6.3%)     1659.01      (6.8%)    1.8% ( -10% -   15%) 0.226
                          Fuzzy1       78.77      (2.9%)       80.21      (3.4%)    1.8% (  -4% -    8%) 0.011
            HighTermTitleBDVSort       31.60      (4.0%)       32.18      (5.3%)    1.8% (  -7% -   11%) 0.080
         AndHighMedDayTaxoFacets       92.22      (2.6%)       93.92      (3.1%)    1.8% (  -3% -    7%) 0.004
     BrowseRandomLabelTaxoFacets        4.34      (4.2%)        4.42      (4.5%)    1.9% (  -6% -   11%) 0.054
        AndHighHighDayTaxoFacets       27.25      (3.4%)       27.80      (3.1%)    2.0% (  -4% -    8%) 0.006
                    HighSpanNear       11.45      (4.9%)       11.68      (7.3%)    2.0% (  -9% -   14%) 0.145
                        PKLookup      240.33      (2.7%)      245.34      (2.3%)    2.1% (  -2% -    7%) 0.000
                    OrHighNotMed      654.03      (6.4%)      668.14      (6.2%)    2.2% (  -9% -   15%) 0.125
                    OrNotHighMed      359.60      (4.8%)      367.38      (3.6%)    2.2% (  -5% -   11%) 0.022
               HighTermTitleSort       67.09      (4.2%)       68.63      (4.8%)    2.3% (  -6% -   11%) 0.023
            MedTermDayTaxoFacets       22.07      (3.4%)       22.63      (4.1%)    2.5% (  -4% -   10%) 0.003
          OrHighMedDayTaxoFacets        6.41      (5.0%)        6.59      (5.4%)    2.7% (  -7% -   13%) 0.020
                          Fuzzy2       83.73      (2.6%)       86.05      (2.2%)    2.8% (  -1% -    7%) 0.000
            HighIntervalsOrdered       20.50      (4.9%)       21.10      (5.5%)    2.9% (  -7% -   14%) 0.012
       BrowseDayOfYearTaxoFacets        5.13      (5.7%)        5.29      (9.3%)    3.0% ( -11% -   19%) 0.084
                HighSloppyPhrase        8.16      (5.3%)        8.41      (4.8%)    3.0% (  -6% -   13%) 0.007
                         Respell       54.78      (1.5%)       56.47      (1.5%)    3.1% (   0% -    6%) 0.000
            BrowseDateTaxoFacets        5.05      (5.9%)        5.21      (9.4%)    3.1% ( -11% -   19%) 0.079
                   OrNotHighHigh      181.76      (7.3%)      187.49      (5.6%)    3.2% (  -9% -   17%) 0.030
             LowIntervalsOrdered       17.54      (4.6%)       18.11      (4.5%)    3.3% (  -5% -   12%) 0.001
                      TermDTSort      153.46      (6.0%)      158.55      (4.9%)    3.3% (  -7% -   15%) 0.007
                     MedSpanNear       27.38      (2.1%)       28.30      (2.2%)    3.4% (   0% -    7%) 0.000
                       LowPhrase       23.26      (5.1%)       24.05      (5.4%)    3.4% (  -6% -   14%) 0.004
                      AndHighMed      176.55      (6.1%)      182.49      (5.3%)    3.4% (  -7% -   15%) 0.008
                 LowSloppyPhrase       90.89      (4.5%)       94.02      (5.0%)    3.4% (  -5% -   13%) 0.001
                     LowSpanNear       31.44      (2.2%)       32.54      (2.4%)    3.5% (  -1% -    8%) 0.000
             MedIntervalsOrdered       44.12      (3.9%)       45.69      (3.8%)    3.6% (  -4% -   11%) 0.000
                      HighPhrase       84.47      (5.7%)       87.81      (5.3%)    4.0% (  -6% -   15%) 0.001
     BrowseRandomLabelSSDVFacets        3.28      (2.7%)        3.41      (4.9%)    4.1% (  -3% -   12%) 0.000
       BrowseDayOfYearSSDVFacets        4.49     (10.1%)        4.68     (10.4%)    4.1% ( -14% -   27%) 0.073
           BrowseMonthSSDVFacets        4.39      (7.3%)        4.59      (8.0%)    4.4% ( -10% -   21%) 0.009
                 MedSloppyPhrase       28.27      (4.7%)       29.52      (7.1%)    4.4% (  -6% -   17%) 0.001
                        Wildcard      214.34      (6.1%)      224.66      (6.0%)    4.8% (  -6% -   17%) 0.000
           HighTermDayOfYearSort      323.67      (8.6%)      339.53      (7.7%)    4.9% ( -10% -   23%) 0.007

The `TaskGroup` class is redundant, the futures list can be a local variable
shared by the tasks (this also removes any need for making it read-only).
The tasks can be made a little more lightweight too by using a var handle
instead of an atomic.
Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one comment about the var handle, the rest looks good to me.

static {
try {
STARTED_OR_CANCELLED =
MethodHandles.lookup().findVarHandle(Task.class, "startedOrCancelled", boolean.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the one bit that makes me wonder if it's necessary and what the right trade-off is. We had an atomic boolean before. Perhaps it deserves a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I created this PR was to basically:

  1. shave some overhead
  2. benchmark
  3. shave some more
  4. benchmark more
  5. ...
  6. Like benchmark results and open PR

I'm not sure this one in isolation will have a lot of effect and it's hard to justify in isolation. What it does is save some bytes of heap and more importantly CPU cache. If you do enough of these changes in aggregate you get a couple % improvement in the benchmark. But each individual one obviously is trivial :)

That said, the JDK itself uses this pattern in e.g. the FutureTask and to me the tradeoff seems worth it. It simply makes it more predictable what the code will compile to and at least we some guarantee that the byte for the boolean will be located right near the other mutable state fields in the object which is a good thing when they are so closely related logically and saves some cache thrashing :)
For the basic building blocks of our concurrency logic like this one I'd always err on the side of performance. Effectively it's 5 extra lines of code traded for the guarantee that the flag is flattened into the task instance and won't be moved away from it by the GC even if the task waits in the queue for 5s ever :) Seems like a good deal to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to put this in a separate PR though, but maybe then it'd be worthwhile to make this kind of adjustment in multiple places at once? (I haven't checked though, maybe there aren't that many actually)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your flow makes sense from a benchmarking perspective but there's also the code style perspective, what makes sense in a single PR, what parts of the code it touches. I find that this particular change could be highlighted and get more attention. That also helps because the rest of changes in this PR are low hanging fruit mechanical changes that are easily reviewed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing backed this one out for now :)

Copy link
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please remember to backport to 10x

@javanna javanna added this to the 10.1.0 milestone Oct 8, 2024
@original-brownbear
Copy link
Member Author

Thanks Luca! And sure thing will backport :)!

@original-brownbear original-brownbear merged commit 7757d43 into apache:main Oct 8, 2024
3 checks passed
@original-brownbear original-brownbear deleted the cheaper-taskexecutor branch October 8, 2024 09:04
original-brownbear added a commit that referenced this pull request Oct 8, 2024
The `TaskGroup` class is redundant, the futures list can be a local variable
shared by the tasks (this also removes any need for making it read-only).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants