Optimize slice calculation in IndexSearcher a little #13860

original-brownbear · 2024-10-04T21:49:06Z

Fewer volatile reads, less indirection and a fast-path for when there's no executor. Also, saving some copies, sorting array instead of list, and saving allocations all around. This PR is obviously not a big win but in aggregate it's quite measurable and mostly deals with tiny regressions introduced recently. So opening this as a suggestion for dealing with that boiling frog :)

Luceneutil over 40 rounds does show small but significant improvements:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                      AndHighLow     1495.00      (5.8%)     1481.40      (5.5%)   -0.9% ( -11% -   11%) 0.471
                    OrNotHighLow     1303.33      (5.9%)     1303.23      (7.1%)   -0.0% ( -12% -   13%) 0.996
           BrowseMonthTaxoFacets       12.49     (25.8%)       12.53     (30.2%)    0.3% ( -44% -   76%) 0.959
                      AndHighMed      331.01      (6.7%)      333.83      (5.6%)    0.9% ( -10% -   14%) 0.537
               HighTermTitleSort       61.03      (3.6%)       61.57      (3.5%)    0.9% (  -5% -    8%) 0.261
                         LowTerm      670.36      (3.9%)      677.32      (4.2%)    1.0% (  -6% -    9%) 0.253
                          Fuzzy1       90.76      (3.3%)       91.82      (3.9%)    1.2% (  -5% -    8%) 0.149
           BrowseMonthSSDVFacets        4.57     (12.5%)        4.63      (7.2%)    1.3% ( -16% -   24%) 0.563
                         MedTerm      481.17      (8.8%)      487.77      (6.1%)    1.4% ( -12% -   17%) 0.416
                       OrHighLow      779.19      (4.3%)      790.11      (4.0%)    1.4% (  -6% -   10%) 0.132
                         Prefix3      410.59      (6.4%)      417.03      (7.0%)    1.6% ( -11% -   15%) 0.296
                       OrHighMed      211.85      (5.2%)      215.23      (6.0%)    1.6% (  -9% -   13%) 0.201
                    OrNotHighMed      376.36      (4.2%)      382.45      (4.3%)    1.6% (  -6% -   10%) 0.089
                   OrHighNotHigh      266.72      (5.5%)      271.13      (5.1%)    1.7% (  -8% -   12%) 0.161
                    HighSpanNear       18.28      (7.7%)       18.59      (8.0%)    1.7% ( -13% -   18%) 0.332
         AndHighMedDayTaxoFacets       47.77      (4.3%)       48.60      (3.3%)    1.7% (  -5% -    9%) 0.042
                          Fuzzy2       72.70      (2.8%)       74.05      (2.9%)    1.9% (  -3% -    7%) 0.004
            HighTermTitleBDVSort       22.00      (7.1%)       22.41      (6.7%)    1.9% ( -11% -   16%) 0.228
                        PKLookup      240.31      (2.0%)      244.88      (2.2%)    1.9% (  -2% -    6%) 0.000
                     AndHighHigh       99.80      (4.0%)      101.75      (2.7%)    2.0% (  -4% -    8%) 0.010
                          IntNRQ       94.21      (5.0%)       96.05      (4.8%)    2.0% (  -7% -   12%) 0.075
                 MedSloppyPhrase       61.87      (3.5%)       63.11      (4.0%)    2.0% (  -5% -    9%) 0.017
                    OrHighNotLow      522.07      (8.3%)      533.24      (8.1%)    2.1% ( -13% -   20%) 0.243
                       LowPhrase       51.26      (3.7%)       52.36      (3.7%)    2.1% (  -5% -    9%) 0.009
                      OrHighHigh       53.23      (8.3%)       54.52      (8.1%)    2.4% ( -12% -   20%) 0.189
            MedTermDayTaxoFacets       13.21      (4.6%)       13.54      (4.9%)    2.5% (  -6% -   12%) 0.020
             MedIntervalsOrdered       48.53      (5.4%)       49.82      (6.3%)    2.7% (  -8% -   15%) 0.042
                HighSloppyPhrase       23.58      (6.1%)       24.22      (7.6%)    2.7% ( -10% -   17%) 0.081
                    OrHighNotMed      422.74      (4.9%)      434.52      (5.2%)    2.8% (  -6% -   13%) 0.014
     BrowseRandomLabelTaxoFacets        4.46      (9.2%)        4.59      (9.0%)    2.8% ( -14% -   23%) 0.168
           HighTermDayOfYearSort      403.10      (7.0%)      414.37      (6.8%)    2.8% ( -10% -   17%) 0.071
       BrowseDayOfYearSSDVFacets        4.55      (8.2%)        4.68      (7.2%)    2.8% ( -11% -   19%) 0.104
                       MedPhrase      382.95      (9.3%)      393.79      (9.2%)    2.8% ( -14% -   23%) 0.172
                         Respell       49.12      (1.2%)       50.54      (2.2%)    2.9% (   0% -    6%) 0.000
               HighTermMonthSort     1514.02      (6.1%)     1557.67      (6.3%)    2.9% (  -8% -   16%) 0.038
                     MedSpanNear       41.96      (5.2%)       43.21      (5.6%)    3.0% (  -7% -   14%) 0.013
                        HighTerm      591.81      (7.2%)      609.51      (6.6%)    3.0% ( -10% -   18%) 0.053
                      HighPhrase       82.89      (6.9%)       85.37      (6.0%)    3.0% (  -9% -   16%) 0.038
                        Wildcard      160.07      (3.4%)      165.02      (3.5%)    3.1% (  -3% -   10%) 0.000
                   OrNotHighHigh      500.91      (7.0%)      516.89      (7.2%)    3.2% ( -10% -   18%) 0.045
     BrowseRandomLabelSSDVFacets        3.31      (3.6%)        3.41      (5.6%)    3.3% (  -5% -   12%) 0.002
             LowIntervalsOrdered        8.36      (8.6%)        8.65      (7.6%)    3.5% ( -11% -   21%) 0.056
                     LowSpanNear       37.83      (2.5%)       39.17      (2.6%)    3.5% (  -1% -    8%) 0.000
        AndHighHighDayTaxoFacets       13.87      (4.8%)       14.39      (6.0%)    3.7% (  -6% -   15%) 0.002
                 LowSloppyPhrase       19.46      (3.5%)       20.20      (5.3%)    3.8% (  -4% -   13%) 0.000
                      TermDTSort      173.31      (6.5%)      180.12      (6.5%)    3.9% (  -8% -   18%) 0.007
            HighIntervalsOrdered       19.36      (6.3%)       20.13      (6.0%)    4.0% (  -7% -   17%) 0.004
       BrowseDayOfYearTaxoFacets        5.46     (11.7%)        5.70     (10.9%)    4.4% ( -16% -   30%) 0.080
            BrowseDateSSDVFacets        1.23      (7.5%)        1.29      (7.1%)    4.7% (  -9% -   20%) 0.004
            BrowseDateTaxoFacets        5.36     (11.4%)        5.62     (11.1%)    4.7% ( -16% -   30%) 0.061
          OrHighMedDayTaxoFacets        4.77      (8.8%)        5.02      (8.9%)    5.2% ( -11% -   25%) 0.008

Fewer volatile reads, less indirection and a fast-path for when there's no executor. Also, saving some copies, sorting array instead of list and saving allocations.
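
As a rough illustration of two of the ideas listed above (a fast path when there is no executor, and sorting an array rather than a list), here is a minimal, self-contained sketch. It is not the code from this PR: the class `SliceSketch`, the method `computeSlices`, the use of plain `int` doc counts in place of leaf contexts, and the simplified greedy grouping rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical stand-in for slice calculation; leaves are represented by their doc counts only.
final class SliceSketch {

  /** Returns one int[] of leaf doc counts per slice. */
  static int[][] computeSlices(int[] leafDocCounts, Executor executor, int maxDocsPerSlice) {
    if (leafDocCounts.length == 0) {
      return new int[0][];
    }
    if (executor == null) {
      // Fast path (assumed behavior): with no executor there is nothing to fork,
      // so skip sorting and grouping and return everything as a single slice.
      return new int[][] {leafDocCounts.clone()};
    }
    // Sort a primitive array directly instead of copying into a List and sorting that.
    int[] sorted = leafDocCounts.clone();
    Arrays.sort(sorted); // ascending; walked from the end to visit the largest leaves first

    List<int[]> slices = new ArrayList<>();
    int groupEnd = sorted.length; // exclusive end of the group currently being built
    long docSum = 0;
    for (int i = sorted.length - 1; i >= 0; i--) {
      docSum += sorted[i];
      // Simplified rule: close the group once it exceeds the doc budget, or at the last leaf.
      if (docSum > maxDocsPerSlice || i == 0) {
        slices.add(Arrays.copyOfRange(sorted, i, groupEnd));
        groupEnd = i;
        docSum = 0;
      }
    }
    return slices.toArray(new int[0][]);
  }
}
```

With doc counts `{10, 50, 20, 5}` and a budget of 40, the sketch puts the 50-doc leaf in its own slice and groups the remaining three together; with a `null` executor it returns a single slice regardless of the budget.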

javanna · 2024-10-07T08:37:45Z

lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java

-                      .map(LeafReaderContextPartition::createForEntireSegment)
-                      .toList()));
+    for (List<LeafReaderContextPartition> currentGroup : groupedLeafPartitions) {
+      slices[upto] = new LeafSlice(currentGroup);


I am fine with extracting the partitioning bit to a separate method. Ideally that would be made as a single, purely mechanical change though. The diff is hard to read otherwise.

javanna · 2024-10-07T08:41:45Z

lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java

+    return res;
+  }
+
+  private synchronized LeafSlice[] computeAndCacheSlices() {


This looks ok to me, and potentially simpler than the supplier. Nitpicking, I would prefer that this be made on its own in a separate PR. This does not affect when the error below gets thrown compared to before, right? It is still thrown the first time the slices are retrieved.

original-brownbear · 2024-10-07

Right, behavior is completely unchanged. It's still the same amount of guarantees around ordering as before :)

javanna · 2024-10-07

Cool, can you open a separate PR for it? That would be easier to review.

original-brownbear · 2024-10-11

Sure thing, finally got around to it in #13893 :)
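
For readers following this thread, the pattern under discussion (a cached value that is computed on first retrieval by a `synchronized` method, with only one volatile read on the hot path) can be sketched roughly as below. This is an illustration under assumptions, not the code from this PR or from #13893: the class `LazySlices`, the `Supplier` passed in, and the `int[][]` stand-in for `LeafSlice[]` are hypothetical; only the method name `computeAndCacheSlices` is taken from the diff above.

```java
import java.util.function.Supplier;

// Hypothetical holder; int[][] stands in for LeafSlice[].
final class LazySlices {
  private final Supplier<int[][]> sliceComputation; // assumed: whatever computes the slices
  private volatile int[][] slices; // null until the slices are first retrieved

  LazySlices(Supplier<int[][]> sliceComputation) {
    this.sliceComputation = sliceComputation;
  }

  int[][] getSlices() {
    int[][] res = slices; // one volatile read; the local is used from here on
    if (res == null) {
      res = computeAndCacheSlices();
    }
    return res;
  }

  private synchronized int[][] computeAndCacheSlices() {
    int[][] res = slices; // re-check under the lock in case another thread got here first
    if (res == null) {
      // If the computation throws, nothing is cached and the exception reaches the
      // caller of the first retrieval, which matches the behavior described above.
      res = sliceComputation.get();
      slices = res;
    }
    return res;
  }
}
```

Repeated calls to `getSlices()` return the same array instance, and the computation runs at most once unless it throws.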

javanna · 2024-10-07T08:44:56Z

lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java

-        final C collector = collectorManager.newCollector();
-        collectors.add(collector);
+      final Weight weight = createWeight(rewrite(query, scoreMode.needsScores()), scoreMode, 1);
+      if (leafSlices.length == 1) {


I don't know how I feel about specializing the single-slice codepath and having a searchMultipleSlices method. I think I would prefer to avoid that and share the same codepath between the two scenarios.

original-brownbear · 2024-10-07

I think I'm a big fan of having separate paths here. Whether execution uses a single slice or multiple slices has an extreme performance impact, depending on the query.
With wikimedium in Luceneutil, running 8 queries in parallel across 8 threads and comparing slicing into up to 8 slices vs. always slicing into a single slice:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
            HighIntervalsOrdered        2.64      (5.0%)        0.64      (1.3%)  -75.7% ( -78% -  -73%) 0.000
                    HighSpanNear        5.97      (4.4%)        2.01      (0.9%)  -66.4% ( -68% -  -63%) 0.000
                 MedSloppyPhrase        7.31      (5.7%)        2.76      (1.6%)  -62.2% ( -65% -  -58%) 0.000
            HighTermTitleBDVSort       12.58      (6.3%)        4.90      (2.4%)  -61.1% ( -65% -  -55%) 0.000
                     MedSpanNear       14.89      (6.4%)        6.93      (1.2%)  -53.4% ( -57% -  -48%) 0.000
             LowIntervalsOrdered       20.75     (11.2%)       10.13      (1.8%)  -51.2% ( -57% -  -43%) 0.000
                          IntNRQ       39.41     (15.3%)       23.34      (8.5%)  -40.8% ( -56% -  -19%) 0.000
             MedIntervalsOrdered       32.83      (8.3%)       19.60      (3.9%)  -40.3% ( -48% -  -30%) 0.000
                HighSloppyPhrase       29.22      (7.4%)       18.36      (2.9%)  -37.2% ( -44% -  -28%) 0.000
          OrHighMedDayTaxoFacets        8.60      (5.6%)        5.63      (3.8%)  -34.5% ( -41% -  -26%) 0.000
        AndHighHighDayTaxoFacets       15.40      (8.5%)       10.55      (2.7%)  -31.5% ( -39% -  -22%) 0.000
         AndHighMedDayTaxoFacets       27.94      (7.9%)       22.57      (2.4%)  -19.2% ( -27% -   -9%) 0.000
                       LowPhrase       45.52     (12.6%)       38.08      (1.8%)  -16.4% ( -27% -   -2%) 0.000
                     AndHighHigh       36.50     (14.6%)       30.68      (5.2%)  -15.9% ( -31% -    4%) 0.000
            MedTermDayTaxoFacets       15.19      (8.4%)       13.38      (5.1%)  -11.9% ( -23% -    1%) 0.000
                      OrHighHigh       48.10     (14.5%)       43.38      (8.4%)   -9.8% ( -28% -   15%) 0.019
            BrowseDateSSDVFacets        1.20      (6.7%)        1.11     (12.1%)   -7.3% ( -24% -   12%) 0.035
           BrowseMonthTaxoFacets       11.02     (38.3%)       10.28     (40.7%)   -6.7% ( -61% -  117%) 0.634
       BrowseDayOfYearTaxoFacets        5.24     (10.1%)        5.03      (7.0%)   -3.9% ( -19% -   14%) 0.200
            BrowseDateTaxoFacets        5.17     (10.2%)        4.96      (7.0%)   -3.9% ( -19% -   14%) 0.207
                     LowSpanNear       62.43      (5.1%)       60.97      (3.1%)   -2.3% ( -10% -    6%) 0.117
                        Wildcard       78.35      (5.8%)       76.52      (2.3%)   -2.3% (  -9% -    6%) 0.137
                       MedPhrase       42.79      (8.5%)       42.43      (2.3%)   -0.8% ( -10% -   10%) 0.702
     BrowseRandomLabelTaxoFacets        4.27      (4.3%)        4.28      (5.3%)    0.4% (  -8% -   10%) 0.824
                        PKLookup      242.91      (1.7%)      245.39      (2.7%)    1.0% (  -3% -    5%) 0.203
                         Respell       45.76      (0.9%)       46.34      (2.1%)    1.3% (  -1% -    4%) 0.025
                         Prefix3      304.37      (1.8%)      308.85      (3.6%)    1.5% (  -3% -    7%) 0.145
                      HighPhrase       32.28      (9.5%)       32.90      (2.5%)    1.9% (  -9% -   15%) 0.434
       BrowseDayOfYearSSDVFacets        4.37      (5.2%)        4.48     (10.1%)    2.5% ( -12% -   18%) 0.371
           BrowseMonthSSDVFacets        4.40      (9.0%)        4.51     (18.2%)    2.6% ( -22% -   32%) 0.612
                          Fuzzy1       92.79      (0.5%)       95.29      (2.3%)    2.7% (   0% -    5%) 0.000
                          Fuzzy2       88.78      (0.7%)       91.62      (2.2%)    3.2% (   0% -    6%) 0.000
                 LowSloppyPhrase       78.35      (3.8%)       80.93      (3.7%)    3.3% (  -4% -   11%) 0.013
                      AndHighMed       55.83      (6.4%)       58.27      (5.7%)    4.4% (  -7% -   17%) 0.041
     BrowseRandomLabelSSDVFacets        3.23      (4.8%)        3.39      (7.4%)    4.8% (  -7% -   17%) 0.028
                       OrHighMed      101.82      (5.0%)      108.12      (6.6%)    6.2% (  -5% -   18%) 0.003
                      AndHighLow      604.08      (3.4%)      654.10      (4.8%)    8.3% (   0% -   17%) 0.000
                       OrHighLow      461.81      (4.5%)      516.22      (4.8%)   11.8% (   2% -   22%) 0.000
                        HighTerm      361.41      (6.6%)      449.94     (12.5%)   24.5% (   5% -   46%) 0.000
                    OrNotHighLow      435.40      (3.6%)      557.47      (7.0%)   28.0% (  16% -   40%) 0.000
                    OrHighNotMed      275.75      (6.1%)      376.17     (12.5%)   36.4% (  16% -   58%) 0.000
                   OrNotHighHigh      183.01      (5.4%)      250.78     (12.3%)   37.0% (  18% -   57%) 0.000
                    OrNotHighMed      273.62      (4.1%)      377.47      (9.9%)   38.0% (  23% -   54%) 0.000
                    OrHighNotLow      290.01      (6.9%)      422.33     (14.0%)   45.6% (  23% -   71%) 0.000
                         LowTerm      610.48      (3.9%)      906.71      (9.5%)   48.5% (  33% -   64%) 0.000
                   OrHighNotHigh      293.17      (5.7%)      438.61     (13.0%)   49.6% (  29% -   72%) 0.000
                         MedTerm      309.52      (6.4%)      544.79     (16.0%)   76.0% (  50% -  105%) 0.000
                      TermDTSort       91.38      (3.3%)      201.06     (16.2%)  120.0% (  97% -  144%) 0.000
               HighTermMonthSort      982.63      (2.5%)     2855.09      (9.7%)  190.6% ( 174% -  207%) 0.000
           HighTermDayOfYearSort      153.03      (4.5%)      609.28     (18.0%)  298.1% ( 263% -  335%) 0.000
               HighTermTitleSort       36.27     (10.4%)      200.12     (18.3%)  451.8% ( 383% -  536%) 0.000

Depending on what kind of query you run, parallelism will either get you a huge boost from better resource utilisation or a huge slowdown from contention and/or redundant work.
I think the two should go through different code paths to (a) allow for more optimisations down the line and (b) make profiling easier (at least for me personally, there is lots of value in knowing that something did or didn't get forked or sliced :)).
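
To make the shape of the argument concrete, here is a minimal sketch of what separate single-slice and multi-slice paths could look like. It is only an illustration: the class `SlicedSum`, summing integers as a stand-in for collecting hits, and the error handling are assumptions, and only the method name `searchMultipleSlices` comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical example: "searching" a slice just sums its values.
final class SlicedSum {

  static long search(int[][] slices, Executor executor) {
    if (slices.length == 0) {
      return 0;
    }
    if (slices.length == 1) {
      // Single-slice path: runs on the calling thread, no tasks, no futures.
      return searchSingleSlice(slices[0]);
    }
    return searchMultipleSlices(slices, executor);
  }

  private static long searchSingleSlice(int[] slice) {
    long sum = 0;
    for (int value : slice) {
      sum += value;
    }
    return sum;
  }

  private static long searchMultipleSlices(int[][] slices, Executor executor) {
    // Multi-slice path: fork one task per slice, then join and reduce.
    List<FutureTask<Long>> tasks = new ArrayList<>(slices.length);
    for (int[] slice : slices) {
      FutureTask<Long> task = new FutureTask<>(() -> searchSingleSlice(slice));
      tasks.add(task);
      executor.execute(task);
    }
    long total = 0;
    for (FutureTask<Long> task : tasks) {
      try {
        total += task.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
      }
    }
    return total;
  }
}
```

In a profile of something shaped like this, a frame for `searchMultipleSlices` shows directly that the request was forked, which is the kind of visibility point (b) refers to.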

Optimize slice calculation in IndexSearcher a little

0a9cbae


original-brownbear requested a review from javanna October 4, 2024 21:49

original-brownbear mentioned this pull request Oct 5, 2024

Reduce TaskExecutor overhead #13861

Merged

javanna reviewed Oct 7, 2024
