
Avoid unnecessary rows conversion in aggregation #11607

Merged: 1 commit, Sep 18, 2023

Conversation

@Jackie-Jiang (Contributor) commented Sep 17, 2023

Currently, multi-stage aggregation/group-by requires both materialized rows and a data block. This means we always need to do a conversion, whether the transferable block is produced locally (with materialized rows) or received remotely (with a data block).

This PR:

  • Enhanced RowBasedBlockValSet to support null
  • Added FilteredRowBasedBlockValSet to support filtered aggregation from materialized rows
  • Enhanced DataBlockExtractUtils (originally DataBlockUtils):
    • Correctly handle null with a filter
    • Avoid per-row extraction to reduce the overhead
  • Enhanced AggregateOperator, MultistageAggregationExecutor, MultistageGroupByExecutor:
    • Use only materialized rows or the data block and avoid unnecessary row conversion
    • Avoid per-row column index lookup
    • Avoid extracting filtered rows multiple times when they can be shared
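The core idea of the last bullet group — aggregate over whichever representation the block already carries instead of always converting to rows — can be sketched as follows. This is a minimal illustrative sketch with hypothetical class and method names, not the actual Pinot `TransferableBlock`/`DataBlock` API:

```java
import java.util.List;

// Illustrative sketch only: a block that may carry either materialized rows
// or columnar data, and an aggregator that consumes whichever representation
// is already present instead of forcing a conversion.
public class BlockAggregationSketch {

  // Hypothetical stand-in for TransferableBlock.
  static class Block {
    final List<Object[]> rows;   // non-null when produced locally
    final int[][] columnarData;  // non-null when received remotely (toy stand-in for DataBlock)

    Block(List<Object[]> rows, int[][] columnarData) {
      this.rows = rows;
      this.columnarData = columnarData;
    }
  }

  // Sum a single int column, reading from whichever representation exists.
  static long sumColumn(Block block, int colId) {
    long sum = 0;
    if (block.rows != null) {
      // Local block: aggregate directly over the materialized rows.
      for (Object[] row : block.rows) {
        sum += (Integer) row[colId];
      }
    } else {
      // Remote block: aggregate directly over the columnar data,
      // never materializing rows.
      for (int value : block.columnarData[colId]) {
        sum += value;
      }
    }
    return sum;
  }
}
```

Either branch visits the data exactly once, which is the conversion the PR avoids.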

@codecov-commenter commented Sep 17, 2023

Codecov Report

Merging #11607 (5f34fa6) into master (a8411c0) will decrease coverage by 0.14%.
The diff coverage is 31.23%.

@@             Coverage Diff              @@
##             master   #11607      +/-   ##
============================================
- Coverage     63.20%   63.06%   -0.14%     
- Complexity     1107     1145      +38     
============================================
  Files          2323     2325       +2     
  Lines        124465   124815     +350     
  Branches      18989    19136     +147     
============================================
+ Hits          78672    78719      +47     
- Misses        40204    40487     +283     
- Partials       5589     5609      +20     
Flag Coverage Δ
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 63.02% <31.23%> (-0.12%) ⬇️
java-17 62.92% <31.23%> (-0.15%) ⬇️
java-20 62.92% <31.23%> (-0.15%) ⬇️
temurin 63.06% <31.23%> (-0.14%) ⬇️
unittests 63.06% <31.23%> (-0.14%) ⬇️
unittests1 67.27% <31.23%> (-0.18%) ⬇️
unittests2 14.48% <0.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
.../apache/pinot/common/datablock/DataBlockUtils.java 87.93% <ø> (+60.73%) ⬆️
...untime/operator/block/FilteredDataBlockValSet.java 0.00% <0.00%> (-34.79%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 20.43% <0.00%> (-0.69%) ⬇️
.../runtime/operator/block/DataBlockExtractUtils.java 14.28% <14.28%> (ø)
...e/pinot/core/query/reduce/RowBasedBlockValSet.java 20.50% <17.87%> (+4.78%) ⬆️
...core/query/reduce/FilteredRowBasedBlockValSet.java 22.30% <22.30%> (ø)
.../query/runtime/operator/block/DataBlockValSet.java 40.74% <58.33%> (-9.26%) ⬇️
.../pinot/query/runtime/blocks/TransferableBlock.java 75.43% <66.66%> (-0.93%) ⬇️
...inot/query/runtime/operator/AggregateOperator.java 83.15% <78.78%> (-11.25%) ⬇️
...ry/runtime/operator/MultistageGroupByExecutor.java 91.90% <92.13%> (-4.70%) ⬇️
... and 3 more

... and 11 files with indirect coverage changes


@walterddr (Contributor) left a comment:

lgtm mostly. have a couple of questions

@@ -1065,4 +1066,14 @@ public enum JoinOverFlowMode {
THROW, BREAK
}
}

public static class NullValuePlaceHolder {
@walterddr (Contributor):
we already have default null value in FieldSpecs do we plan to do differently here? should we consolidate?

@Jackie-Jiang (Contributor, Author):
We have that in ColumnDataType right now. Didn't change that to limit the scope of this PR
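The hunk above adds a NullValuePlaceHolder class to CommonConstants. A minimal sketch of that shared-placeholder pattern is shown below; the field names follow the convention but the exact set of constants and values in Pinot's actual class may differ:

```java
// Minimal sketch of a shared null-value placeholder holder, following the
// pattern discussed above: one canonical "stand-in" value per data type so
// null handling does not allocate fresh defaults. Values are illustrative
// and may differ from Pinot's actual NullValuePlaceHolder constants.
public final class NullValuePlaceHolderSketch {
  public static final int INT = 0;
  public static final long LONG = 0L;
  public static final float FLOAT = 0f;
  public static final double DOUBLE = 0d;
  public static final String STRING = "";
  public static final byte[] BYTES = new byte[0];

  private NullValuePlaceHolderSketch() {
    // Constants holder; never instantiated.
  }
}
```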

@@ -79,7 +79,7 @@ public void shouldHandleUpstreamErrorBlocks() {
DataSchema outSchema = new DataSchema(new String[]{"group", "sum"}, new ColumnDataType[]{INT, DOUBLE});
AggregateOperator operator =
new AggregateOperator(OperatorTestUtil.getDefaultContext(), _input, outSchema, inSchema, calls, group,
AggType.INTERMEDIATE, null, null);
AggType.DIRECT, Collections.singletonList(-1), null);
@walterddr (Contributor):
any specific reason we change the AggType here?

@Jackie-Jiang (Contributor, Author):
INTERMEDIATE is not testing the whole flow. I want to test the aggregate as well as the merge

aggFunctions[i] = AggregationFunctionFactory.getAggregationFunction(functionContexts.get(i), true);
}

// Process the filter argument indices
int[] filterArgIds = new int[numFunctions];
int maxFilterArgId = -1;
@walterddr (Contributor):
why did we need this extra integer? isn't a null/empty filterArgIds array enough to indicate there's no filter?

@Jackie-Jiang (Contributor, Author):
I got confused initially as well, then found that it is never empty coming from the RelNode. Added the max filter arg id to help quickly identify whether there is a filter, and to create the cache.
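The bookkeeping described in this exchange — one slot per aggregation function with -1 meaning "no filter", plus a running maximum that cheaply signals whether any filter exists at all — can be sketched as below. Names are illustrative; the real code fills the ids from the RelNode:

```java
// Sketch of the filter-argument bookkeeping discussed above: filterArgIds is
// never empty (one entry per aggregation function, -1 meaning "no filter"),
// so a separate max filter arg id is kept to quickly tell whether any
// function is filtered and to size the per-filter cache.
public class FilterArgSketch {

  static int maxFilterArgId(int[] filterArgIds) {
    int max = -1;
    for (int id : filterArgIds) {
      max = Math.max(max, id);
    }
    return max;
  }

  // A single comparison answers "is anything filtered?" without
  // re-scanning the array at each call site.
  static boolean hasFilter(int[] filterArgIds) {
    return maxFilterArgId(filterArgIds) >= 0;
  }
}
```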

// value primitive type.
static Map<ExpressionContext, BlockValSet> getBlockValSetMap(AggregationFunction aggFunction, TransferableBlock block,
DataSchema inputDataSchema, Map<String, Integer> colNameToIndexMap, int filterArgIdx) {
private int[] getGroupKeyIds(List<RexExpression> groupSet) {
@walterddr (Contributor):
nit: why reorder the functions? this is the original getGroupSet, right? (and if not, can't we remove the convertRexExpressionToExpressionContext func?)

@Jackie-Jiang (Contributor, Author):
It is quite different though: one retrieves the expressions, the other just the ids. I'll try to remove convertRexExpressionToExpressionContext in a separate PR because that involves quite some change to the aggregation function.
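The distinction the author draws — group-by only needs the input-ref ids, not full expression objects — can be sketched as follows. InputRef here is a toy stand-in for RexExpression.InputRef, not the actual Pinot type:

```java
import java.util.List;

// Toy sketch of the id-only path discussed above: extract just the column
// ids from the group set, skipping expression-context conversion entirely.
public class GroupKeySketch {

  // Hypothetical stand-in for RexExpression.InputRef.
  static class InputRef {
    final int _index;

    InputRef(int index) {
      _index = index;
    }
  }

  static int[] getGroupKeyIds(List<InputRef> groupSet) {
    int numKeys = groupSet.size();
    int[] groupKeyIds = new int[numKeys];
    for (int i = 0; i < numKeys; i++) {
      groupKeyIds[i] = groupSet.get(i)._index;
    }
    return groupKeyIds;
  }
}
```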

Map<ExpressionContext, BlockValSet> blockValSetMap = new HashMap<>();
for (ExpressionContext expression : expressions) {
if (expression.getType().equals(ExpressionContext.Type.IDENTIFIER) && !"__PLACEHOLDER__".equals(
@walterddr (Contributor) commented Sep 18, 2023:

i originally intended to get rid of the __PLACEHOLDER__. let's factor this into a util or leave a TODO so that it's easier to replace in the future (i see 4 __PLACEHOLDER__ usages in the new impl)

@Jackie-Jiang (Contributor, Author):

I want to do that in the following PR very soon
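The factoring the reviewer suggests — one util owning the __PLACEHOLDER__ check instead of four scattered string comparisons — could look like the sketch below. This is a hypothetical helper, not code from the actual follow-up PR:

```java
// Hypothetical utility centralizing the __PLACEHOLDER__ check suggested in
// the review, so the sentinel literal lives in one place until it can be
// removed entirely.
public final class PlaceholderUtils {
  public static final String PLACEHOLDER = "__PLACEHOLDER__";

  private PlaceholderUtils() {
  }

  // True for real identifiers, false for the placeholder sentinel.
  public static boolean isRealIdentifier(String identifier) {
    return !PLACEHOLDER.equals(identifier);
  }
}
```

With this, each of the four call sites collapses to `PlaceholderUtils.isRealIdentifier(...)`, and removing the sentinel later touches a single class.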

protected final DataBlock _dataBlock;
protected final int _index;
protected final RoaringBitmap _nullBitMap;
private final DataType _dataType;
@walterddr (Contributor) commented Sep 18, 2023:

i knew that DataBlock is a v2 concept. but other than that, any specific reason we put these in the runtime module and the RowBasedBlockValSet in the core module? (i see DataBlock is still in pinot-common; imo it might be easier to put everything together)

@Jackie-Jiang (Contributor, Author):

I would say there is no specific reason. Ideally we should move all BlockValSet together, but currently they are spread over 3 places. Can be done in a separate PR

@walterddr (Contributor) left a comment:

looked again at the details. good to go and follow up later

@Jackie-Jiang merged commit fef4e64 into apache:master on Sep 18, 2023
21 checks passed
@Jackie-Jiang deleted the avoid_ser_de_in_group_by branch on September 18, 2023 at 20:47
Labels: bugfix, enhancement, multi-stage (Related to the multi-stage query engine), performance
3 participants