Skip to content

Commit

Permalink
Add estimateSerializedSize to BatchVectorSerializer (facebookincubato…
Browse files Browse the repository at this point in the history
…r#8712)

Summary:
Pull Request resolved: facebookincubator#8712

This change adds an estimateSerializedSize function to the BatchVectorSerializer interface.

In the DefaultBatchVectorSerializer this simple defers to the existing estimateSerializedSize in VectorSerde.  However, in
PrestoBatchVectorSerializer since we preserve encodings, its implementation needs to take this into account.

PrestoBatchVectorSerializer's shares a lot of code with PrestoVectorSerde's estimateSerializedSize, but adds the following:
* estimateConstantSerializedSize: this sets the size to the size of the single constant value, regardless of the width of the
range.
* estimateDictionarySerializedSize: this sets the size to the aggregate of the indices plus the size of the selected entries in
the dictionary.
* estimateSerializedSizeImpl: like estimateSerializedSizeInt in PrestoVectorSerde, this drives the estimation, but now calling
our new functions

I found a few bugs in the existing estimation logic which i fixed.

Note that we inherit a number of inaccuracies from PrestoVectorSerde's estimateSerializedSize, but these are mostly
constants if you ignore that they become multiplied when they occur in the elements of a complex type (it really is just an
estimate).

It also doesn't account for the fact the Serializer may disable dictionary encoding if it doesn't provide value.  I plan to add this
in a follow up to keep this change from becoming more complicated.

Reviewed By: bikramSingh91

Differential Revision: D53593847

fbshipit-source-id: 639d01437797e07c378bc837e0a0283c7d08343d
  • Loading branch information
Kevin Wilfong authored and facebook-github-bot committed Mar 26, 2024
1 parent c354c31 commit 7fc0966
Show file tree
Hide file tree
Showing 5 changed files with 638 additions and 30 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -217,7 +217,7 @@ TEST_F(SumDataSizeForStatsTest, complexRecursiveGlobalAggregate) {
}),
})};

testAggregations(vectors, {}, {"sum_data_size_for_stats(c0)"}, "SELECT 118");
testAggregations(vectors, {}, {"sum_data_size_for_stats(c0)"}, "SELECT 115");
}

TEST_F(SumDataSizeForStatsTest, constantEncodingTest) {
Expand Down Expand Up @@ -269,10 +269,10 @@ TEST_F(SumDataSizeForStatsTest, dictionaryEncodingTest) {
BaseVector::wrapInDictionary(nullptr, indices, size, columnTwo);
auto vectors = {makeRowVector({columnOne, columnTwoDictionaryEncoded})};

testAggregations(vectors, {}, {"sum_data_size_for_stats(c1)"}, "SELECT 118");
testAggregations(vectors, {}, {"sum_data_size_for_stats(c1)"}, "SELECT 115");

testAggregations(
vectors, {"c0"}, {"sum_data_size_for_stats(c1)"}, "VALUES (1,82),(2,36)");
vectors, {"c0"}, {"sum_data_size_for_stats(c1)"}, "VALUES (1,79),(2,36)");
}

TEST_F(SumDataSizeForStatsTest, mask) {
Expand Down
Loading

0 comments on commit 7fc0966

Please sign in to comment.