Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gapfill: Add support for lowercase datatypes #11722

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ArnavBalyan
Copy link
Contributor

@ArnavBalyan ArnavBalyan commented Oct 2, 2023

bugfix for Gapfill functions where lowercase datatypes with certain aggregate functions do not work (causing null ptr exceptions).

  • Pinot natively supports lower case for aggregate functions. However this functionality breaks when used with Gapfill.

  • This is because the dataschema returned by the server is always uppercase. There is a mismatch between the column names causing NPE (when attempting to map the column alias with the fetched data).

  • Added a check to handle lowercase data schemas when creating response of gapfill queries.

  • Fixes Gapfill function fails if datatype in lastwithtime is not uppercased #11322

Comment on lines +139 to +141
} else if (columnNameToAliasMap.containsKey(caseInsensitiveTypeString(dataSchema.getColumnNames()[i]))) {
dataSchema.getColumnNames()[i] =
columnNameToAliasMap.get(caseInsensitiveTypeString(dataSchema.getColumnNames()[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. With this code, I think we can remove the first if statement to simplify the code(even though it always runs the caseInsensitiveTypeString code and may be a bit slower), what do you think?
if (columnNameToAliasMap.containsKey(dataSchema.getColumnNames()[i])) {
    dataSchema.getColumnNames()[i] = columnNameToAliasMap.get(dataSchema.getColumnNames()[i]);
}
  1. We can define a variable lowerCaseColumnName = caseInsensitiveTypeString(columnName) to avoid execute the same code twice.

protected String caseInsensitiveTypeString(String columnName) {
String dataTypePattern = "(BOOLEAN|INT|LONG|FLOAT|DOUBLE|STRING)";
Matcher matcher = Pattern.compile(dataTypePattern, Pattern.CASE_INSENSITIVE).matcher(columnName);
StringBuffer result = new StringBuffer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make the code thread-safe? If not, StringBuilder will be faster

@@ -226,4 +231,16 @@ protected List<Object[]> gapFillAndAggregate(List<Object[]> rows, DataSchema dat
DataSchema resultTableSchema) {
throw new UnsupportedOperationException("Not supported");
}

protected String caseInsensitiveTypeString(String columnName) {
Copy link
Contributor

@zhtaoxiang zhtaoxiang Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I haven't check the details of the logic, so I may be wrong):
the logic will also replace INT_column to int_column (similar for other types) if the parameter contains such string, is this intended?

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to create the columnNameToAliasMap? The general solution should be simply replace the name in data schema to the alias if alias exists.

cc @weixiangsun

@codecov-commenter
Copy link

Codecov Report

Merging #11722 (31ef6b7) into master (ae16812) will decrease coverage by 0.02%.
Report is 44 commits behind head on master.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master   #11722      +/-   ##
============================================
- Coverage     63.11%   63.09%   -0.02%     
  Complexity     1117     1117              
============================================
  Files          2342     2342              
  Lines        125802   125814      +12     
  Branches      19336    19338       +2     
============================================
- Hits          79395    79382      -13     
- Misses        40745    40779      +34     
+ Partials       5662     5653       -9     
Flag Coverage Δ
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 63.03% <100.00%> (-0.03%) ⬇️
java-17 62.95% <100.00%> (+0.02%) ⬆️
java-20 62.93% <100.00%> (-0.02%) ⬇️
temurin 63.09% <100.00%> (-0.02%) ⬇️
unittests 63.08% <100.00%> (-0.02%) ⬇️
unittests1 67.23% <100.00%> (+0.01%) ⬆️
unittests2 14.45% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../pinot/core/query/reduce/BaseGapfillProcessor.java 96.29% <100.00%> (+0.46%) ⬆️

... and 12 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gapfill function fails if datatype in lastwithtime is not uppercased
4 participants