-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gapfill: Add support for lowercase datatypes #11722
base: master
Are you sure you want to change the base?
Gapfill: Add support for lowercase datatypes #11722
Conversation
} else if (columnNameToAliasMap.containsKey(caseInsensitiveTypeString(dataSchema.getColumnNames()[i]))) { | ||
dataSchema.getColumnNames()[i] = | ||
columnNameToAliasMap.get(caseInsensitiveTypeString(dataSchema.getColumnNames()[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- With this code, I think we can remove the first if statement to simplify the code(even though it always runs the
caseInsensitiveTypeString
code and may be a bit slower), what do you think?
if (columnNameToAliasMap.containsKey(dataSchema.getColumnNames()[i])) {
dataSchema.getColumnNames()[i] = columnNameToAliasMap.get(dataSchema.getColumnNames()[i]);
}
- We can define a variable
lowerCaseColumnName = caseInsensitiveTypeString(columnName)
to avoid execute the same code twice.
protected String caseInsensitiveTypeString(String columnName) { | ||
String dataTypePattern = "(BOOLEAN|INT|LONG|FLOAT|DOUBLE|STRING)"; | ||
Matcher matcher = Pattern.compile(dataTypePattern, Pattern.CASE_INSENSITIVE).matcher(columnName); | ||
StringBuffer result = new StringBuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to make the code thread-safe? If not, StringBuilder
will be faster
@@ -226,4 +231,16 @@ protected List<Object[]> gapFillAndAggregate(List<Object[]> rows, DataSchema dat | |||
DataSchema resultTableSchema) { | |||
throw new UnsupportedOperationException("Not supported"); | |||
} | |||
|
|||
protected String caseInsensitiveTypeString(String columnName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I haven't check the details of the logic, so I may be wrong):
the logic will also replace INT_column
to int_column
(similar for other types) if the parameter contains such string, is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create the columnNameToAliasMap
? The general solution should be simply replace the name in data schema to the alias if alias exists.
cc @weixiangsun
Codecov Report
@@ Coverage Diff @@
## master #11722 +/- ##
============================================
- Coverage 63.11% 63.09% -0.02%
Complexity 1117 1117
============================================
Files 2342 2342
Lines 125802 125814 +12
Branches 19336 19338 +2
============================================
- Hits 79395 79382 -13
- Misses 40745 40779 +34
+ Partials 5662 5653 -9
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 12 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
bugfix for Gapfill functions where lowercase datatypes with certain aggregate functions do not work (causing null ptr exceptions).
Pinot natively supports lower case for aggregate functions. However this functionality breaks when used with Gapfill.
This is because the dataschema returned by the server is always uppercase. There is a mismatch between the column names causing NPE (when attempting to map the column alias with the fetched data).
Added a check to handle lowercase data schemas when creating response of gapfill queries.
Fixes Gapfill function fails if datatype in lastwithtime is not uppercased #11322