-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix merging null multi value in partial upsert #13031
Conversation
if (_fieldSpecMap.get(column).isSingleValueField()) { | ||
row.putDefaultNullValue(column, _fieldSpecMap.get(column).getDefaultNullValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put null in case of singleValueField as well to keep it consistent with normal full-upsert/partial-upsert merger logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial implementation was setting null unconditionally: #11983 (comment)
I think it should not matter if we use null value or defaultNullValue but multivalues need to specifically use null
to not cause typecasting issues during indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For MV column, you'll want to put new Object[]{defaultNullValue}
. Take a look at the logic in NullValueTransformer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jackie-Jiang for pointing this. I was concerned about creating new Object[] every time, this approach is cleaner. Will update the PR accordingly.
Update PartialUpsertHandler.java
559438c
to
2b1a750
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #13031 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2502 +66
Lines 133233 136657 +3424
Branches 20636 21181 +545
============================================
+ Hits 82274 84959 +2685
- Misses 44911 45405 +494
- Partials 6048 6293 +245
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
} else { | ||
// multivalue must necessarily use null or MutableSegmentImpl.updateDictionary fails due to typecasting | ||
// primitive to array | ||
row.putDefaultNullValue(column, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is a backward incompatible change, but it might be ok to change it earlier since the impact of this change is not large, since this feature is relatively new.
the problem is, the default value of pinot multi-value column, is the default value of singleValue data type, which caused the type casting exception. Generally there are two solutions
- update the type casting code
- Update the current default null value for multi-value column, as shown in this diff.
@rohityadav1993 can you provide some insights on why #1 is not a feasible solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typecasting is done assuming Object[] type in the code attached in description(stacktrace). null can be typecasted but becuase defaultNullValue are stored as primitive for multi value fields it fails when we persist defualtNullValue when the merger result is null for multi value field.
Updating defaultNullValue behaviour for multi value will be major change and introducing change typecasting will have a larger impact radius.
The fix should be safe as it still populates null vector and slightly urgent as anyone merging null value for partial upsert would be seeing error records and may not be aware if not observing the metrics.
bugfix
Follow up from: #11983
When persisting merged result of previous and new row in PartialUpsertHandler, if a null multivalue field is populated using _defaultNullValue value it is persisted as a primitive and fails due to typecasting when updating dictionary and adding new record. Null multivalue fields should be persisted as null.
Observed errors: