Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix merging null multi value in partial upsert #13031

Merged
merged 2 commits into from
May 1, 2024

Conversation

rohityadav1993
Copy link
Contributor

@rohityadav1993 rohityadav1993 commented Apr 30, 2024

bugfix
Follow up from: #11983
When persisting merged result of previous and new row in PartialUpsertHandler, if a null multivalue field is populated using _defaultNullValue value it is persisted as a primitive and fails due to typecasting when updating dictionary and adding new record. Null multivalue fields should be persisted as null.

Observed errors:

java.lang.ClassCastException: class java.lang.String cannot be cast to class [Ljava.lang.Object; (java.lang.String and [Ljava.lang.Object; are in module java.base of loader 'bootstrap')
at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.updateDictionary(MutableSegmentImpl.java:590)
at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.index(MutableSegmentImpl.java:492)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:592)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:443)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:679)
at java.base/java.lang.Thread.run(Thread.java:829)
java.lang.ClassCastException: class java.lang.Double cannot be cast to class [Ljava.lang.Object; (java.lang.Double and [Ljava.lang.Object; are in module java.base of loader 'bootstrap')
at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.addNewRow(MutableSegmentImpl.java:720)
at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.index(MutableSegmentImpl.java:493)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:592)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:443)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:679)
at java.base/java.lang.Thread.run(Thread.java:829)

if (_fieldSpecMap.get(column).isSingleValueField()) {
row.putDefaultNullValue(column, _fieldSpecMap.get(column).getDefaultNullValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put null in case of singleValueField as well to keep it consistent with normal full-upsert/partial-upsert merger logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial implementation was setting null unconditionally: #11983 (comment)

I think it should not matter if we use null value or defaultNullValue but multivalues need to specifically use null to not cause typecasting issues during indexing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MV column, you'll want to put new Object[]{defaultNullValue}. Take a look at the logic in NullValueTransformer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jackie-Jiang for pointing this. I was concerned about creating new Object[] every time, this approach is cleaner. Will update the PR accordingly.

Update PartialUpsertHandler.java
@rohityadav1993 rohityadav1993 marked this pull request as ready for review April 30, 2024 10:12
@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.16%. Comparing base (59551e4) to head (b46ba2d).
Report is 389 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13031      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2502      +66     
  Lines        133233   136657    +3424     
  Branches      20636    21181     +545     
============================================
+ Hits          82274    84959    +2685     
- Misses        44911    45405     +494     
- Partials       6048     6293     +245     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.11% <100.00%> (+0.40%) ⬆️
java-21 35.04% <0.00%> (-26.58%) ⬇️
skip-bytebuffers-false 62.15% <100.00%> (+0.40%) ⬆️
skip-bytebuffers-true 35.00% <0.00%> (+7.28%) ⬆️
temurin 62.16% <100.00%> (+0.41%) ⬆️
unittests 62.16% <100.00%> (+0.41%) ⬆️
unittests1 46.72% <0.00%> (-0.17%) ⬇️
unittests2 27.94% <100.00%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

} else {
// multivalue must necessarily use null or MutableSegmentImpl.updateDictionary fails due to typecasting
// primitive to array
row.putDefaultNullValue(column, null);
Copy link
Contributor

@deemoliu deemoliu Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is a backward incompatible change, but it might be ok to change it earlier since the impact of this change is not large, since this feature is relatively new.

the problem is, the default value of pinot multi-value column, is the default value of singleValue data type, which caused the type casting exception. Generally there are two solutions

  1. update the type casting code
  2. Update the current default null value for multi-value column, as shown in this diff.

@rohityadav1993 can you provide some insights on why #1 is not a feasible solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typecasting is done assuming Object[] type in the code attached in description(stacktrace). null can be typecasted but becuase defaultNullValue are stored as primitive for multi value fields it fails when we persist defualtNullValue when the merger result is null for multi value field.

Updating defaultNullValue behaviour for multi value will be major change and introducing change typecasting will have a larger impact radius.

The fix should be safe as it still populates null vector and slightly urgent as anyone merging null value for partial upsert would be seeing error records and may not be aware if not observing the metrics.

@Jackie-Jiang Jackie-Jiang merged commit 0f28a5c into apache:master May 1, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants