Currently the `simple` and `normalizedDate` segment name generators include the min/max time values in the segment name to make it unique if the table is configured with a time column.
Example segment name: `testTable_2023-09-10_2023-09-20_12` (`<table name>_<min time value>_<max time value>_<sequence ID>`)
We ran into an issue on an append table where we re-ran the segment creation job for a particular day because the upstream data had changed slightly. The min/max time values also changed, so new segments were pushed instead of overwriting the old ones, which left the table with inconsistent data. We're currently using the `normalizedDate` name generator with the global sequence ID enabled.
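To make the failure mode concrete, here is a minimal sketch of how a name built from the layout above changes between runs. `SegmentNameSketch` and `nameWithTimeRange` are hypothetical names used only for illustration, not Pinot's actual generator classes, and the shifted max time value in the re-run is made up.

```java
// Hypothetical illustration only; this is not Pinot's generator implementation.
final class SegmentNameSketch {
    // Builds a name using the layout quoted in this issue:
    // <table name>_<min time value>_<max time value>_<sequence ID>
    static String nameWithTimeRange(String table, String minTime, String maxTime, int sequenceId) {
        return String.join("_", table, minTime, maxTime, Integer.toString(sequenceId));
    }

    public static void main(String[] args) {
        // First run of the segment creation job.
        String firstRun = nameWithTimeRange("testTable", "2023-09-10", "2023-09-20", 12);
        // Re-run for the same day after upstream data changed slightly
        // (assumed here: the max time value moved by one day).
        String reRun = nameWithTimeRange("testTable", "2023-09-10", "2023-09-21", 12);

        System.out.println(firstRun);
        System.out.println(reRun);
        // The names differ, so the re-run is pushed as a new segment
        // instead of replacing the one from the first run.
        System.out.println("same segment name: " + firstRun.equals(reRun));
    }
}
```

With the same table and sequence ID, only the time range distinguishes the two names, which is why the re-run adds a segment rather than overwriting the earlier one.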
https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L264-L284
cc @Jackie-Jiang