support omitting timestamps from segment name generator #11649

Closed
dang-stripe opened this issue Sep 21, 2023 · 1 comment

dang-stripe commented Sep 21, 2023

Currently, when a table is configured with a time column, the simple and normalizedDate segment name generators include the min/max time values in the segment name to make it unique.

Example segment name: testTable_2023-09-10_2023-09-20_12 (<table name>_<min time value>_<max time value>_<sequence ID>)

We ran into an issue on an append table when we re-ran the segment creation job for a particular day because the upstream data had changed slightly. The min/max time values changed as well, so new segments were pushed instead of overwriting the old ones, which created data inconsistency. We're currently using the normalizedDate name generator with global sequence ID enabled.

https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java#L264-L284
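
To illustrate the naming problem, here is a minimal sketch in Python. It is not Pinot's implementation: `build_segment_name` and its `omit_timestamps` parameter are hypothetical names made up for this example, and the only thing taken from the issue is the `<table name>_<min time value>_<max time value>_<sequence ID>` format.

```python
from typing import Optional


def build_segment_name(
    table_name: str,
    sequence_id: int,
    min_time: Optional[str] = None,
    max_time: Optional[str] = None,
    omit_timestamps: bool = False,  # hypothetical flag, for illustration only
) -> str:
    """Join name parts with '_', optionally skipping the min/max time values."""
    parts = [table_name]
    if not omit_timestamps and min_time is not None and max_time is not None:
        parts += [min_time, max_time]
    parts.append(str(sequence_id))
    return "_".join(parts)


# First run of the job, then a re-run where upstream data shifted the min time.
first_run = build_segment_name("testTable", 12, "2023-09-10", "2023-09-20")
rerun = build_segment_name("testTable", 12, "2023-09-11", "2023-09-20")

# Same two runs with the timestamps left out of the name.
first_run_stable = build_segment_name(
    "testTable", 12, "2023-09-10", "2023-09-20", omit_timestamps=True
)
rerun_stable = build_segment_name(
    "testTable", 12, "2023-09-11", "2023-09-20", omit_timestamps=True
)
```

With timestamps in the name, `first_run` and `rerun` differ, so the re-run would be pushed as a new segment next to the old one. With timestamps omitted, both runs produce the same name, so the re-run would replace the earlier segment.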

cc @Jackie-Jiang

@dang-stripe

Fixed by #11650
