Sounds like you're hitting what was fixed in this commit: See the 1.9.0 release notes:
Hi, that sounds right. We do call store... asynchronously, because events are processed on multiple threads that are not synchronized with the consume thread, and we ran into errors on 1.9.x. Would it be acceptable to backport this fix, or at least the assign-time cleanup change, to the 1.8.x line or earlier, so that it is available with older VC++ toolsets?
We are investigating an infrequent issue where the committed offset of a single partition is reset to an earlier position, usually the beginning.
Most of our consumers use Confluent.Kafka/librdkafka 1.4.2 (which I know is old), set enable.auto.offset.store to false, call StoreOffset for a single partition once an event has been processed, and use auto-commit with default settings. The old stop-the-world (eager) rebalancing is used.
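For readers who want the setup spelled out, the settings above correspond roughly to the following librdkafka properties. This is only a sketch, written as a Python dict for brevity rather than as the Confluent.Kafka ConsumerConfig we actually use; the broker address and group id are placeholders, not our real values.

```python
# Sketch of the consumer settings described above, using librdkafka property names.
# "bootstrap.servers" and "group.id" are placeholder values for illustration only.
consumer_config = {
    "bootstrap.servers": "broker.example:9092",  # placeholder
    "group.id": "example-group",                 # placeholder
    # Offsets are not stored automatically when a message is delivered;
    # the application calls StoreOffset itself after processing an event.
    "enable.auto.offset.store": False,
    # Stored offsets are committed in the background by the client.
    "enable.auto.commit": True,
    # auto.commit.interval.ms is left at its default (5000 ms).
}
```

With this combination, whatever sits in the client's local offset store when the auto-commit timer fires is what gets committed.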
The symptoms look like librdkafka auto-committing a very stale offset across 2 separate assignments:
Process X consumes events from partitions 1 and 2. Both partitions are then revoked and assigned to a different machine, where they are consumed without problems for 5 days. After those 5 days (in this case), there is some instability in the cluster and X is assigned partitions 1 and 2 again for a short time (about 80 seconds). During these 80 seconds, we only observe events for partition 1 (so we probably call StoreOffset for 1, but definitely do not call it for 2). Then 1 and 2 are revoked from X again and assigned to another machine. At this point the offset of partition 2 has been reset to the beginning.
Looking at the offsets topic, we see that during these 80 seconds X sent a valid update for partition 1, and an update for partition 2 with an offset that we were able to trace back to 5 days earlier. Given the 24h topic retention, this cannot be any kind of current broker or event offset, but it does match the previous assignment of partition 2 to X, 5 days earlier.
Is this normal behavior? Is there a known bug that matches this issue?
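To make the suspected sequence concrete, here is a toy model of a client-side offset store. It is not librdkafka code: `ToyConsumer`, `clear_on_assign` and `run_scenario` are hypothetical names, and the offsets are made up. It assumes that a StoreOffset call from a worker thread can land after the partitions have been revoked, and it models an assign-time cleanup as simply clearing the store on assignment.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConsumer:
    """Toy model of a local offset store; not how librdkafka is implemented."""
    clear_on_assign: bool
    assigned: set = field(default_factory=set)
    stored: dict = field(default_factory=dict)  # partition -> offset awaiting commit

    def assign(self, partitions):
        self.assigned = set(partitions)
        if self.clear_on_assign:
            # Hypothetical assign-time cleanup: drop offsets left from older assignments.
            self.stored.clear()

    def revoke(self):
        self.assigned = set()
        self.stored.clear()

    def store_offset(self, partition, offset):
        # No assignment check here, so a late call after revoke() still lands.
        self.stored[partition] = offset

    def auto_commit(self):
        # Commit every stored offset that belongs to a currently assigned partition.
        committed = {p: o for p, o in self.stored.items() if p in self.assigned}
        for p in committed:
            del self.stored[p]
        return committed


def run_scenario(clear_on_assign):
    c = ToyConsumer(clear_on_assign=clear_on_assign)
    c.assign([1, 2])            # first assignment
    c.store_offset(1, 100)
    c.store_offset(2, 200)
    c.auto_commit()
    c.revoke()
    c.store_offset(2, 201)      # worker thread finishes an event after the revoke
    c.assign([1, 2])            # 5 days later: short second assignment
    c.store_offset(1, 5000)     # only partition 1 delivers events this time
    return c.auto_commit()


print(run_scenario(clear_on_assign=False))
print(run_scenario(clear_on_assign=True))
```

In this model, the run without cleanup commits the leftover offset 201 for partition 2 alongside the valid offset for partition 1, while the run with cleanup commits partition 1 only. That matches the pattern we see in the offsets topic, although it does not prove this is the actual mechanism.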