ERROR WorkerSinkTask Commit of offsets threw an unexpected exception for sequence number... #1026
Hi @jmks, thanks for using our connector. For the error you provided, is the list of channel offsets the only visible data, or is there something else that might have been trimmed out?
Thanks for looking into this.
This happened much later and it sounds like a rebalance did not go well. I'm not familiar with this error, as we only recently started using sticky assignment (to speed up rebalances with many topic-partitions).
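(For context: a minimal sketch of how sticky assignment is typically enabled for sink connectors through the Kafka Connect worker config, using the `consumer.` pass-through prefix. The strategy class shown is the stock `CooperativeStickyAssignor`; this is illustrative, not this deployment's actual config.)

```properties
# Kafka Connect worker config (e.g. connect-distributed.properties).
# Properties prefixed with "consumer." are passed through to the
# consumers that back each sink task.
consumer.partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```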
There were a couple of extra lines filtered out, and I found some WARN messages that led to this exception.
So it seems this worker got some unexpected topic-partitions, as they show up in its metadata. When it rebalances, those topics are removed from its assignment:
Here are the logs mentioning this worker: https://gist.github.com/jmks/c228bd6df68f3383c8d32b5d59a13048

This is what the logs are telling me:
Over the next few hours we had this happen a few times. It even happened to multiple tasks at the same time, and then the logs get muddied with rebalances, slow startups, etc. I'm not sure what caused this worker to fetch offsets for a topic-partition it was not assigned; I could not find a log for

The last thing to note is that we get a lot of these (even today):
My understanding is that it fails to commit all the offsets it has, but still makes progress. The large number of topic-partitions probably causes commits to be slow; we could increase the timeout to give it more time.
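(For context: the commit timeout used by `WorkerSinkTask` comes from the Kafka Connect worker config. A minimal sketch; the values below are illustrative, not recommendations.)

```properties
# Kafka Connect worker config. offset.flush.timeout.ms bounds how long
# an offset commit may take before it is cancelled and retried later
# (default 5000 ms). offset.flush.interval.ms controls how often
# commits are attempted (default 60000 ms).
offset.flush.timeout.ms=30000
offset.flush.interval.ms=60000
```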
Hi @jmks, thanks for the additional details.
Is it possible that some new topic-partitions were created at that time? From what I know, topic creation is not guaranteed to be an atomic operation, especially in multi-node environments, and most Kafka libraries perform such operations asynchronously. Since you couldn't find logs for this topic-partition before, is it possible that it was a new one, and the connector hit a case where topic/partition creation was still being propagated throughout the environment?
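(To illustrate the propagation point: a hypothetical Java sketch using the Kafka `AdminClient`. `createTopics` completing means the controller accepted the request, not that every broker already serves metadata for the new topic. The topic name, partition counts, and retry policy below are invented for the example.)

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class TopicCreationCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // This future completes when the controller accepts the request;
            // metadata may still be propagating to the other brokers.
            admin.createTopics(Collections.singleton(
                    new NewTopic("example-topic", 12, (short) 3))).all().get();

            // Poll until every partition reports a leader, instead of
            // assuming the topic is immediately usable cluster-wide.
            for (int attempt = 0; attempt < 20; attempt++) {
                boolean ready;
                try {
                    TopicDescription desc = admin
                            .describeTopics(Collections.singleton("example-topic"))
                            .all().get().get("example-topic");
                    ready = desc.partitions().stream()
                            .allMatch(p -> p.leader() != null);
                } catch (ExecutionException e) {
                    // e.g. UnknownTopicOrPartitionException while metadata
                    // is still being propagated.
                    ready = false;
                }
                if (ready) {
                    System.out.println("Topic metadata fully propagated");
                    return;
                }
                Thread.sleep(500);
            }
            System.out.println("Topic metadata still incomplete after retries");
        }
    }
}
```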
Increasing the timeout sounds reasonable, especially if you have a lot of topics/partitions; however, it is difficult to say more without knowing the specifics of your setup and configuration. If the issue recurs more often, I would recommend reaching out to Snowflake support directly.
We're running `SnowflakeSinkConnector` version 2.5.0 using Snowflake streaming and had this error happen:

We only had it happen on that day (Nov 22) and it's since been fine.
I've found a couple of issues with this error message, but they did not have a resolution.
Has anyone seen this, or does anyone know a way to help prevent it?