Can anyone help me confirm whether this commit, "coord_req: re-query coordinator to avoid getting stuck", fixes this issue? Here is my understanding; please correct me if I am wrong. During the disconnection, we see this log repeatedly:
And based on the code:
And in function
And in the 1.8.2 librdkafka function
And because of this fix, it will trigger the monitor ops:
And inside function
And this is how this issue gets fixed.
Initially, we noticed a hang in confluent-kafka Python after a power outage of the Kafka brokers.
We reproduced this issue with the following steps:
Observations:
There are 12 threads running, and they can mainly be divided into two groups. The first group is waiting with RD_POLL_INFINITE, and I don't see any code that would signal rkq_cond to wake these threads up. I think this is why it looks like a hang:
The second group of threads is polling events:
And from our process log, during the disconnection:
And after flushing the iptables rules, the process appears to hang and produces no more logs.
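To make the suspected mechanism from the observations concrete, here is a minimal Python sketch of a thread blocked on a condition variable with no timeout. This is only an analogy and not librdkafka code: `TinyQueue` and `demo` are hypothetical names, and `timeout=None` merely stands in for an infinite wait such as RD_POLL_INFINITE. It shows that a waiter with no timeout stays blocked until some other code signals the condition, while a bounded wait returns on its own.

```python
import threading


class TinyQueue:
    """Hypothetical stand-in for a queue guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def push(self, item):
        # Enqueue an item and wake one waiting thread.
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def pop(self, timeout=None):
        # timeout=None blocks until an item arrives (the "infinite" case).
        # With a finite timeout, None is returned if nothing arrives in time.
        with self._cond:
            ready = self._cond.wait_for(lambda: bool(self._items), timeout=timeout)
            return self._items.pop(0) if ready else None


def demo():
    q = TinyQueue()
    result = []

    # This waiter uses no timeout, so it relies entirely on a later push().
    waiter = threading.Thread(target=lambda: result.append(q.pop()), daemon=True)
    waiter.start()
    waiter.join(timeout=0.2)
    stuck = waiter.is_alive()  # still blocked, because nothing signalled it

    # A bounded wait gives up by itself when the queue stays empty.
    bounded = q.pop(timeout=0.1)

    # Only an explicit signal releases the unbounded waiter.
    q.push("wake")
    waiter.join(timeout=1.0)
    return stuck, bounded, result


if __name__ == "__main__":
    print(demo())
```

If the real code path behaves the same way, then the waiting threads can only resume when something enqueues an op or signals the condition, which is what I could not find.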
GDB backtrace information:
Please help me understand this problem and point me to the commit that fixes it. Thanks.