Fastest possible timeout for rd_kafka_flush() to cleanly shut down a Producer #4021
Unanswered
Quuxplusone asked this question in Q&A
Replies: 1 comment
I ended up doing this:
Now I'm moving on to figure out the same sequence for a consumer...
Under "Proper termination sequence for Producers", the documentation says:
My question is, where does that 60*1000 one-minute timeout come from? What goes wrong if I lower that timeout to "one second" instead? Is there a concrete cutoff where the behavior suddenly goes wrong?
Worse, is the behavior dependent on any config options? For example, suppose I have set my message.timeout.ms to 700ms; can I get away with a lower rd_kafka_flush timeout then? Or suppose I have set my message.timeout.ms to 300*1000 ms; must I increase the rd_kafka_flush timeout correspondingly?
What I would really like is a 100% foolproof way to shut down the producer, instantly timing out any undelivered messages. There is a recommended approach in the documentation here:
but I would prefer a synchronous way, such as just saying rd_kafka_flush(rk, 0). My understanding from the current documentation is that if I just say
then I will definitely see hangs and/or undefined behavior. Is my understanding accurate?
Update 1
I guess I also don't understand the use of rd_kafka_flush there. According to the header file, rd_kafka_flush(rk, timeout) might return either NO_ERROR or TIMED_OUT, and if it returns TIMED_OUT then it seems like there would still be things in outq that haven't been flushed yet, right? So I don't understand (A) how the example code gets away with not checking the return value of rd_kafka_flush; (B) how rd_kafka_flush is different from rd_kafka_poll; nor (C) whether it actually suffices to call rd_kafka_flush as shown in the example, or if I actually need to call it in a loop until rd_kafka_outq_len reaches zero (which I don't want to do because that might take up to message.timeout.ms milliseconds, which is 5 minutes).
Update 2
Looking at the source code, I see this in rd_kafka_flush():
This strongly indicates to me that (1) no matter what N you put into rd_kafka_flush(rk, N), unless all your brokers happen to wake up within N ms, you will still leak messages such that it is still unsafe to call rd_kafka_destroy(rk) (i.e., technically speaking, rd_kafka_flush is non-blocking and thus "unsafe at any speed"?); and (2) a correct shutdown procedure always involves calling rd_kafka_flush in a loop like this:
Does that sound right? I mean, I hope I'm wrong, because I really do want to find a 100% foolproof way of shutting down a Producer within a bounded amount of time (preferably a small amount of time, like less than 1000ms), and it's looking more and more like the answer is "you can't, it always takes at least message.timeout.ms and potentially unboundedly longer."
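To make the "bounded amount of time" requirement concrete, here is a minimal sketch of a flush loop that stops at a time budget instead of waiting for the queue to reach zero. It is not librdkafka code: drain_ops, bounded_drain, and the fake_* helpers are hypothetical names, and the two callbacks stand in for rd_kafka_flush(rk, timeout_ms) and rd_kafka_outq_len(rk) so the logic can be exercised without a broker. It also assumes a slice that times out has consumed its whole timeout, where real code would more likely read a monotonic clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two calls discussed above. In real code,
 * flush would wrap rd_kafka_flush(rk, timeout_ms) and report whether it
 * returned NO_ERROR, and outq_len would wrap rd_kafka_outq_len(rk). */
typedef struct drain_ops {
    bool (*flush)(void *ctx, int timeout_ms); /* true = drained, false = timed out */
    int (*outq_len)(void *ctx);               /* items still outstanding */
} drain_ops;

/* Flush in slices of at most slice_ms until the queue is empty or budget_ms
 * has been used up. Returns the number of items still outstanding, so 0
 * means a clean drain. Assumption: a timed-out slice counts as having used
 * its full timeout; elapsed time is not measured. */
static int bounded_drain(const drain_ops *ops, void *ctx,
                         int budget_ms, int slice_ms)
{
    assert(ops != NULL && slice_ms > 0 && budget_ms >= 0);
    int spent_ms = 0;
    while (ops->outq_len(ctx) > 0 && spent_ms < budget_ms) {
        int wait_ms = budget_ms - spent_ms;
        if (wait_ms > slice_ms)
            wait_ms = slice_ms;
        if (ops->flush(ctx, wait_ms))
            break;               /* flush reported success: nothing left */
        spent_ms += wait_ms;     /* timed out: charge the slice to the budget */
    }
    return ops->outq_len(ctx);
}

/* Fake queue for illustration only: each flush call delivers a fixed
 * number of messages, regardless of the timeout it is given. */
typedef struct fake_queue {
    int pending;
    int drained_per_call;
    int calls;
} fake_queue;

static bool fake_flush(void *ctx, int timeout_ms)
{
    fake_queue *q = ctx;
    (void)timeout_ms;
    q->calls++;
    q->pending -= q->drained_per_call;
    if (q->pending < 0)
        q->pending = 0;
    return q->pending == 0;
}

static int fake_outq_len(void *ctx)
{
    fake_queue *q = ctx;
    return q->pending;
}
```

With a fake queue of 10 messages that drains one message per call, bounded_drain(&ops, &q, 300, 100) gives up after three slices and returns 7. A nonzero return is the case this question is about: the loop bounds the wait, but it does not by itself say whether calling rd_kafka_destroy(rk) is safe at that point.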