send_multiget write call hangs if keys total bytesize > ~1,700,000 bytes #776
The issue seems to be that it hangs on this line when you have a large number of keys: line 563 at commit 31dabf1 (the write(req) call in send_multiget).
@justinschier @esilverberg Sorry for the late reply. I'm getting a similar result, albeit with somewhat different numbers. When I run a test, I see repeated success with 60k keys, and then failure for almost all cases above that. At 61k the failure happens ~50% of the time. When I adjust sndbuf and rcvbuf, I can get the number up substantially - I get success at 160k with a 2MB sndbuf and rcvbuf. Oddly, I'd expect only sndbuf to matter, but the improvement goes away if I don't bump rcvbuf. I'm going to look into this a bit, and see if I can figure out what's going on.
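For anyone who wants to try the same tuning, a sketch (this assumes the dalli version in use exposes the sndbuf/rcvbuf socket options; the endpoint and the 2MB values are illustrative):

```ruby
require "dalli"

# Buffer tuning sketch: bump SO_SNDBUF / SO_RCVBUF to roughly 2 MB.
# Endpoint and sizes are illustrative, and the sndbuf/rcvbuf options are
# assumed to be supported by the dalli version in use.
client = Dalli::Client.new(
  "localhost:11211",
  sndbuf: 2 * 1024 * 1024,
  rcvbuf: 2 * 1024 * 1024
)
```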
So I think I have a handle on this. Basically it looks like memcached is behaving in an unexpected way, which is causing network issues because of how we're reading (or, more importantly, not reading) from our socket. Specifically, the multi get implementation does roughly the following:
1. Write a getkq (quiet get) request for every key, without reading anything from the socket
2. Write a noop request
3. Read responses from the socket until the noop response arrives
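A minimal sketch of that flow in Ruby (getkq_request, noop_request, and read_response are hypothetical helpers standing in for the real binary-protocol encoding/decoding, not dalli's actual internals):

```ruby
# Sketch of the quiet multi-get flow described above. Helper names are
# hypothetical; they are not dalli's actual methods.
def multi_get(sock, keys)
  keys.each { |key| sock.write(getkq_request(key)) }  # quiet gets: misses produce no reply
  sock.write(noop_request)                            # the noop is expected to flush the replies

  responses = []
  loop do
    resp = read_response(sock)   # blocking read, bounded by the socket timeout
    break if resp.noop?          # the noop reply marks the end of the batch
    responses << resp
  end
  responses
end
```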
It works this way because the presumption is that no response is written by memcached (because the getkq's are 'quiet'), until the noop request is written. At that point the client expects that memcached will write all its responses, the client will be pulling response data from the socket, and essentially the buffers will clear out. This doesn't actually appear to be how it works once you get beyond a certain # of keys / response size. The binary protocol documentation says:
Note that this is a lot weaker than "You won't receive a response", and in fact it looks like the server bundles up IO once some memcached buffer size is exceeded, even if you're using getkq. This would explain the observed behavior. A comment on another issue discusses how you can run into this with a non-quiet get, and I suspect that (because the buffer size is being exceeded) our quiet gets wind up behaving just like the non-quiet gets. If this is the case, then there are basically two alternatives:
1. Have applications break large multi gets into smaller batches of keys themselves (and document that guidance)
2. Have dalli batch the requests internally, on a per-server basis
The latter is going to wind up being more efficient in multi-memcached instance environments, but involves substantially more complexity. It may be worth doing - not sure yet. Another downside of either approach is timeout handling. Essentially the multiget uses the socket timeout as its overall timeout (for large key sets this seems really non-ideal, but that's how the code works now). It would be difficult to preserve that timeout behavior with such batching. @justinschier @esilverberg I don't know where you wound up internally with this issue, but I think this is likely the cause. I'm going to do some verification, and either give some guidance on (1) in the documentation or look at implementing (2).
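As a reference point for option (1), a minimal sketch of application-side batching (the batch size and endpoint are illustrative, not recommendations):

```ruby
require "dalli"

# Option (1) sketch: split one huge multi get into several smaller get_multi
# calls on the application side. BATCH_SIZE is illustrative; keep the per-call
# key bytesize well below the point where the hang shows up.
BATCH_SIZE = 10_000

def batched_get_multi(client, keys)
  keys.each_slice(BATCH_SIZE).reduce({}) do |results, batch|
    results.merge(client.get_multi(batch))
  end
end

# Usage (endpoint and key list are illustrative):
#   client = Dalli::Client.new("localhost:11211")
#   values = batched_get_multi(client, all_keys)
```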
@petergoldstein Thank you so much for this detailed analysis and explanation! We thought we were going crazy for a while until we spent some time setting up this particular repro. Our solution was indeed to batch up our key requests, and that seemed to work.
@esilverberg Happy to help. Out of curiosity, what batch size are you using?
We are batching every 500,000 keys in our application. |
I have a thought on how to address this issue and make the
This would leave the flow unchanged for smaller numbers of keys, but allow us to handle this larger size case more gracefully. Any thoughts?
fixes petergoldstein#776. fixes petergoldstein#941.

When reading a large number of keys, memcached starts sending the response when dalli is not yet finished sending the request. As we did not start reading the response until we were finished writing the request, this could lead to the following problem:

* the receive buffer (rcvbuf) would fill up
* due to TCP backpressure, memcached would stop sending (okay as we are not reading anyway), but also stop reading
* the send buffer (sndbuf) would also fill up
* as we were using a blocking write without timeout, we would block forever (at least with ruby < 3.2, which introduces IO::Timeout, see petergoldstein#967)

This is addressed by using IO::select on the sockets for both read and write, and thus starting to read as soon as data is available.
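A simplified sketch of that idea (not the actual PR code; names and buffer sizes are illustrative):

```ruby
require "socket"

# Interleave reads and writes with IO.select so memcached's responses keep
# getting drained while the large request is still being written.
def write_while_draining(sock, request)
  pending  = request.dup
  received = +""

  until pending.empty?
    readable, writable = IO.select([sock], [sock])

    if readable && readable.include?(sock)
      chunk = sock.read_nonblock(65_536, exception: false)
      received << chunk if chunk.is_a?(String)  # drain whatever has arrived
    end

    if writable && writable.include?(sock)
      written = sock.write_nonblock(pending, exception: false)
      pending = pending.byteslice(written, pending.bytesize - written) if written.is_a?(Integer)
    end
  end

  received  # any remaining responses are read by the normal response loop afterwards
end
```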
We are using dalli 2.7.11 with Sinatra 2.1.0 and Ruby 2.7.1, and an AWS ElastiCache cluster of 3 cache.m6g.2xlarge machines. We just migrated from the old memcached gem and had no problems.
Everything is working fine with smaller total key bytesizes. But when the total keys bytesize exceeds about 1,700,000 bytes, the write(req) call just hangs. We instrumented the dalli gem code and spent two days troubleshooting to come to this conclusion.
To test, we set up a sample key space and then tried running multi_get with different numbers of keys. It didn't matter if we had mostly cache misses, or if there was a small or large amount of data stored in each key.

Stack Trace (if we hit Ctrl-C to stop processing):
Can anyone help?
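For reference, a rough sketch of the kind of repro described above, using dalli's get_multi (the endpoint and key shape are illustrative, not the original test code):

```ruby
require "dalli"

# Repro sketch: build enough keys that their total bytesize is well over the
# ~1,700,000 byte mark, then issue one big multi get. Endpoint and key shape
# are illustrative.
client = Dalli::Client.new("localhost:11211")

keys = (1..60_000).map { |i| format("sample:key:%06d:%s", i, "x" * 20) }
puts "total key bytesize: #{keys.sum(&:bytesize)}"  # ~2.3 MB with these keys

values = client.get_multi(keys)  # hangs in send_multiget's write(req) at this size
puts "hits: #{values.size}"
```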