send_multiget write call hangs if keys total bytesize > ~1,700,000 bytes #776
The issue seems to be that it hangs on this line when you have a large number of keys: line 563 at commit 31dabf1 (the write(req) call in send_multiget).
@justinschier @esilverberg Sorry for the late reply. I'm getting a similar result, albeit with somewhat different numbers. When I run a test, I see repeated success with 60k keys, and then failure for almost all cases above that. At 61k the failure happens ~50% of the time. When I adjust sndbuf and rcvbuf, I can get the number up substantially - I get success at 160k with a 2MB sndbuf and rcvbuf. Oddly, I'd expect only sndbuf to matter, but the improvement goes away if I don't bump rcvbuf. I'm going to look into this a bit, and see if I can figure out what's going on.
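For anyone who wants to try the same tuning, a sketch (this assumes the dalli version in use exposes the sndbuf/rcvbuf socket options; the endpoint and the 2MB values are illustrative):

```ruby
require "dalli"

# Buffer tuning sketch: bump SO_SNDBUF / SO_RCVBUF to roughly 2 MB.
# Endpoint and sizes are illustrative, and the sndbuf/rcvbuf options are
# assumed to be supported by the dalli version in use.
client = Dalli::Client.new(
  "localhost:11211",
  sndbuf: 2 * 1024 * 1024,
  rcvbuf: 2 * 1024 * 1024
)
```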
So I think I have a handle on this. Basically it looks like memcached is behaving in an unexpected way, which is causing network issues because of how we're reading (or, more importantly, not reading) from our socket. Specifically, the multi get implementation does roughly the following:
1. Write a getkq (quiet get) request for every key, without reading anything from the socket
2. Write a noop request
3. Read responses from the socket until the noop response arrives
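A minimal sketch of that flow in Ruby (getkq_request, noop_request, and read_response are hypothetical helpers standing in for the real binary-protocol encoding/decoding, not dalli's actual internals):

```ruby
# Sketch of the quiet multi-get flow described above. Helper names are
# hypothetical; they are not dalli's actual methods.
def multi_get(sock, keys)
  keys.each { |key| sock.write(getkq_request(key)) }  # quiet gets: misses produce no reply
  sock.write(noop_request)                            # the noop is expected to flush the replies

  responses = []
  loop do
    resp = read_response(sock)   # blocking read, bounded by the socket timeout
    break if resp.noop?          # the noop reply marks the end of the batch
    responses << resp
  end
  responses
end
```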
It works this way because the presumption is that no response is written by memcached (because the getkq's are 'quiet'), until the noop request is written. At that point the client expects that memcached will write all its responses, the client will be pulling response data from the socket, and essentially the buffers will clear out. This doesn't actually appear to be how it works once you get beyond a certain # of keys / response size. The binary protocol documentation says:
Note that this is a lot weaker than "You won't receive a response", and in fact it looks like the server bundles up IO once some memcached buffer size is exceeded, even if you're using getkq. This would explain the observed behavior. A comment on another issue discusses how you can run into this with a non-quiet get, and I suspect that (because the buffer size is being exceeded) our quiet gets wind up behaving just like the non-quiet gets. If this is the case, then there are basically two alternatives:
1. Have applications break large multi gets into smaller batches of keys themselves (and document that guidance)
2. Have dalli batch the requests internally, on a per-server basis
The latter is going to wind up being more efficient in multi-memcached instance environments, but involves substantially more complexity. It may be worth doing - not sure yet. Another downside of either approach is timeout handling. Essentially the multiget uses the socket timeout as its overall timeout (for large key sets this seems really non-ideal, but that's how the code works now). It would be difficult to preserve that timeout behavior with such batching. @justinschier @esilverberg I don't know where you wound up internally with this issue, but I think this is likely the cause. I'm going to do some verification, and either give some guidance on (1) in the documentation or look at implementing (2).
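As a reference point for option (1), a minimal sketch of application-side batching (the batch size and endpoint are illustrative, not recommendations):

```ruby
require "dalli"

# Option (1) sketch: split one huge multi get into several smaller get_multi
# calls on the application side. BATCH_SIZE is illustrative; keep the per-call
# key bytesize well below the point where the hang shows up.
BATCH_SIZE = 10_000

def batched_get_multi(client, keys)
  keys.each_slice(BATCH_SIZE).reduce({}) do |results, batch|
    results.merge(client.get_multi(batch))
  end
end

# Usage (endpoint and key list are illustrative):
#   client = Dalli::Client.new("localhost:11211")
#   values = batched_get_multi(client, all_keys)
```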
@petergoldstein Thank you so much for this detailed analysis and explanation! We thought we were going crazy for a while until we spent some time setting up this particular repro. Our solution was indeed to batch up our key requests, and that seemed to work.
@esilverberg Happy to help. Out of curiosity, what batch size are you using?
We are batching every 500,000 keys in our application. |
I have a thought on how to address this issue and make the
This would leave the flow unchanged for smaller numbers of keys, but allow us to handle this larger size case more gracefully. Any thoughts?
fixes petergoldstein#776. fixes petergoldstein#941.

When reading a large number of keys, memcached starts sending the response when dalli is not yet finished sending the request. As we did not start reading the response until we were finished writing the request, this could lead to the following problem:

* the receive buffer (rcvbuf) would fill up
* due to TCP backpressure, memcached would stop sending (okay as we are not reading anyway), but also stop reading
* the send buffer (sndbuf) would also fill up
* as we were using a blocking write without timeout, we would block forever (at least with ruby < 3.2, which introduces IO::Timeout, see petergoldstein#967)

This is addressed by using IO::select on the sockets for both read and write, and thus starting to read as soon as data is available.
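A simplified sketch of that idea (not the actual PR code; names and buffer sizes are illustrative):

```ruby
require "socket"

# Interleave reads and writes with IO.select so memcached's responses keep
# getting drained while the large request is still being written.
def write_while_draining(sock, request)
  pending  = request.dup
  received = +""

  until pending.empty?
    readable, writable = IO.select([sock], [sock])

    if readable && readable.include?(sock)
      chunk = sock.read_nonblock(65_536, exception: false)
      received << chunk if chunk.is_a?(String)  # drain whatever has arrived
    end

    if writable && writable.include?(sock)
      written = sock.write_nonblock(pending, exception: false)
      pending = pending.byteslice(written, pending.bytesize - written) if written.is_a?(Integer)
    end
  end

  received  # any remaining responses are read by the normal response loop afterwards
end
```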
We are using dalli 2.7.11 with Sinatra 2.1.0 and Ruby 2.7.1, and an AWS ElastiCache cluster of 3 cache.m6g.2xlarge machines. We just migrated from the old memcached gem and had no problems.
Everything is working fine with smaller total key bytesizes. But when the total keys bytesize exceeds about 1,700,000 bytes, the write(req) call just hangs. We instrumented the dalli gem code and spent two days troubleshooting to come to this conclusion.
To test, we set up a sample key space and then tried running multi_get with different numbers of keys. It didn't matter if we had mostly cache misses, or if there was a small or large amount of data stored in each key.

Stack Trace (if we hit Ctrl-C to stop processing):
Can anyone help?
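For reference, a rough sketch of the kind of repro described above, using dalli's get_multi (the endpoint and key shape are illustrative, not the original test code):

```ruby
require "dalli"

# Repro sketch: build enough keys that their total bytesize is well over the
# ~1,700,000 byte mark, then issue one big multi get. Endpoint and key shape
# are illustrative.
client = Dalli::Client.new("localhost:11211")

keys = (1..60_000).map { |i| format("sample:key:%06d:%s", i, "x" * 20) }
puts "total key bytesize: #{keys.sum(&:bytesize)}"  # ~2.3 MB with these keys

values = client.get_multi(keys)  # hangs in send_multiget's write(req) at this size
puts "hits: #{values.size}"
```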