send_multiget write call hangs if keys total bytesize > ~1,700,000 bytes #776

Open

justinschier opened this issue Jul 1, 2021 · 7 comments · May be fixed by #942

@justinschier

We are using dalli 2.7.11 with Sinatra 2.1.0 and Ruby 2.7.1, and an AWS ElastiCache cluster of three cache.m6g.2xlarge machines.

We just migrated from the old memcached gem and had no problems.

Everything works fine with smaller total key bytesizes.

But when the total key bytesize exceeds about 1,700,000 bytes, the write(req) call just hangs.

We instrumented the dalli gem code and spent two days troubleshooting to come to this conclusion.

To test, we set up a sample key space and then tried running multi_get with different numbers of keys. It didn't matter if we had mostly cache misses, or if there was a small or large amount of data stored in each key.

keys = []
(0...1_000_000).each do |i|
    keys << "xyz0123:#{i}"
end

dalli_client = Dalli::Client.new(
    <addresses>,
    timeout: 0.5,
    error_when_over_max_size: true
)
dalli_client.get_multi(keys[0...124_000]).size

Stack trace (if we hit Ctrl-C to stop processing):

Traceback (most recent call last):
       16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:76:in `block in get_multi'
       15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:423:in `get_multi_yielder'
       14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:364:in `perform'
       13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:425:in `block in get_multi_yielder'
       12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/ring.rb:52:in `lock'
       11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:431:in `block (2 levels) in get_multi_yielder'
       10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:305:in `make_multi_get_requests'
        9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:305:in `each'
        8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/client.rb:311:in `block in make_multi_get_requests'
        7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/options.rb:18:in `request'
        6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/options.rb:18:in `synchronize'
        5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/options.rb:19:in `block in request'
        4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/server.rb:70:in `request'
        3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/server.rb:288:in `send_multiget'
        2: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/server.rb:575:in `write'
        1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-2.7.11/lib/dalli/server.rb:575:in `write'

Can anyone help?

@esilverberg

The issue seems to be that it hangs on this line when you have a large number of keys:

result = @sock.write(bytes)

@petergoldstein
Owner

@justinschier @esilverberg Sorry for the late reply.

I'm getting a similar result, albeit with somewhat different numbers. When I run a test, I see repeated success with 60k keys, and then failure for almost all cases above that. At 61k the failure happens ~50% of the time.

When I adjust sndbuf and rcvbuf, I can get the number up substantially - I get success at 160k with a 2MB sndbuf and rcvbuf. Oddly, I'd expect only sndbuf to matter, but the improvement goes away if I don't bump rcvbuf.
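
For reference, the tweak looks roughly like this. This is only a sketch: addresses stands in for the server list from the original snippet, and Dalli's sndbuf/rcvbuf options are passed through to the socket as SO_SNDBUF/SO_RCVBUF.

# Sketch: bump both socket buffers to 2MB when building the client
dalli_client = Dalli::Client.new(
    addresses,
    sndbuf: 2 * 1024 * 1024,
    rcvbuf: 2 * 1024 * 1024
)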

I'm going to look into this a bit, and see if I can figure out what's going on.

@petergoldstein
Owner

So I think I have a handle on this. Basically, it looks like memcached is behaving in an unexpected way, which causes network issues because of how we're reading (or, more importantly, not reading) from our socket.

Specifically, the multi get implementation does the following:

  1. Issues a number of getkq requests
  2. Issues a noop request
  3. Reads response data from the socket

It works this way on the presumption that memcached writes no responses (because the getkq requests are 'quiet') until the noop request is sent. At that point the client expects memcached to write all of its responses; the client pulls the response data from the socket, and the buffers essentially clear out.
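
In rough pseudo-Ruby, the current flow amounts to something like this (the method names are illustrative, not Dalli's actual internals):

# Illustrative only; these helpers are not Dalli's real methods
keys_for_server.each do |key|
  write(getkq_request(key))   # quiet get: a miss produces no response
end
write(noop_request)           # expected to "uncork" the server
read_responses_until_noop     # only now do we start draining the socket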

This doesn't actually appear to be how it works once you get beyond a certain number of keys / response size. The binary protocol documentation says:

You're not guaranteed a response to a getq/getkq cache hit until you send a non-getq/getkq command later, 
which uncorks the server and bundles up IOs to send to the client in one go.

Note that this is a lot weaker than "You won't receive a response", and in fact it looks like the server bundles up IO once some memcached buffer size is exceeded, even if you're using getkq. This would explain the observed behavior.

A comment on another issue discusses how you can run into this with a non-quiet get, and I suspect that (because the buffer size is being exceeded) our quiet gets wind up behaving just like the non-quiet gets.

If this is the case, then there are basically two alternatives:

  1. Batch the keys before sending them to the Dalli client, say in sets of 25k
  2. Update the Dalli client to batch the keys internally by server, say in sets of 25k

The latter is going to wind up being more efficient in multi-memcached instance environments, but involves substantially more complexity. It may be worth doing - not sure yet.
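
For (1), a minimal caller-side sketch would be something like the following, with 25k as the illustrative batch size:

# Batch on the caller side so each get_multi stays under the problem size
results = {}
keys.each_slice(25_000) do |slice|
  results.merge!(dalli_client.get_multi(slice))
end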

Another downside of either approach is timeout handling. Essentially the multiget uses the socket timeout as its overall timeout (for large key sets this seems really non-ideal, but that's how the code works now). It would be difficult to preserve that timeout behavior with such batching.

@justinschier @esilverberg I don't know where you wound up internally with this issue, but I think this is likely the cause. I'm going to do some verification, and either give some guidance on (1) in the documentation or look at implementing (2).

@esilverberg

@petergoldstein Thank you so much for this detailed analysis and explanation! We thought we were going crazy for a while until we spent some time setting up this particular repro. Our solution was indeed to batch up our key requests, and that seemed to work.

@petergoldstein
Owner

@esilverberg Happy to help. Out of curiosity, what batch size are you using?

@esilverberg

We are batching every 500,000 keys in our application.

@petergoldstein
Owner

I have a thought on how to address this issue and make the get_multi call scale to any number of keys/results. I believe we could alter the flow to pull results off the socket as chunks of keys are being processed. For example, the flow might look like:

  1. Group the keys by server
  2. For each server:
    a. Break the keys into chunks of, say, 10k
    b. After each processed chunk of 10k, call read on the ResponseBuffer, to empty the socket buffer and pull results into Ruby memory. Do not make this call unless the number of keys is >= 10k
    c. Send a noop
    d. Process results currently in buffer
  3. Wait on the sockets for servers whose complete results have not been returned, and proceed as the code does currently

This would leave the flow unchanged for smaller numbers of keys, but allow us to handle this larger size case more gracefully.
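
Very roughly, and with placeholder method names (write_getkq, drain_response_buffer, etc. are not existing Dalli methods), the idea is:

CHUNK_SIZE = 10_000

keys_by_server.each do |server, server_keys|
  server_keys.each_slice(CHUNK_SIZE) do |chunk|
    chunk.each { |key| server.write_getkq(key) }
    # Drain whatever memcached has already flushed so the socket
    # buffers can't fill up mid-request; skip this for small key sets.
    server.drain_response_buffer if server_keys.size >= CHUNK_SIZE
  end
  server.write_noop
  server.process_buffered_results
end
# Then wait on the sockets for servers whose results are still
# incomplete, as the code does today.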

Any thoughts?

marvinthepa pushed commits to marvinthepa/dalli that referenced this issue (Dec 2022, Nov 2024)
fixes petergoldstein#776.
fixes petergoldstein#941.

When reading a large number of keys, memcached starts sending the
response before dalli has finished sending the request.

As we did not start reading the response until we were finished writing
the request, this could lead to the following problem:

* the receive buffer (rcvbuf) would fill up
* due to TCP backpressure, memcached would stop sending (okay as we are
  not reading anyway), but also stop reading
* the send buffer (sndbuf) would also fill up
* as we were using a blocking write without timeout, we would block
  forever (at least with ruby < 3.2, which introduces IO::Timeout, see
  petergoldstein#967)

This is addressed by using IO::select on the sockets for both read and
write, so that reading starts as soon as data is available.
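
A minimal sketch of that interleaving, not the actual patch (sock, request_bytes and timeout stand in for the real connection state):

response_buffer = +''
pending = request_bytes
until pending.empty?
  readable, writable = IO.select([sock], [sock], nil, timeout)
  raise IOError, 'multiget stalled: socket timeout' if readable.nil? && writable.nil?

  if readable&.any?
    chunk = sock.read_nonblock(65_536, exception: false)
    response_buffer << chunk if chunk.is_a?(String)  # drain early responses
  end
  if writable&.any?
    written = sock.write_nonblock(pending)
    pending = pending.byteslice(written, pending.bytesize - written)
  end
end
# Once the request is fully written, keep reading until the terminating
# noop response arrives, as before.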