
expand request-limiting guide to allow programmatic behavior response to rate limiting #19

Open
jamesmanning opened this issue Jun 5, 2014 · 6 comments


@jamesmanning
Contributor

The guide doesn't preclude this, but IMHO it's worth including explicitly. RateLimit-Remaining is described as returning the remaining number of request tokens, but the guide doesn't say how to communicate how much time is left in the current 'window', which I think is worth exposing as a number of seconds.

  • For the case of not having exhausted a rate limit window yet, a client may want to be nice and spread its requests evenly across the window. With 30 requests left and 60 seconds left in the window, for instance, the client could make one request every 2 seconds. Encoded into SDKs or examples, this could lead to more even load on services over time and fewer bursts.
  • For the case of having exhausted the window, the remaining request count of 0 is useful, but telling the client how long until the window ends gives it something actionable instead of leaving it to fall back on something like binary exponential backoff.
  • Expressing 'time left' in seconds instead of as a timestamp avoids clock-synchronization issues between client and server.
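
To make the first bullet concrete, here's a minimal sketch (Python; the function name and parameters are mine, not from any guide) of how a client could turn "remaining requests" plus "seconds left" into an even pacing delay:

```python
def pacing_delay(remaining: int, seconds_left: float) -> float:
    """Spread the remaining requests evenly across the rest of the window.

    e.g. 30 requests left with 60 seconds remaining -> one request
    every 2 seconds, matching the example above.
    """
    if remaining <= 0:
        # Nothing left: wait out the rest of the window.
        return float(seconds_left)
    return seconds_left / remaining
```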

I think Twitter's approach here is good. Even if codifying it into the guide is considered a bad idea, it might be worth linking as a potential approach:

https://dev.twitter.com/docs/rate-limiting/1.1

  • X-Rate-Limit-Limit: the rate limit ceiling for that given request
  • X-Rate-Limit-Remaining: the number of requests left for the 15 minute window
  • X-Rate-Limit-Reset: the remaining window before the rate limit resets in UTC epoch seconds
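
Since X-Rate-Limit-Reset is an absolute UTC epoch timestamp, the client has to subtract its own clock to get a wait time, which is exactly where the time-synchronization concern above comes in. A sketch (header parsing assumed; function name is mine):

```python
import time

def seconds_until_reset(headers, now=time.time):
    """Compute how long to wait from Twitter-style headers.

    X-Rate-Limit-Reset is an absolute UTC epoch timestamp, so any skew
    between the client clock and the server clock shifts the result.
    """
    reset_epoch = int(headers['X-Rate-Limit-Reset'])
    return max(0.0, reset_epoch - now())
```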
@geemus
Member

geemus commented Jun 5, 2014

You make great points, but at least in the case of the Heroku API there are no windows per se. We could certainly do a better job of describing this, you can see a somewhat lacking description here: https://devcenter.heroku.com/articles/platform-api-reference#rate-limits (I'll be updating it to more clearly match with the description that follows).

Basically, if your bucket of possible requests is not full (2400 requests by default), we add tokens to your bucket at a steady rate (1200 requests/hour, distributed evenly at approximately 1200/60 = 20 additional requests per minute). Since tokens accrue toward your limit at a constant rate, this eliminates the need for a window/reset value and also (we hope) better allows for bursty behavior if/when you need it.

You can also read more about the premise behind this methodology here: http://en.wikipedia.org/wiki/Token_bucket
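
For readers unfamiliar with the scheme, here's a minimal token-bucket sketch using the numbers quoted above (a 2400-token bucket refilled at 1200 tokens/hour); this is an illustration of the general technique, not Heroku's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: requests drain tokens, and tokens
    drip back in continuously at a fixed rate up to a fixed capacity."""

    def __init__(self, capacity=2400, refill_per_hour=1200, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_hour / 3600.0
        self.now = now                  # injectable clock, for testing
        self.tokens = float(capacity)   # start with a full bucket
        self.last = now()

    def _refill(self):
        # Credit tokens for the time elapsed since the last check.
        elapsed = self.now() - self.last
        self.last = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)

    def allow(self):
        """Consume one token if available; True means the request may proceed."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because refill is continuous, there is no window boundary to report: the bucket simply gains fractional tokens every instant.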

I'd love to discuss further and expand the docs and certainly welcome any suggestions about how we can clarify our suggestions around this. Thanks!

@geemus
Member

geemus commented Jun 5, 2014

P.S. I've updated https://devcenter.heroku.com/articles/platform-api-reference#rate-limits so that at least it is more accurate. I suspect we'll need to do some work in the guide too, but wanted to make sure we were on the same page before I go too far down that road. Thanks!

@jamesmanning
Contributor Author

Having a sliding window instead of fixed windows is fine, of course. I didn't mean to imply that fixed windows were required. 😄

My goal is more that client code can programmatically know how long it needs to wait if/when it hits zero, ideally by way of data that's in the response instead of having to put in config values (like the 20 requests per minute).

For the Heroku case, if there's a client that needs to do a burst of 3000 requests, for instance, it'll use up its 2400 and then hit 0, but the response that says "0 left" won't include any information on when the client can try again. If the client is coded 'badly', it may just include a zero-wait retry loop, such that request 2401 ends up hammering until the next 20 tokens are added, then 2421, then 2441, etc.

The part I think is important is that when a client exhausts its limit, the server gives it a specific amount of time to wait. If a client has emptied its bucket, it may end up using those 20 requests in the first couple of seconds of the next minute, and the '0 requests left' response should (IMHO) include something that says "you need to wait at least 58 seconds before trying again". Any requests in those 58 seconds are going to fail (since the bucket is empty), but without the server communicating how long it will be until a request can succeed again, the client either has to hard-code something like 'wait 60 seconds if we hit 0', which depends on the server's particular windowing choice (and that choice may change over time, ideally without clients needing to change), or it's left just retrying until it finally gets a successful response.

While clients can certainly be badly written and ignore the 'time you need to wait until you can make another request', if the server has (or can easily calculate) that amount of time, it takes all the guesswork out of how the client deals with rate limiting, and results in fewer calls to the server since clients could be coded to know exactly how long to back off.

I'm sure there are other approaches as well - my main goal is really that well-behaved clients receive enough information to keep from making requests that are going to be rejected due to the bucket being empty, without having to hard-code anything about the server's windowing behavior. Keeping the two decoupled gives the server the chance to change windowing behavior (for instance, if the client upgrades from a free to a paid subscription, maybe it gets 20 requests every 10 seconds instead of every minute) without having to recode clients.

Sorry for the ramble, and yeah, I certainly didn't mean to imply that sliding window vs. fixed window was a needed debate. :)

@geemus
Member

geemus commented Jun 6, 2014

Thanks for the detailed response, I also wasn't trying to pick a fight about fixed vs sliding, just wanted to clarify what we were doing so far (which the guide was at least initially based on) and try to ensure we were both talking about the same things (seems like maybe we weren't, but that now we are at least getting much closer).

Your argument makes a lot of sense. Presently, behind the scenes, we certainly have this info, and in some cases it already varies from client to client based on other factors, so making it explicit would definitely be a win. That just leaves the question of how best to do it.

The existing pattern we have from other places, as you mention, seems to fit the fixed window:

  • X-Rate-Limit-Limit: the rate limit ceiling for that given request
  • X-Rate-Limit-Remaining: the number of requests left for the 15 minute window
  • X-Rate-Limit-Reset: the remaining window before the rate limit resets in UTC epoch seconds

For sliding, seems like we want something a bit different, roughly something like:

  • RateLimit-Remaining: current tokens in your bucket
  • RateLimit-Maximum: total tokens you could potentially accumulate
  • RateLimit-Interval: time, in minutes, between drips of RateLimit-Rate tokens into the bucket
  • RateLimit-Rate: quantity of tokens added to bucket per RateLimit-Interval
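
One thing a client could derive from these four headers is its long-term sustainable request rate. A sketch (header values assumed to arrive as decimal strings; the function name is mine):

```python
def sustainable_rate(headers):
    """Requests per second a client can make indefinitely without ever
    emptying its bucket, derived from the headers proposed above."""
    rate = int(headers['RateLimit-Rate'])                  # tokens per drip
    interval_minutes = int(headers['RateLimit-Interval'])  # minutes per drip
    return rate / (interval_minutes * 60.0)
```

With the Heroku-like values (20 tokens every 1 minute) this works out to one request every 3 seconds on average, with RateLimit-Maximum governing how far the client can burst beyond that.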

Presumably you then know everything you need to know to code a client that can react if rate limit is hit (as well as adjust if/when those limits change). What do you think? Is that sufficient? Are they good names? Thanks!

@jamesmanning
Contributor Author

I fear I've overcomplicated this a bit. :)

Those headers make sense in terms of trying to communicate the overall windowing mechanism used by the server, but AFAICT that's overkill for the particular use case I'm worried about? At least, in terms of writing the client code, I don't think I would normally use the values in Maximum/Interval/Rate, even though they could be useful if I were trying to plan out requests to make over the next hour/day/etc.

In particular, if you exhaust your tokens with 58 seconds left in the window, none of those 4 response headers tell you "you need to wait at least 58 seconds". RateLimit-Interval tells you how often the bucket will get more tokens, but the client doesn't know how much time there is between 'now' and when the next interval hits. Even though more tokens might be added 5 seconds from 'now', the client only knows that the interval is 60 seconds, which it would either ignore (bad) or use as a wait time (making it wait longer than it actually needs to).

Since the use case I worry about is "tell client how long to wait if they exhaust their limit", I'd rather the headers be simplified to something that applies equally to either windowing behavior.

Looking at the Twitter headers, we can drop X-Rate-Limit-Limit AFAICT; it's not needed for this use case. Their X-Rate-Limit-Reset header is the one that really has value, since it's the "time to wait if Remaining is 0".

Using the header naming convention you already have in place, that leaves:

  • RateLimit-Remaining: current tokens/requests/thingies left in your bucket/window/container
  • RateLimit-Reset: how long until your "Remaining" count will go up (tokens added, window reset, planets aligned, whatever :)

For the particular use case of "back off the right amount of time when the limit is exhausted", the 'Reset' header is only needed when 'Remaining' is 0. I don't know how acceptable a 'conditional' header would be, but we could save some server load and bandwidth by only calculating/sending the 'Reset' header in the (uncommon, I would imagine, especially for the target use case) case of Remaining == 0.

The client pseudo-code for a dumb cmdline single-threaded tool is something like:

for request in requests:
    response = process_request(request)
    # header values arrive as strings, so parse before comparing
    if int(response.headers['RateLimit-Remaining']) == 0:
        # exhausted our limit; wait until the bucket has tokens again
        time_to_sleep_in_seconds = int(response.headers['RateLimit-Reset'])
        sleep(time_to_sleep_in_seconds)

This lets the client stay ignorant of the particular windowing behavior while still achieving the 'optimal' request timing: it completes the full set of requests in the minimum time, skipping only the requests that would have failed because the rate limit was exhausted. For the Heroku example, if a client has 3000 requests to make on a given day and starts with a full bucket, it'll make the first 2400 'immediately', then the next 600 in batches of 20 in each of the next 30 minutes. The client doesn't know (nor care) that the server is using a 'token bucket' approach, but its request behavior ends up matching the 'ideal' since the server told it the necessary sleep time.
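
A quick back-of-the-envelope check of those numbers (variable names are mine):

```python
# A full bucket of 2400 tokens, refilled at 20 tokens/minute (1200/hour).
burst_size = 3000
bucket_capacity = 2400
drip_per_minute = 20

leftover = burst_size - bucket_capacity          # requests beyond the initial burst
minutes_to_finish = leftover / drip_per_minute   # drain the drip as it arrives
```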

If a 'conditional' HTTP header is a bad idea, then one very-ugly option would be to overload the 'Remaining' header so that, in the 'remaining is 0' case, the server sends back a negative value representing the number of seconds to wait. The responses for the last 3 requests when exhausting the bucket would then be:

  • RateLimit-Remaining: 2
  • RateLimit-Remaining: 1
  • RateLimit-Remaining: -58 // bucket exhausted, we need to wait 58 seconds before making another request

That's quite ugly, and certainly not something I'd put in a guide 😄

ok, that's enough rambling for now. Hopefully I haven't done more harm than good in this response! 😃

@geemus
Member

geemus commented Jun 9, 2014

Over-complicating was a group effort ;)

Yeah, I definitely see your point. I suspect some of the extra values I mentioned above might still be handy to reveal somewhere (we also have a GET /rate-limits action that would likely be a good fit), but they don't always need to be around.

Still, providing a reset value certainly does simplify things. My gut feeling would be to ALWAYS return it, but I would definitely need to dig into how expensive that would be. I agree that something like what you describe is valuable in this regard, but 'reset' doesn't seem quite the right name. After all, it isn't the time at which you return to max (at least in our system); it's just the time at which you receive an influx of some amount of additional requests. I suspect there's another word or words we could use that would be a tighter fit semantically, but it's nearing end of day for me and I fear I'm not coming up with a concrete example at present. Does that make sense? What do you think?

The idea of overloading remaining is also interesting, especially as it deals (in its own way) with the conditional header issue. That said, I would agree that it leans toward ugly and would probably try to find other options first.
