Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ratelimit): race condition on refill #73

Merged
merged 4 commits into from
Sep 28, 2023

Conversation

brayniac
Copy link
Contributor

Reported in #72 as a panic on overflow in debug mode. Root cause is a race between a thread that has successfully refilled the token bucket and a thread that then steals the newly acquired token.

This can be reliably triggered using the repro code in the original issue with the addition of a sleep in the refill function immediately before the return of Ok(()).

What happens before this patch is that thread A refills the bucket successfully, meaning that refill result is Ok(()) but by the time it loads the available count, there are no more tokens available (due to other threads acquiring tokens in parallel). This means that we do not early return a refill error indicating the duration until next refill, but instead proceed with available as zero.

This fix adds another loop in the token acquisition path and breaks the inner loop when refill is successful but there are no tokens remaining. This edge-case results in an attempt to refill the token bucket again. When there is no race, we proceed in the inner compare exchange loop as originally written.

Reported in pelikan-io#72 as a panic on overflow in debug mode. Root cause
is a race between a thread that has successfully refilled the token
bucket and a thread that then steals the newly acquired token.

This can be reliably triggered using the repro code in the original
issue with the addition of a sleep in the refill function
immediately before the return of `Ok(())`.

What happens before this patch is that thread A refills the bucket
successfully, meaning that refill result is `Ok(())` but by the
time it loads the available count, there are no more tokens
available (due to other threads acquiring tokens in parallel). This
means that we do not early return a refill error indicating the
duration until next refill, but instead proceed with available as
zero.

This fix adds another loop in the token acquisition path and breaks
the inner loop when refill is successful but there are no tokens
remaining. This edge-case results in an attempt to refill the
token bucket again. When there is no race, we proceed in the inner
compare exchange loop as originally written.
@brayniac brayniac merged commit 324da09 into pelikan-io:main Sep 28, 2023
12 checks passed
@brayniac brayniac deleted the ratelimit-fix-2 branch September 28, 2023 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants