
Unable to select next Libraries.io token from pool #9839

Open
chris48s opened this issue Dec 23, 2023 · 3 comments
Labels
bug Bugs in badges and the frontend needs-upstream-help Not actionable without help from a service provider service-badge New or updated service badge

Comments

@chris48s
Member

Are you experiencing an issue with...

shields.io

🐞 Description

Libraries.io badges frequently show

Unable to select next Libraries.io token from pool

🔗 Link to the badge

example:

(but affects all libraries.io badges)

💡 Possible Solution

I think there are 2 issues we need to fix here:

  1. We are doing enough traffic on libraries.io that we are sometimes exhausting the rate limit on our available tokens. We either need to add another token to the pool or cache for longer to try and fix this.

  2. When we do hit the point where none of the tokens in the pool have any rate limit points left, we don't seem to recover from it well. In reality, if we wait for a bit, the next hour will tick over and we can start using the tokens again, but once we exhaust the pool once we're basically just stuck forever (or until the server cycles). We need to be more resilient to recovering from this state if we do hit it.
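To illustrate point 2, here's a minimal sketch of a pool that re-checks each token's reset time before giving up, so that exhausting the pool once is not a permanent state. The class and field names (`TokenPool`, `usesRemaining`, `nextReset`) are illustrative, not Shields' actual implementation:

```javascript
// Hypothetical sketch: a token pool that becomes usable again once a
// token's reset time has passed, instead of staying exhausted forever.
class TokenPool {
  constructor(tokens) {
    // each entry: { token, usesRemaining, nextReset (epoch seconds, 0 = unset) }
    this.entries = tokens.map(token => ({
      token,
      usesRemaining: 60,
      nextReset: 0,
    }))
  }

  nowSeconds() {
    return Math.floor(Date.now() / 1000)
  }

  next() {
    for (const entry of this.entries) {
      // a token whose reset time has passed becomes usable again
      if (entry.nextReset !== 0 && this.nowSeconds() >= entry.nextReset) {
        entry.usesRemaining = 60
        entry.nextReset = 0
      }
      if (entry.usesRemaining > 0) {
        entry.usesRemaining -= 1
        return entry.token
      }
    }
    throw new Error('Token pool is exhausted')
  }

  markExhausted(token, retryAfterSeconds) {
    const entry = this.entries.find(e => e.token === token)
    entry.usesRemaining = 0
    entry.nextReset = this.nowSeconds() + retryAfterSeconds
  }
}
```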

@chris48s chris48s added bug Bugs in badges and the frontend service-badge New or updated service badge labels Dec 23, 2023
@chris48s
Member Author

OK, so I did a bit of digging into this. I've found at least 2 problems, one of which we can solve and one of which we can't:

Problem 1

When we calculate nextReset in

const nextReset = Date.now() + (retryAfter ? +retryAfter * 1000 : 0)

we are calculating a number of milliseconds.

In

get hasReset() {
  return getUtcEpochSeconds() >= this.nextReset
}

we are comparing nextReset to a number of seconds.

So we're never going to reset the token because we're trying to compare a number like 1703797975 (now) to a number like 1703797975973 (nextReset).

This has probably been a bug in our code for a very long time. We can fix this by setting

const nextReset = ((Date.now() + (retryAfter ? +retryAfter * 1000 : 0)) / 1000) >>> 0

in getRateLimitFromHeaders(), or keep nextReset in milliseconds and compare it against Date.now() instead of (Date.now() / 1000) >>> 0. Either way, both sides of the comparison need to use the same unit.
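Here's a sketch of the unit fix, keeping everything in epoch seconds. The function names mirror the snippets quoted above, but the exact shape is illustrative rather than the actual patch:

```javascript
// Keep both sides of the reset comparison in epoch *seconds*.
function getUtcEpochSeconds() {
  return (Date.now() / 1000) >>> 0
}

// retryAfter arrives in seconds (the Retry-After header), so nextReset
// is stored in seconds too, not milliseconds
function computeNextReset(retryAfter) {
  return getUtcEpochSeconds() + (retryAfter ? +retryAfter : 0)
}

function hasReset(nextReset) {
  return getUtcEpochSeconds() >= nextReset
}
```

With matching units, a nextReset of 1703797975 + 60 correctly compares against a "now" of 1703797975, instead of a seconds value never catching up to a milliseconds value.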

Problem 2

Libraries.io no longer returns a x-ratelimit-limit or x-ratelimit-remaining header in any response as far as I can tell. This used to be a problem with 404 responses only librariesio/libraries.io#2860 but now even the 200s don't have it. This means we boot a new server, set totalUsesRemaining to 60, decrement it on every request until it hits 0, mark the token as done and then never increase it again. The fact that eventually x-ratelimit-remaining would have gone up again is probably why we never noticed problem 1.
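One defensive option (a sketch, not the current Shields code) is to treat missing rate-limit headers as "unknown" rather than decrementing a stale counter down to zero forever:

```javascript
// Hedged sketch: if upstream stops sending x-ratelimit-* headers,
// report null so the caller can skip client-side accounting instead
// of trusting a guessed count that only ever goes down.
function getRateLimitFromHeaders(headers) {
  const remaining = headers['x-ratelimit-remaining']
  const limit = headers['x-ratelimit-limit']
  if (remaining === undefined || limit === undefined) {
    return null
  }
  return { totalUsesRemaining: +remaining, limit: +limit }
}
```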

Keeping track of our own rate limit with no external source is going to be really difficult because we run across multiple servers - we'd have to centralise the count.

For the moment, I've opened an issue upstream librariesio/libraries.io#3283 but I have limited faith we will get a response.

At the moment, we are flooding sentry with a lot of Error('Token pool is exhausted'). I might see if we can temporarily replace these with something like a deprecation message while we see where this goes, just to reduce that spam.

@chris48s
Member Author

chris48s commented Jan 1, 2024

As a temporary position, I've dropped us down to a single libraries.io API key in production.

This will mean we serve working badges more often because with a single token, we disable the pooling and don't attempt to keep track of the rate limit. We just use the same token for all requests whether we think we have rate limit left or not. That said, with a single token we won't have enough rate limit for all requests, so we will still serve broken badges once we hit the limit and start getting 429s from upstream.
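The single-token behaviour described above could be sketched like this (hypothetical function, illustrative only):

```javascript
// With a single token there is nothing to rotate: use it for every
// request and let upstream 429s surface, instead of doing client-side
// rate-limit bookkeeping that can wedge the pool.
function selectToken(tokens, nextFromPool) {
  if (tokens.length === 1) {
    return tokens[0]
  }
  return nextFromPool()
}
```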

I think in the immediate term, this is the best solution we've got.

@chris48s
Member Author

chris48s commented Apr 5, 2024

Now that librariesio/libraries.io#3351 is merged, I've switched production back to using 2 tokens. This gets us back to where we were before librariesio/libraries.io@6c400b5 happened. I'm hoping this will eliminate or at least reduce the number of 429s we're seeing from libraries.io. I'll keep an eye on this.

I've also submitted #10074 which makes a couple of other tweaks to the pooling code that came out of looking into this.
