Retry wrapper #20
Open · joecorcoran wants to merge 1 commit into master

Conversation

@joecorcoran (Contributor) commented Jul 15, 2016

For discussion

We're getting 404s quite frequently from the GitHub API when we
request pull request data that appears to take a while to propagate
through the GitHub system.

These errors usually cause a Sidekiq job to raise and be retried,
and the second or third attempt is eventually successful. But this
causes a lot of noise in our logs, and the errors are too generic
to catch or ignore.

This adds a GH::Retry wrapper that makes a specified number of
retries before raising the 404 as usual.

stack = GH::Stack.build do
  use GH::Retry, retries: 3, wait: 2
end

GH.with(stack) { GH['users/rkh'] }
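
For illustration, a minimal sketch of what such a wrapper might look like. It assumes GH middlewares subclass GH::Wrapper, receive their options as writable attributes, and delegate to the next layer via backend; those names, the fetch_resource hook and the GH::Error rescue are assumptions for this sketch, not necessarily the exact code in this commit.

# Sketch only -- relies on the GH::Wrapper conventions assumed above.
module GH
  class Retry < Wrapper
    attr_accessor :retries, :wait

    # Fetch a resource, retrying a limited number of times before
    # re-raising (in practice only 404s would be worth retrying).
    def fetch_resource(key)
      attempts = 0
      begin
        backend[key]
      rescue GH::Error
        attempts += 1
        raise if attempts > (retries || 3)
        sleep(wait || 1)
        retry
      end
    end
  end
end

With the stack from the example above, a 404 would be retried three times, two seconds apart, before bubbling up as before.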

@joecorcoran changed the title from "First attempt at a retrying mechanism" to "Retry wrapper" on Jul 18, 2016
@svenfuchs

@joecorcoran I like this, great work! @rkh wdyt?

For the current case in Gatekeeper ... the problem we are trying to solve is that we get those 404s, an exception bubbles up (which is good, that's how Sidekiq works, so it retries), and it gets reported (which is bad, since it's expected). Is that correct? I am wondering if we can also improve our exception handling so that it is aware of these "expected exceptions" and doesn't report them until, e.g., it has retried for the period of time we expect GitHub to need until their API is up to date.

While testing Sync I've seen cases where GitHub needed up to 40 seconds. I suppose we wouldn't want Gatekeeper to block (in GH) for that long, so I guess using the Sidekiq retry functionality would be a good thing to do there?
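
For reference, this is roughly what leaning on Sidekiq's built-in retries could look like; the worker class, its argument and the retry numbers below are made up for illustration.

require 'sidekiq'

# Hypothetical worker -- the class name and backoff values are examples.
# On failure Sidekiq re-enqueues the job with a delay, so the worker
# thread is freed instead of sleeping while GitHub catches up.
class ProcessPullRequest
  include Sidekiq::Worker
  sidekiq_options retry: 5

  # Replace the default exponential backoff with short, fixed-step delays.
  sidekiq_retry_in { |count| 10 * (count + 1) }

  def perform(pull_request_id)
    # fetch pull request data via GH here; a 404 raised from GH
    # triggers the next Sidekiq retry rather than an in-process sleep
  end
end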

@rkh (Contributor) commented Jul 18, 2016

Code and interface look good.

re "block", I agree. How likely is GitHub needing 40 seconds? When it does that, is that only in regards to that one PR or is GH having issues in general? While this doesn't block on the technical level (ie, the thread doesn't really cost us that much while sleeping, doesn't halt execution), it does use up one Sidekiq worker thread.

@joecorcoran (Contributor, Author)

My feeling right now is as follows: it's hard for us to really know whether a 404 from GitHub is erroneous. With further work on GH to raise more specific errors, and some refactoring in Gatekeeper 2, we can eventually move towards raising errors that would still trigger Sidekiq retries but would be ignored in Sentry.
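
As an illustration of that direction (assuming the sentry-raven client is in use, and a hypothetical GH::NotFound error class that GH does not raise today), the reporting side could look roughly like this:

# Hypothetical: tell the Sentry client to ignore the "expected" error
# class while Sidekiq keeps retrying jobs that raise it.
Raven.configure do |config|
  config.excluded_exceptions += ['GH::NotFound']
end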

The goal in the short term is to reduce the noise in our logs and error alerts that comes from Sidekiq retries. I'd like to give this wrapper a go, with conservative values that don't hog the worker thread for too long. Maybe I can port this class into Gatekeeper to test it out, with some extra logging, and if it proves useful we can merge this later?

@joecorcoran (Contributor, Author)

There's also the consideration that workers which raise somewhere in the middle of a database transaction and get retried by Sidekiq x times could be just as expensive overall as a short retry loop in GH...
