-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Research and hopefully make github-access-manager faster #16
Comments
I looked into it briefly, it’s doing a few requests per repository that we could make in parallel, and also we could request the repositories in parallel, but we’d need to swap out the http logic for something like |
@ruuda I don’t think moving to async IO is needed, for io-bound tasks threads are just fine in Python, so a good ol’ threadpool should be enough as long as we’re careful with rate limiting (some semaphore or whatever). However… Github has a GraphQL API, which should allow us to grab all the data we need in just one call. That sounds like the more robust solution here rather than fooling around with concurrency (which is always "here be dragons" territory). I’ll look into it, it might not be so complicated. |
Ok so basically fsck me. Everything is here in the GraphQL API, perfect, allows to go from 800 requests to a dozen, super fast… except that one small part of getting the teams for a repo. That just doesn’t exist, so back to square one 😭 . |
We could maybe do a combination of both? Though maybe it’s not worth the complexity. |
Yes, that’s how it’s going. GraphQL is still worth it for the other queries, and just for this one it’s gonna be threadpool. It’s not so complex but as I have burned 1.5 days on this I’m gonna do something else more useful and then come back to it 😅 . |
@ruuda although, before I sink more time into this lovely hand-rolled python script… why are we not using terraform for this, which would be my most obvious go to for such a problem (the problem the repo solves I mean, not the speed problem) ? |
Because Terraform doesn’t actually solve the problem, we looked into it before we wrote this, there are a few plugins that can manage some things on GitHub, but they don’t tell you that we have people in the organization, or as collaborators on repos, who should not have access. I also thought something like it must exist, but if it did, I couldn’t find it, so we wrote it. |
Interesting, ok… that may have been the early days of the provider when https://registry.terraform.io/providers/integrations/github/latest/docs/resources/repository_collaborators (which handles individuals and teams, and is authoritative) did not exist. But maybe I don’t understand the problem you had, or you had other problems. |
If I recall correctly, the problem with that one is more of a problem with Terraform’s model of the world. You can define a |
No no, that’s exactly what I mean by "it’s authoritative" ! For github_repository_collaborators the resource fully manages the list of collaborators and teams for one repo, which is what we want. You also have the non-authoritative versions of the same resource, github_repository_collaborator and github_team_repository, which act like you describe. |
This step is fast:
This step is slow, could it iterate in paralel the repos?
The text was updated successfully, but these errors were encountered: