Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research and hopefully make github-access-manager faster #16

Open
kucharskim opened this issue Aug 2, 2024 · 10 comments · May be fixed by #19
Open

Research and hopefully make github-access-manager faster #16

kucharskim opened this issue Aug 2, 2024 · 10 comments · May be fixed by #19
Assignees
Labels
bootcamp Good for newcomers

Comments

@kucharskim
Copy link

This step is fast:

[1 / ??] Listing organization repositories

This step is slow, could it iterate in paralel the repos?

[5 / 425] Getting access on ...
@kucharskim kucharskim added the bootcamp Good for newcomers label Aug 2, 2024
@ruuda
Copy link
Contributor

ruuda commented Sep 19, 2024

I looked into it briefly, it’s doing a few requests per repository that we could make in parallel, and also we could request the repositories in parallel, but we’d need to swap out the http logic for something like aiohttp, it would be a bit tedious.

@leo-chorus-one leo-chorus-one self-assigned this Nov 27, 2024
@leo-chorus-one
Copy link
Contributor

@ruuda I don’t think moving to async IO is needed, for io-bound tasks threads are just fine in Python, so a good ol’ threadpool should be enough as long as we’re careful with rate limiting (some semaphore or whatever).

However… Github has a GraphQL API, which should allow us to grab all the data we need in just one call. That sounds like the more robust solution here rather than fooling around with concurrency (which is always "here be dragons" territory). I’ll look into it, it might not be so complicated.

@leo-chorus-one leo-chorus-one linked a pull request Nov 28, 2024 that will close this issue
@leo-chorus-one
Copy link
Contributor

leo-chorus-one commented Nov 28, 2024

Ok so basically fsck me. Everything is here in the GraphQL API, perfect, allows to go from 800 requests to a dozen, super fast… except that one small part of getting the teams for a repo. That just doesn’t exist, so back to square one 😭 .

@ruuda
Copy link
Contributor

ruuda commented Nov 29, 2024

We could maybe do a combination of both? Though maybe it’s not worth the complexity.

@leo-chorus-one
Copy link
Contributor

Yes, that’s how it’s going. GraphQL is still worth it for the other queries, and just for this one it’s gonna be threadpool. It’s not so complex but as I have burned 1.5 days on this I’m gonna do something else more useful and then come back to it 😅 .

@leo-chorus-one
Copy link
Contributor

@ruuda although, before I sink more time into this lovely hand-rolled python script… why are we not using terraform for this, which would be my most obvious go to for such a problem (the problem the repo solves I mean, not the speed problem) ?

@ruuda
Copy link
Contributor

ruuda commented Dec 4, 2024

Because Terraform doesn’t actually solve the problem, we looked into it before we wrote this, there are a few plugins that can manage some things on GitHub, but they don’t tell you that we have people in the organization, or as collaborators on repos, who should not have access. I also thought something like it must exist, but if it did, I couldn’t find it, so we wrote it.

@leo-chorus-one
Copy link
Contributor

Interesting, ok… that may have been the early days of the provider when https://registry.terraform.io/providers/integrations/github/latest/docs/resources/repository_collaborators (which handles individuals and teams, and is authoritative) did not exist. But maybe I don’t understand the problem you had, or you had other problems.

@ruuda
Copy link
Contributor

ruuda commented Dec 4, 2024

If I recall correctly, the problem with that one is more of a problem with Terraform’s model of the world. You can define a collaboratororganization member, and if it didn’t exist, it will create it. And if you later remove it from your config, it will still be in the Terraform state, and so it will offer to remove it. But if somebody outside of Terraform just added a collaboratormember in the GitHub settings, then Terraform will not be aware of that.

@leo-chorus-one
Copy link
Contributor

No no, that’s exactly what I mean by "it’s authoritative" ! For github_repository_collaborators the resource fully manages the list of collaborators and teams for one repo, which is what we want. You also have the non-authoritative versions of the same resource, github_repository_collaborator and github_team_repository, which act like you describe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bootcamp Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants