Fix: separate scrape and commit jobs #23

Merged
merged 4 commits into from
Apr 4, 2024

Conversation

thekaveman
Member

The scrape job follows the matrix strategy to get the latest data files for each participant, and then uploads these files as artifacts to the workflow run.

The commit job downloads all artifacts into the data/ directory and then makes a single commit and push when files have changed.

This addresses a couple of problems in the initial implementation:

  • A race condition in the matrix strategy: each matrix branch attempted to commit files and push to main at nearly the same time, resulting in git conflicts on push.

  • Branch protection rules on main: by using BOT_ACCESS_TOKEN on checkout in the commit job, the rest of the git commands run as the @cal-itp-bot user, which has permission to bypass branch protection rules for the main branch.
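
A minimal sketch of the two-job layout described above, not the actual workflow file from this PR. The participant names, the `./scrape.sh` command, the artifact naming, the commit message, and the bot name and email are placeholders; it also assumes BOT_ACCESS_TOKEN is stored as a repository secret.

```yaml
# Hypothetical sketch: placeholder names throughout, not the workflow from this PR.
name: Scrape data

on:
  workflow_dispatch:

jobs:
  scrape:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Placeholder participant names
        participant: [participant-a, participant-b]
    steps:
      - uses: actions/checkout@v4
      - name: Scrape latest data files
        # Placeholder command; the real scraper is not shown here
        run: ./scrape.sh "${{ matrix.participant }}" "data/${{ matrix.participant }}"
      # Each matrix branch only uploads files; it never touches git
      - uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.participant }}
          path: data/${{ matrix.participant }}

  commit:
    # Runs once, after every matrix branch of the scrape job has finished
    needs: scrape
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Later git commands authenticate with the bot's token
          token: ${{ secrets.BOT_ACCESS_TOKEN }}
      # With no artifact name given, all artifacts are downloaded,
      # each into its own subdirectory of data/
      - uses: actions/download-artifact@v4
        with:
          path: data
      - name: Commit and push when files have changed
        run: |
          # Placeholder identity for the commit author
          git config user.name "cal-itp-bot"
          git config user.email "bot@example.com"
          git add data
          # Exit status is non-zero only when staged changes exist
          if ! git diff --cached --quiet; then
            git commit -m "chore: update scraped data"
            git push
          fi
```

Because only the commit job writes to the repository, and it runs once per workflow run, there is a single push instead of one per matrix branch.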

@thekaveman thekaveman requested a review from a team as a code owner February 27, 2024 03:39

github-actions bot commented Feb 27, 2024

Coverage report

This PR does not seem to contain any modification to coverable code.

@thekaveman thekaveman added the bug (Something isn't working) and actions (Related to GitHub Actions workflows) labels Feb 27, 2024
@thekaveman thekaveman self-assigned this Feb 27, 2024
@thekaveman
Member Author

It works!! See commit 9f8beb5 from a manual run on this branch 🎉 🥳

thekaveman and others added 4 commits April 4, 2024 14:13
@thekaveman
Member Author

thekaveman commented Apr 4, 2024

I'm just going to merge this to clear the PR from our queue, but the workflow is still marked as disabled while we consider the best place to run this scraping.

@thekaveman thekaveman merged commit 06d0e81 into main Apr 4, 2024
3 checks passed
@thekaveman thekaveman deleted the fix/scrape-workflow-git branch April 4, 2024 21:13
Labels
actions Related to GitHub Actions workflows bug Something isn't working
Projects
Status: Done
3 participants