spack silently failing to git fetch certain package repositories #2

Open
3 tasks done
CodeGat opened this issue Aug 21, 2024 · 6 comments
@CodeGat
Member

CodeGat commented Aug 21, 2024

Essentially, spack's git fetch operation is silently failing during spack install for certain package repositories and not updating the local refs, leading to spack.VersionLookupErrors.
For a specific example, see https://github.com/ACCESS-NRI/ACCESS-OM2/actions/runs/10104028532/job/27942359486#step:8:429, which states:

VersionLookupError: ea4d313d6731fe6621cab26495685a77963f7cdc is not a valid git ref for mom5

Yet that ref clearly exists on the remote.

Background

Originally, this issue reared its ugly head when I implemented 'Rolling Tag' logic for Prereleases. For each commit in a pull request, this would add and force push (NOTE: bad practice!) the tag in spack.specs[0] (for example, the 2024.08.0 in access-om2@git.2024.08.0). This was needed because the root SBD in the spack.yaml (such as access-om2) expects the tag in the pull request to exist during the spack install. This led to the following scenario, whose end state is eerily similar to the one we have now:

  • Initially, push 2024.08.0 to HEAD of pull request
  • Deploy on Gadi, spack will git fetch --tags on the deployment repo and successfully pull the remote 2024.08.0.
  • Make a modification in the PR, CI will force push 2024.08.0 to the new HEAD (NOTE: still bad practice!)
  • Deploy on Gadi, spack will try to git fetch --tags on the deployment repo and silently fail, leading eventually to a spack.VersionLookupError.
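
The scenario above can be reproduced outside spack. The following is a minimal sketch, not spack code: it builds a throwaway local "remote" and a bare clone of it (the directory names remote and mirror.git and both helper functions are hypothetical), moves a tag, then fetches. It relies on documented git behaviour: since Git 2.20, git fetch refuses to overwrite an existing local tag unless --force (or a + refspec) is given. Whether spack's clone hits exactly this path is an assumption.

```python
import subprocess
from pathlib import Path


def git(*args, cwd, check=True):
    """Run git in cwd with a throwaway identity; return the CompletedProcess."""
    cmd = ["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com", *args]
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=check)


def reproduce_moved_tag(workdir):
    """Move a tag on a local 'remote', then fetch it into an older bare clone."""
    workdir = Path(workdir)
    remote = workdir / "remote"      # hypothetical stand-in for the GitHub repo
    mirror = workdir / "mirror.git"  # hypothetical stand-in for a git_repos entry
    remote.mkdir()

    git("init", "-q", cwd=remote)
    git("commit", "-q", "--allow-empty", "-m", "first", cwd=remote)
    git("tag", "2024.08.0", cwd=remote)
    git("clone", "-q", "--bare", str(remote), str(mirror), cwd=workdir)

    # Simulate CI force-moving the rolling tag to a new commit.
    git("commit", "-q", "--allow-empty", "-m", "second", cwd=remote)
    git("tag", "-f", "2024.08.0", cwd=remote)

    # Same fetch as in the scenario, first without and then with --force.
    plain = git("fetch", "--tags", cwd=mirror, check=False)
    forced = git("fetch", "--tags", "--force", cwd=mirror, check=False)
    return plain, forced
```

On Git 2.20 or newer, plain.returncode is expected to be non-zero, with the rejection reported on plain.stderr, while forced.returncode is 0. If stderr is discarded and the exit status is not checked, the first case is invisible to the caller.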

I later removed this moving tag functionality, as it was the cause of this iteration of silent failures.
I debugged the failure by modifying

# TODO: we need to update the local tags if they changed on the
# remote instance, simply adding '-f' may not be sufficient
# (if commits are deleted on the remote, this command alone
# won't properly update the local rev-list)
self.fetcher.git("fetch", "--tags", output=os.devnull, error=os.devnull)
to redirect output and error to str rather than os.devnull (why are they piping output and error to devnull?). This let me see the git output, which showed that git was failing to update the local tags. Unfortunately, this change irrevocably broke Prerelease and I had to remake the Prerelease environment, so don't simply redirect the output to str: it polluted a bunch of metadata in spack.
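
A less invasive way to see what git reports, without editing spack's own call, might be to run the same command by hand and keep its output in a file. The sketch below is not spack's API: run_logged and the log file name are hypothetical, and it only uses Python's standard subprocess module.

```python
import subprocess
from pathlib import Path


def run_logged(cmd, log_path, cwd=None):
    """Run cmd, append its stdout, stderr and exit status to log_path.

    Returns the exit status so the caller can react to a failed fetch
    instead of silently ignoring it.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    with Path(log_path).open("a") as log:
        log.write(f"$ {' '.join(cmd)}\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.write(f"[exit status {result.returncode}]\n")
    return result.returncode


# Hypothetical usage, run against one of the bare repos in git_repos:
#   status = run_logged(["git", "fetch", "--tags"], "git_fetch.log", cwd=repo_dir)
#   if status != 0: inspect git_fetch.log
```

Because the output goes to a separate log file rather than back into spack, this should not touch spack's metadata.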

Removing the rolling tag fixed the issue, so I thought I was done with it...

The Current Issue

Linking ACCESS-NRI/ACCESS-OM2#76. Many of the later failing runs show a similar problem: a spack.VersionLookupError where the ref can't be found locally even though it is definitely on the remote. I suspect that, as above, something goes wrong when spack runs git fetch, but I don't know what. Running the workaround below solved it, and its output confirmed that spack had not been updating the branches:

From https://github.com/ACCESS-NRI/mom5
 = [up to date]      3-build-ci                             -> 3-build-ci
 = [up to date]      9-mkmf-escape-fix                      -> 9-mkmf-escape-fix
 = [up to date]      access-esm1.5                          -> access-esm1.5
 * [new branch]      delete                                 -> delete
   baaf7ed..d8ece40  development                            -> development
 * [new branch]      dougiesquire/issue388-accessom-gtracer -> dougiesquire/issue388-accessom-gtracer
 * [new branch]      dougiesquire/master-generic-tracers    -> dougiesquire/master-generic-tracers
 = [up to date]      master                                 -> master
 = [up to date]      upstream-master                        -> upstream-master
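
In this fetch summary the first column is git's per-ref status flag. According to the git-fetch documentation, = means up to date, * a new ref, a blank a fast-forward, + a forced update, t a tag update, - a pruned ref and ! a rejected update. As a small illustrative sketch (the function name and regular expression are my own, not part of git or spack), saved git fetch -v output can be filtered down to the refs that were actually stale:

```python
import re

# Flag meanings as listed in the OUTPUT section of the git-fetch documentation.
FLAGS = {
    " ": "fast-forward",
    "+": "forced update",
    "-": "pruned",
    "t": "tag update",
    "*": "new ref",
    "!": "rejected",
    "=": "up to date",
}

# A summary line looks like: " <flag> <summary> <from> -> <to>"
LINE = re.compile(
    r"^ (?P<flag>[ +\-t*!=]) (?P<summary>\[[^\]]*\]|\S+)\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)"
)


def changed_refs(fetch_output):
    """Return (local ref, status) pairs for every ref that was not up to date."""
    changes = []
    for line in fetch_output.splitlines():
        match = LINE.match(line)
        if match and match.group("flag") != "=":
            changes.append((match.group("dst"), FLAGS[match.group("flag")]))
    return changes


sample = "\n".join([
    "From https://github.com/ACCESS-NRI/mom5",
    " = [up to date]      master       -> master",
    " * [new branch]      delete       -> delete",
    "   baaf7ed..d8ece40  development  -> development",
])
print(changed_refs(sample))
# [('delete', 'new ref'), ('development', 'fast-forward')]
```

Anything this returns for a repo that spack supposedly just fetched is a ref that spack's own fetch left stale. Note that the parser expects git's leading-space layout, which can be lost when output is copied from a web page.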

Things TODO

  • Next time we get a spack.VersionLookupError, copy the entire instance to a local location and try to find out what is preventing the git fetch, either by interrogating the git_repos folder or by redirecting spack's own git operations to a file.
  • Fix the issue!
  • Maybe notify spack developers that this is happening...

Workaround

The current workaround is to do the following:

cd "$SPACK_ROOT/../git_repos"
for repo in *; do echo "----- $repo -----"; git -C "./$repo" ls-tree HEAD --name-only; done  # compare against the package repo's contents, because these repos are bare
# Once you have found the hash directory that corresponds to the silently failing repo (for example, shcnhqk)...
git -C ./shcnhqk fetch -v -u origin '+refs/heads/*:refs/heads/*'  # which will pull down the correct refs (refspec quoted so the shell cannot expand the *)
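
As a possible alternative to comparing ls-tree listings by eye, each hashed directory can be asked for its origin URL. This is only a sketch under assumptions: it presumes every entry in git_repos has a remote called origin (the workaround above fetches from origin, so that seems likely), and remotes_by_repo is a hypothetical helper name, not something spack provides.

```python
import subprocess
from pathlib import Path


def remotes_by_repo(git_repos_dir):
    """Map each sub-directory name to its 'origin' URL (None if git reports none)."""
    mapping = {}
    for repo in sorted(Path(git_repos_dir).iterdir()):
        if not repo.is_dir():
            continue
        result = subprocess.run(
            ["git", "-C", str(repo), "config", "--get", "remote.origin.url"],
            capture_output=True,
            text=True,
        )
        # 'git config --get' exits non-zero when the key is not set.
        mapping[repo.name] = result.stdout.strip() if result.returncode == 0 else None
    return mapping


# Hypothetical usage:
#   for name, url in remotes_by_repo("/path/to/git_repos").items():
#       print(name, url)
```

The directory whose URL matches the failing package's repository is the one to fetch manually.
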
@harshula
Collaborator

TODO: Double check if the repository can be stale when using spack develop.

@CodeGat
Member Author

CodeGat commented Nov 13, 2024

Seems like it can, @harshula! This is the problem that is coming up with the Ocean Team.

@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/ocean-team-workshop-12-15th-november-2024/3857/14

@CodeGat
Member Author

CodeGat commented Dec 8, 2024

It happened again with ACCESS-OM3 (access-om3-nuopc package) in ACCESS-NRI/ACCESS-OM3#24 and other places. Running the fix we get:

[tm70_ci@gadi-login-03 git_repos]$ git -C ./fbxdw7q fetch --force -v -u origin +refs/heads/*:refs/heads/*
POST git-upload-pack (155 bytes)
POST git-upload-pack (858 bytes)
remote: Enumerating objects: 32, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 32 (delta 17), reused 24 (delta 15), pack-reused 0 (from 0)
Unpacking objects: 100% (32/32), 10.93 KiB | 67.00 KiB/s, done.
From https://github.com/COSIMA/access-om3
 * [new branch]      209-new-build         -> 209-new-build
 = [up to date]      angus-g/ag            -> angus-g/ag
 * [new branch]      cbull_mom6_updates    -> cbull_mom6_updates
 = [up to date]      cm3                   -> cm3
 = [up to date]      fix_ww3_include_dir   -> fix_ww3_include_dir
   1f36419..4f278cc  main                  -> main
 = [up to date]      truncation_file_issue -> truncation_file_issue
 = [up to date]      ww3_execs_to_bin      -> ww3_execs_to_bin
 = [up to date]      ww3_failing_restart   -> ww3_failing_restart
 = [up to date]      ww3_history           -> ww3_history

For one of the PRs, it looks like spack never picked up the cbull_mom6_updates branch at all (it only appears here as * [new branch] cbull_mom6_updates -> cbull_mom6_updates).

@CodeGat
Member Author

CodeGat commented Dec 11, 2024

Linking an issue from upstream: spack#48023

@CodeGat
Member Author

CodeGat commented Dec 11, 2024

Current infrastructure workaround will be to fetch all repos in git_repos before running spack install. Will link a PR when opened.
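
Until that PR is linked, here is a rough sketch of what "fetch all repos in git_repos" could look like. It is not the actual infrastructure change: fetch_command and fetch_all are hypothetical names, and the git arguments simply mirror the manual fetch --force -u origin +refs/heads/*:refs/heads/* command used earlier in this thread.

```python
import subprocess
from pathlib import Path

# Same refspec as the manual workaround: overwrite local heads with the remote's.
REFSPEC = "+refs/heads/*:refs/heads/*"


def fetch_command(repo):
    """Build the git invocation that refreshes one bare repo."""
    return ["git", "-C", str(repo), "fetch", "--force", "-u", "origin", REFSPEC]


def fetch_all(git_repos_dir):
    """Fetch every sub-directory of git_repos_dir.

    Returns a dict of {directory name: stderr} for fetches that failed,
    so failures are reported instead of being silently dropped.
    """
    failures = {}
    for repo in sorted(Path(git_repos_dir).iterdir()):
        if not repo.is_dir():
            continue
        result = subprocess.run(fetch_command(repo), capture_output=True, text=True)
        if result.returncode != 0:
            failures[repo.name] = result.stderr.strip()
    return failures


# Hypothetical usage before 'spack install':
#   failed = fetch_all("/path/to/git_repos")
#   if failed: stop the deployment and print the failures
```

Passing the refspec as a list element avoids any shell globbing of the * characters, and checking the exit status means a failed fetch stops the deployment early rather than surfacing later as a VersionLookupError.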
