Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker ci build #507

Merged
merged 6 commits into from
May 2, 2024
Merged

Docker ci build #507

merged 6 commits into from
May 2, 2024

Conversation

TomMelt
Copy link
Contributor

@TomMelt TomMelt commented Mar 8, 2024

Nextsim CI **now with added Docker**

This PR mainly pertains to the ubuntu build of the code.

Previously the CI workflow built nextsim using a GitHub VM instance that would apt install basic dependencies required for the build process.

However, with the addition of xios (#496 ) this is no longer a tractable solution. Furthermore, it would take ~15 mins just to build xios before then going on to build nextsim.

Now I have created a Dockerfile and associated pre-built image (currently on my Dockerhub: tommelt/nextsimdg-dev-env:with-xios) which contains all the prerequisites to build and test the code for our typical CI workflow.

In theory it could also be repurposed by developers as a dev environment (although I personally don't love this).

more about the dockerfile

The Dockerfile is based on a spack environment file (spack.yaml) was generated originally using the following command:

spack containerize > Dockerfile

However, I needed to make some manual adjustments to the Dockerfile (necessary for installing xios and some other dependencies).

Dockerfiles/Dockerfile is the one that was used to generate the docker image tommelt/nextsimdg-dev-env:with-xios.

The image is 2.41GB and contains all prerequisites for MPI and XIOS. When uploaded to dockerhub (docker push tommelt/nextsimdg-dev-env:with-xios) it is compressed to 625.97 MB.

benefits of Docker in CI

  • It's fast
    • only takes 45s to download the image and then 150s to build nextsim 🔥
    • a full CI (for ubuntu) is now less than 4 minutes -- building XIOS alone used to take ~15 minutes
  • It's reproducible
    • one of the problems of github CI is it's hard to debug when things don't work
    • you can now pull the docker image (docker pull tommelt/nextsimdg-dev-env:with-xios) and test stuff locally
  • It's easy
    • nextsim is necessarily adding complexity, which increases the required dependencies
    • the docker image can be built with all dependencies ready-to-go

Changes to the parallel (MPI) test naming convention

  • There was only one existing MPI test before I added the xios one (testRectGrid_MPI).
  • I think it would be sensible to name all MPI tests something like testname_MPIX where X = the number of ranks the test should be run with
  • This is required because doctest specifies minimum/maximum numbers of procs for the test to run. A test that requires 2 procs to run, run with 1 MPI proc instead will "pass" but with a message like "test skipped". This is not a situation we want to end up i.e., thinking our tests pass when they have actually not been run.

To Do:

  • TM to add docker image to github container registry
  • @timspainNERSC to fix MacOS build. I tried to fix MPI/xios build on mac (using tmate plugin to debug github action live) but the fix requires a lot of effort. (Compiling most/all dependencies from source) and it may not be worth it. We could consider not running the MPI/xios tests on mac? For now I have simply disabled MPI and XIOS on the mac build. One other work around would be create a dedicated spack build cache. (decided to not run tests for MPI/XIOS build on mac)
  • add instructions to build the docker image (docker build .)
  • add documentation around new docker image
  • rebase to include Kacpers latest changes and Tim's fix of mac build
  • should we keep existing docker images? or "rebase" them? or just have one with all dependencies and users can build their own version of nextsim to meet their needs? dev environment build is sufficient for now.
  • move pre-built image to nextsim dockerhub repo (do we have a NextSim docker account yet?)

* updated/tidied Dockerfiles
* modified the CI to use the docker image hosted on ghcr.io
* disabled XIOS and MPI for the mac build
* added spack.yaml needed to generate dockerfile
.github/workflows/clang-compile-tests.yml Show resolved Hide resolved
Comment on lines +50 to +63
. /opt/spack-environment/activate.sh
apt install -y wget
cd build
(cd core/test && wget "ftp://ftp.nersc.no/nextsim/netCDF/partition_metadata_1.nc")
for component in core physics
do
cd $component/test
# run serial tests
for file in $(find test* -maxdepth 0 -type f | grep -vP "_MPI\d*$"); do echo $file; ./$file; done
# run MPI tests that require 2 ranks
for file in $(find test* -maxdepth 0 -type f | grep -P "_MPI2$"); do echo $file; mpirun -n 2 --oversubscribe ./$file; done
for file in $(find test* -maxdepth 0 -type f | grep -vP "_MPI\d*$")
do
echo $file
./$file
done
# run MPI tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the download in line 53 happens quickly (or could be put in a different step) then it might be nice to have separate steps for the serial tests and the MPI tests in the Dockerfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose not to split because we can reuse more of the code that way. I don't have a strong preference though. I thought this was just easier to maintain. As for the download it should take about 1s.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Leaving thread unresolved because in today's meeting we decided to have separate workflows for MPI and non-MPI tests.]

make
# added Python_ROOT_DIR to help cmake find correct Python in the mac VM
cmake -DENABLE_MPI=OFF -DENABLE_XIOS=OFF -DPython_ROOT_DIR=/Library/Frameworks/Python.framework/Versions/Current/bin ..
make -j 4
- name: run tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do go with separating serial and MPI-parallel tests for the Ubuntu tests as suggested above then it would make sense to rename this step as "run serial tests", or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in the above reply I do not want to split at present because it will lead to some duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Leaving thread unresolved because in today's meeting we decided to have separate workflows for MPI and non-MPI tests.]

docs/installation.rst Show resolved Hide resolved
docs/installation.rst Show resolved Hide resolved
@TomMelt TomMelt requested a review from einola April 22, 2024 11:43
@TomMelt TomMelt marked this pull request as ready for review May 2, 2024 14:16
@timspainNERSC timspainNERSC merged commit 90016cb into xios-interface-with-tests May 2, 2024
3 checks passed
@timspainNERSC timspainNERSC deleted the docker-ci-build branch May 2, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ICCS Tasks or reviews for the ICCS team
Projects
Development

Successfully merging this pull request may close these issues.

3 participants