Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xios interface with tests #496

Merged
merged 23 commits into from
May 2, 2024
Merged

Xios interface with tests #496

merged 23 commits into from
May 2, 2024

Conversation

TomMelt
Copy link
Contributor

@TomMelt TomMelt commented Feb 7, 2024

Nextsim XIOS Interface

The previous XIOS work has diverged from the current development branch.

This PR has a minimum working example (MWE) in the form of a test testXiosInit (source ./core/test/XiosInit_test.cpp).

The test checks most of the functionality added by the XIOS C++ interface

  • ./core/src/Xios.cpp
  • ./core/src/include/Xios.hpp

which depend on a C interface included here: ./core/src/include/xios_c_interface.hpp

The core functionality of XIOS is included as part of this PR as the other two components will require it.

XIOS functionality

Still to discuss:

  • we need to discuss Dockerfile because building without one will become more complicated going forward (more dependencies, lxios, lpnetcdf etc.
  • Also how do we want to handle mac builds going forward? I cannot simulate mac (virt OS/docker) so it's not easy for me to "fix" CI we will not support MPI/xios on mac

This PR does the following:

  • add XIOS to cmake
  • add XIOS to the CI workflow
  • add unit tests to check core functionality of the C++ wrapper
  • uses mpi-aware doctest

I think this can be merged in relatively soon. The remaining steps are (for a future PR):

  • to actually integrate XIOS into nextsim (because testXiosInit is a stand-alone binary)
  • to replace the XIOS datetime types with nextsim's
  • produce diagnostic output files

Steps required to run testXiosInit

  • Install XIOS (see the github workflow for an idea how to install XIOS)
  • Build nextsim with XIOS support using -DENABLE_XIOS=ON and -Dxios_DIR=/path/to/xios/install
  • cd to build/core/test and run mpirun -n 2 ./testXiosInit

The test should pass all tests:

[doctest] doctest version is "2.4.11"
[doctest] run with "--help" for options
step 0===============================================================================
[doctest] test cases:  1 |  1 passed | 0 failed | 0 skipped
[doctest] assertions: 19 | 19 passed | 0 failed |
[doctest] Status: SUCCESS!
===============================================================================
[doctest] assertions on all processes:     38 |     38 passed |      0 failed |
===============================================================================
[doctest] Status: SUCCESS!

And produce the following output:

xios_client_1.err
xios_client_0.err
diagnostic.nc
xios_client_1.out
xios_client_0.out
doctest_1.log
doctest_0.log

I don't currently verify the output of diagnostic.nc as part of testXiosInit but this can be added.

This PR is to update the state of XIOS related PRs and Issues. This will close some stale issues
closes #218
closes #291
closes #293

@TomMelt TomMelt added the ICCS Tasks or reviews for the ICCS team label Feb 7, 2024
@TomMelt TomMelt self-assigned this Feb 7, 2024
@TomMelt TomMelt added this to the 3 Stand-alone model milestone Feb 7, 2024
@TomMelt TomMelt marked this pull request as draft February 12, 2024 09:06
@TomMelt TomMelt mentioned this pull request Feb 12, 2024
6 tasks
@TomMelt TomMelt force-pushed the xios-interface-with-tests branch 2 times, most recently from 78e35a4 to e9c96f5 Compare February 23, 2024 17:37
@MarionBWeinzierl
Copy link
Contributor

I can see that you have the same netcdf_par.h error on Ubuntu that I had previously. Did you try setting the same include path to the parallel netcdf version?

ENV CPLUS_INCLUDE_PATH /usr/lib/x86_64-linux-gnu/netcdf/mpi/include

@MarionBWeinzierl
Copy link
Contributor

When I try to run (cd xios && ./make_xios --arch GCC_LINUX --netcdf_lib netcdf4_seq) to install xios I am getting

    /home/workspace/nextsimdg_xios/xios/extern/remap/src/earcut.hpp:26:29: error: ‘uint32_t’ in namespace ‘std’ does not name a type; did you mean ‘wint_t’?
   26 | template <typename N = std::uint32_t>
      |                             ^~~~~~~~
      |                             wint_t
    fcm_internal compile failed (256)
    gmake: *** [/home/workspace/nextsimdg_xios/xios/Makefile:756: meshutil.o] Error 1 
    gmake -f /home/workspace/nextsimdg_xios/xios/Makefile -j 1 all failed (2) at /home/workspace/nextsimdg_xios/xios/tools/FCM_OLD/bin/../lib/Fcm/Build.pm line 597.

@TomMelt
Copy link
Contributor Author

TomMelt commented Mar 1, 2024

I can see that you have the same netcdf_par.h error on Ubuntu that I had previously. Did you try setting the same include path to the parallel netcdf version?

ENV CPLUS_INCLUDE_PATH /usr/lib/x86_64-linux-gnu/netcdf/mpi/include

Yeah but I figured there's no point fixing it if we move to a Docker container anyway. That's why I have left it for now. It will get worse when we add XIOS dependencies. So my plan is, get the docker build working first and then we can just build Nextsim with the dev environment configured already

@TomMelt TomMelt force-pushed the xios-interface-with-tests branch 2 times, most recently from 4c26312 to e9c96f5 Compare March 8, 2024 16:23
@TomMelt TomMelt mentioned this pull request Mar 8, 2024
7 tasks
Copy link
Contributor

@jwallwork23 jwallwork23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

The XIOS interface changes all seem sensible to me, as do the tests. I can confirm that all tests pass locally for me, including the new one with 2 MPI processes.

Installation notes

Starting from a working nextsim (develop branch) and xios (trunk), I tried to manually rebuild nextsim using this branch (xios-interface-with-tests) with xios enabled. I kept hitting an error saying that netcdf_par.h couldn't be found. There is a fix on the xios side (search for netcdf_par on this webpage), but I couldn't figure out how to get this picked up by nextsim. In the end I resorted to the spack approach and it worked the first time.

Worth noting somewhere in the docs that running with -DENABLE_XIOS=ON requires that -DENABLE_MPI=ON, too.

CMakeLists.txt Show resolved Hide resolved
cmake/Findxios.cmake Outdated Show resolved Hide resolved
cmake/Findxios.cmake Outdated Show resolved Hide resolved
docs/installation.rst Outdated Show resolved Hide resolved
docs/installation.rst Show resolved Hide resolved
.github/xios_arch/ubuntu.arch Outdated Show resolved Hide resolved
timspainNERSC added a commit that referenced this pull request May 2, 2024
## Nextsim CI \*\*now with added Docker\*\*

This PR mainly pertains to the ubuntu build of the code. 

Previously the CI workflow built `nextsim` using a GitHub VM instance
that would `apt install` basic dependencies required for the build
process.

However, with the addition of `xios` (#496 ) this is no longer a
tractable solution. Furthermore, it would take ~15 mins just to build
`xios` before then going on to build `nextsim`.

Now I have created a `Dockerfile` and associated pre-built image
(currently on my Dockerhub: `tommelt/nextsimdg-dev-env:with-xios`) which
contains all the prerequisites to build and test the code for our
typical CI workflow.

In theory it could also be repurposed by developers as a dev environment
(although I personally don't love this).

### more about the dockerfile

The `Dockerfile` is based on a `spack` environment file (`spack.yaml`)
was generated originally using the following command:

```
spack containerize > Dockerfile
```

However, I needed to make some manual adjustments to the `Dockerfile`
(necessary for installing `xios` and some other dependencies).

`Dockerfiles/Dockerfile` is the one that was used to generate the docker
image `tommelt/nextsimdg-dev-env:with-xios`.

The image is `2.41GB` and contains all prerequisites for `MPI` and
`XIOS`. When uploaded to dockerhub (`docker push
tommelt/nextsimdg-dev-env:with-xios`) it is compressed to `625.97 MB`.

### benefits of Docker in CI

* It's fast
- only takes 45s to download the image and then 150s to build `nextsim`
:fire:
- a full CI (for ubuntu) is now less than 4 minutes -- building `XIOS`
alone used to take ~15 minutes
* It's reproducible
- one of the problems of github CI is it's hard to debug when things
don't work
- you can now pull the docker image (`docker pull
tommelt/nextsimdg-dev-env:with-xios`) and test stuff locally
* It's easy
- `nextsim` is necessarily adding complexity, which increases the
required dependencies
  - the docker image can be built with all dependencies ready-to-go


### Changes to the parallel (MPI) test naming convention

* There was only one existing MPI test before I added the xios one
(`testRectGrid_MPI`).
* I think it would be sensible to name all MPI tests something like
`testname_MPIX` where `X` = the number of ranks the test should be run
with
* This is required because `doctest` specifies minimum/maximum numbers
of procs for the test to run. A test that requires 2 procs to run, run
with 1 MPI proc instead will "pass" but with a message like "test
skipped". This is not a situation we want to end up i.e., thinking our
tests pass when they have actually not been run.

### To Do:

- [x] TM to add docker image to [github container
registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry)
- [x] ~@timspainNERSC to fix MacOS build.~ I tried to fix MPI/xios build
on mac (using [tmate plugin](https://github.com/mxschmitt/action-tmate)
to debug github action live) but the fix requires a lot of effort.
(Compiling most/all dependencies from source) and it may not be worth
it. We could consider not running the MPI/xios tests on mac? For now I
have simply disabled MPI and XIOS on the mac build. One other work
around would be create a dedicated spack build cache. (decided to not
run tests for MPI/XIOS build on mac)
- [x] add instructions to build the docker image (`docker build .`)
- [x] add documentation around new docker image
- [x] rebase to include Kacpers latest changes and Tim's fix of mac
build
- [x] ~should we keep existing docker images? or "rebase" them? or just
have one with all dependencies and users can build their own version of
nextsim to meet their needs?~ dev environment build is sufficient for
now.
- [x] move pre-built image to nextsim dockerhub repo (do we have a
NextSim docker account yet?)
@TomMelt TomMelt changed the base branch from develop to main May 2, 2024 14:38
@TomMelt TomMelt changed the base branch from main to develop May 2, 2024 14:39
@TomMelt TomMelt force-pushed the xios-interface-with-tests branch from 64836aa to f23042a Compare May 2, 2024 14:42
@TomMelt TomMelt marked this pull request as ready for review May 2, 2024 15:43
Copy link
Collaborator

@timspainNERSC timspainNERSC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTG

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get some doxygen comments on these methods?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to convert to the nextSIM-DG time classes before merging this PR?

@TomMelt TomMelt merged commit 0521214 into develop May 2, 2024
4 checks passed
@TomMelt TomMelt deleted the xios-interface-with-tests branch May 2, 2024 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ICCS Tasks or reviews for the ICCS team
Projects
Development

Successfully merging this pull request may close these issues.

5 participants