MPI parallelization of thermodynamics #331

Merged
merged 21 commits into develop from mpi_local on Dec 4, 2023
Conversation

@draenog (Collaborator) commented Jun 12, 2023

Background

This PR is part of #120, where the overall strategy for MPI parallelisation is described.

Change description

As all thermodynamics operations are local to a grid cell, all required MPI communication is handled by the NetCDF4 library. Therefore the only required steps are (a minimal sketch follows the list):

  • Initialize and finalize MPI stack
  • Read decomposition metadata on each rank
  • Read and write the necessary part of the grid on each rank (depends on Issue264 netcdf strings #330)
  • Tests for parallel I/O
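
A minimal sketch of the initialize/finalize step, assuming a plain stand-alone driver; the names and structure here are illustrative only, not the actual nextsimDG code:

```cpp
// Illustrative only: the MPI lifecycle around a (cell-local) model run.
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    int nRanks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

    // ... each rank reads its own bounding box from the decomposition
    //     file and runs the thermodynamics on that sub-domain ...

    MPI_Finalize();
    return 0;
}
```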

NOTE: PR #432 should be merged into this branch before this branch is merged into develop.

To run `run_simple_example.sh` you will need to generate the following netcdf file (a sketch of how each rank reads its bounding box follows the dump):

$ ncdump partition.nc 
netcdf partition {
dimensions:
        P = 1 ;
        L = 1 ;
        globalX = 30 ;
        globalY = 30 ;

group: bounding_boxes {
  variables:
        int global_x(P) ;
        int global_y(P) ;
        int local_extent_x(P) ;
        int local_extent_y(P) ;
  data:

   global_x = 0 ;

   global_y = 0 ;

   local_extent_x = 30 ;

   local_extent_y = 30 ;
  } // group bounding_boxes
}
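
As a rough illustration of the "read decomposition metadata on each rank" step, here is a sketch using netcdf-cxx4 that pulls one rank's bounding box out of a file with the layout above (`BoundingBox` and `readBoundingBox` are hypothetical names, not from the code base):

```cpp
#include <netcdf>
#include <string>
#include <vector>

// Hypothetical holder for one rank's slice of the global grid.
struct BoundingBox {
    int globalX, globalY, extentX, extentY;
};

BoundingBox readBoundingBox(const std::string& fileName, std::size_t rank)
{
    netCDF::NcFile file(fileName, netCDF::NcFile::read);
    netCDF::NcGroup boxes = file.getGroup("bounding_boxes");

    // Each variable is indexed by partition, so read one element at
    // offset 'rank'.
    const std::vector<std::size_t> start = { rank };
    const std::vector<std::size_t> count = { 1 };

    BoundingBox box;
    boxes.getVar("global_x").getVar(start, count, &box.globalX);
    boxes.getVar("global_y").getVar(start, count, &box.globalY);
    boxes.getVar("local_extent_x").getVar(start, count, &box.extentX);
    boxes.getVar("local_extent_y").getVar(start, count, &box.extentY);
    return box;
}
```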

@draenog added the ICCS label (Tasks or reviews for the ICCS team) Jun 15, 2023
@draenog added this to the 3 Stand-alone model milestone Jun 22, 2023
@draenog force-pushed the mpi_local branch 3 times, most recently from 3481ad1 to e28cc79 on June 25, 2023 23:08
@draenog force-pushed the mpi_local branch 2 times, most recently from 8bef27b to a5e4b77 on July 3, 2023 13:30
@dorchard (Contributor) commented
The develop branch is now in a position for this to be turned into a non-draft PR once the conflicts are resolved, which @draenog will do next week.

@TomMelt (Contributor) commented Sep 28, 2023

Based on discussion with Tim and Einar, this should be rebased onto feat_mar3. I will then look to rebase #265 onto this (mpi_local) branch.

@draenog changed the base branch from develop to feat_mar3 on October 2, 2023 13:37
@einola (Member) commented Oct 3, 2023

We need to be careful not to use MPI_COMM_WORLD as the communicator, because the model should receive the communicator from OASIS in coupled mode.

@draenog (Collaborator, Author) commented Oct 4, 2023

> We need to be careful not to use MPI_COMM_WORLD as the communicator, because the model should receive the communicator from OASIS in coupled mode.

Yes, it is a good point. I think it is currently quite well encapsulated: the constructor of Nextsim::Model takes the MPI communicator as an argument, which is then saved in ModelMetadata.
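
In outline (this is just the shape of the idea, not the actual `Nextsim::Model` interface), the encapsulation looks like:

```cpp
#include <mpi.h>

// Sketch: the model never hardcodes MPI_COMM_WORLD; it stores whatever
// communicator the caller hands in (cf. ModelMetadata in this PR).
class Model {
public:
    explicit Model(MPI_Comm comm)
        : m_comm(comm)
    {
    }

private:
    MPI_Comm m_comm;
};

// Stand-alone run:       Model model(MPI_COMM_WORLD);
// Coupled run (sketch):  Model model(commReceivedFromOasis);
```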

@TomMelt changed the base branch from feat_mar3 to develop on November 9, 2023 16:56
@TomMelt self-assigned this Nov 9, 2023
@TomMelt changed the base branch from develop to main on November 9, 2023 17:06
@TomMelt changed the base branch from main to develop on November 9, 2023 17:07
@timspainNERSC (Collaborator) left a comment

  • Need to check that the changes to DevGrid go through cleanly now that it doesn't exist in develop.
  • Need to check DimensionSetter() is still correct in the MPI version now that the ordering of the array dimensions has been swapped.

Otherwise, this looks like a good set of MPI changes.

core/src/RectGridIO.cpp (review thread outdated, resolved)
core/src/StructureFactory.cpp (review thread outdated, resolved)
core/src/modules/include/DevGrid.hpp (review thread outdated, resolved)
In most MPI implementations, ':' in a file name is interpreted as
the end of a prefix giving a hint about the underlying file system.
This is a temporary state for testing, before fully parallel output
is introduced.
We will need the MPI communicator and rank information to read only
the relevant part of the data.
Currently netcdf-cxx4 does not provide interfaces for parallel access
to netCDF files. Provide ones based on the ncFile class. Not all methods
from ncFile are implemented, only the ones that will be used by the
nextsimDG code.
They are needed to write to the correct location in parallel mode.
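
For context on the commits above: since netcdf-cxx4 has no parallel interfaces, a wrapper ultimately has to call down into the netCDF C API. A minimal sketch of the underlying call (assuming a netCDF build with parallel I/O enabled; error handling omitted; `openForParallelRead` is a hypothetical name):

```cpp
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h> // requires netCDF built with parallel support

// All ranks in 'comm' open the same file collectively for reading.
int openForParallelRead(const char* path, MPI_Comm comm, int* ncid)
{
    return nc_open_par(path, NC_NOWRITE, comm, MPI_INFO_NULL, ncid);
}
```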
draenog and others added 6 commits December 4, 2023 09:13
Array indices were re-ordered in the serial version of the code.

I have reflected similar changes in the mpi branch.

This fixes `testRectGrid_MPI`. I have also tested the changes on
`run_simple_example.sh` and it also produces the same result as the
serial code.
disabled the following tests when MPI is enabled:
* testRectGrid
* testParaGrid
* testConfigOutput

The latter two have only been disabled temporarily.

I have raised issue #454 explaining why the tests are disabled.

They will be re-enabled after paragrid is parallelized with MPI.

`testRectGrid` has a serial version and an MPI version
(`testRectGrid_MPI`). We run the MPI version when MPI is enabled and the
serial otherwise.
@TomMelt marked this pull request as ready for review December 4, 2023 11:39
# Add Tests for MPI version of code

### Task List
- [x] add tests
- [x] rebase to develop
- [x] fix array ordering
- [x] ~update the readthedocs / readme with install instructions~ (moved
to #455)

---
# Change Description

I rebased @draenog's `mpi_local` and `mpi_local_tests` branches. This
introduced a bug due to array reordering that occurred as part of #426.
I have modified the MPI parts of `./core/src/RectGridIO.cpp` to reorder
the arrays so that they now match the order in the serial version of the
code.

---
# Test Description

To run `testRectGrid_MPI` you will need the parallel netcdf config file
`partition_metadata_1.nc`. I don't want to put it in the GitHub repo for
now. As discussed with @timspainNERSC, we will hopefully soon have an
FTP server for netCDF files.

You can generate the input using the following netcdf definition file
`partition_metadata_1.cdl`.
```
// partition_metadata_1.cdl
netcdf partition_metadata_1 {
dimensions:
	P = 1 ;
	L = 1 ;
	globalX = 25 ;
	globalY = 15 ;

group: bounding_boxes {
  variables:
  	int global_x(P) ;
  	int global_y(P) ;
  	int local_extent_x(P) ;
  	int local_extent_y(P) ;
  data:

   global_x = 0 ;

   global_y = 0 ;

   local_extent_x = 25 ;

   local_extent_y = 15 ;
  } // group bounding_boxes
}
```

Then use `ncgen` to make the netcdf file `partition_metadata_1.nc`
```
ncgen partition_metadata_1.cdl -o partition_metadata_1.nc
```
@TomMelt merged commit 310724a into develop Dec 4, 2023
3 checks passed
@TomMelt deleted the mpi_local branch December 4, 2023 17:03
@TomMelt mentioned this pull request May 21, 2024
TomMelt added a commit that referenced this pull request Aug 21, 2024
# Add MPI support to ParaGrid
closes #534
closes #454
closes #448 

---
# Task List
- [x] modify `make_init_para24x30.py` to use new DG data layout and
remove CG vars
- [x] add MPI doctest support to `ParaGrid_test.cpp` (see the sketch after this list)
- [x] move `extent` and `start` variables into `ModelArray`'s
`DimensionSpec`
- [x] add special MPI test case to `ParaGrid_test.cpp`
- [x] add MPI support to `getModelState`
- [x] add MPI support to `dumpModelState`
- [x] add MPI support to `writeDiagnosticTime`
- [x] reinstate test `ConfigOutput_test.cpp` for MPI builds
- [x] add MPI support to `ERA5`/`TOPAZ` tests
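
For the doctest item above, a sketch of what an MPI-aware test can look like with doctest's MPI extension; the test body is illustrative, not taken from `ParaGrid_test.cpp`:

```cpp
#define DOCTEST_CONFIG_IMPLEMENT
#include <doctest/extensions/doctest_mpi.h>

int main(int argc, char** argv)
{
    doctest::mpi_init_thread(argc, argv, MPI_THREAD_MULTIPLE);

    doctest::Context context;
    context.applyCommandLine(argc, argv);
    int result = context.run();

    doctest::mpi_finalize();
    return result;
}

// Runs on a sub-communicator of exactly 2 ranks; test_rank, test_comm
// and test_nb_procs are provided by the MPI_TEST_CASE macro.
MPI_TEST_CASE("ranks agree on the global sum", 2)
{
    int local = test_rank;
    int sum = 0;
    MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, test_comm);
    CHECK(sum == test_nb_procs * (test_nb_procs - 1) / 2);
}
```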

---
# Change Description

After #331 added MPI parallelisation for thermodynamics on the RectGrid,
this PR does the same for the Parametric grid. This should then check
off the second task in #120 (_"MPI parallelization of thermodynamics
where all operations, except for I/O, are local to an MPI rank"_).

---
# Test Description

- `ParaGrid_test.cpp` tests core functionality of ParaGrid (serial and
MPI)
- `./nextsim --config-file config_para24x30.cfg` should provide an
integration test (serial and MPI) (based on #506)

---
# Further work (for a future PR)

- add MPI to dynamics (to close PR #120)
- implement halo comms (see the sketch after this list)
- implement boundary conditions (as part of MPI) 
- tidy naming of variables in Domain Decomp tool
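
As a pointer for the halo-comms item above, a purely illustrative 1-D halo exchange with `MPI_Sendrecv`; the names and data layout are hypothetical, not a design commitment for nextsimDG:

```cpp
#include <mpi.h>
#include <vector>

// Exchange one-cell halos between neighbouring ranks.
// Layout assumed per rank: [left halo | interior cells | right halo].
void exchangeHalos(std::vector<double>& field, MPI_Comm comm)
{
    int rank = 0, nRanks = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nRanks);

    const int left  = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
    const int right = (rank < nRanks - 1) ? rank + 1 : MPI_PROC_NULL;
    const std::size_t n = field.size();

    // Send the first interior cell left, receive into the right halo.
    MPI_Sendrecv(&field[1], 1, MPI_DOUBLE, left, 0,
                 &field[n - 1], 1, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    // Send the last interior cell right, receive into the left halo.
    MPI_Sendrecv(&field[n - 2], 1, MPI_DOUBLE, right, 1,
                 &field[0], 1, MPI_DOUBLE, left, 1,
                 comm, MPI_STATUS_IGNORE);
}
```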