MPI parallelization of thermodynamics #331
Conversation
Based on discussion with Tim and Einar, this should be rebased onto
We need to be careful not to use MPI_COMM_WORLD as the communicator, because the model should receive the communicator from OASIS in coupled mode.
Yes, that is a good point. I think it is currently quite well encapsulated: the constructor of `Nextsim::Model` takes the MPI communicator as an argument, which is then saved in `ModelMetadata`.
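A minimal sketch of that pattern (the class and member names here are illustrative stand-ins, not the actual nextsimDG declarations): the communicator is injected by the caller, so a coupled run can pass the communicator returned by OASIS instead of `MPI_COMM_WORLD`.

```cpp
#include <mpi.h>

// Illustrative stand-ins for ModelMetadata and Model; only the
// communicator plumbing is shown.
struct ModelMetadata {
    MPI_Comm comm = MPI_COMM_NULL; // communicator the model runs on
    int rank = -1;
    int size = 0;

    void setComm(MPI_Comm c)
    {
        comm = c;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
    }
};

class Model {
public:
    // The caller decides which communicator to use: MPI_COMM_WORLD in a
    // stand-alone run, or the communicator handed back by OASIS when coupled.
    explicit Model(MPI_Comm comm) { m_metadata.setComm(comm); }

private:
    ModelMetadata m_metadata;
};
```

In a stand-alone run this would be constructed as `Model model(MPI_COMM_WORLD);`, while a coupled run would pass the communicator obtained from the OASIS initialisation.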
- Need to check that the changes to `DevGrid` go through cleanly now that it doesn't exist in `develop`.
- Need to check `DimensionSetter()` is still correct in the MPI version now that the ordering of the array dimensions has been swapped.
Otherwise, this looks like a good set of MPI changes.
In most MPI implementations, a ':' in a file name is interpreted as the end of a prefix that hints at the underlying file system.
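For example, ROMIO-based MPI-IO implementations treat a prefix such as `ufs:` or `nfs:` as a file-system hint, so a path that happens to contain a colon can be misparsed. A minimal illustration of the convention (the file name is made up):

```cpp
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    MPI_File fh;
    // "ufs:" is stripped by ROMIO and used as a hint that the file lives on a
    // Unix file system; the file actually opened is "output.nc".
    MPI_File_open(MPI_COMM_WORLD, "ufs:output.nc",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```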
This is a temporary state for testing, before fully parallel output is introduced.
We will need the MPI communicator and rank information to read only the relevant part of the data.
Currently netcdf-cxx4 does not provide interfaces for parallel access to netCDF files. Provide ones based on the `ncFile` class. Not all methods from `ncFile` are implemented, only the ones that will be used by the nextsimDG code.
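A minimal sketch of what such a wrapper can look like (an assumption about the approach, not the code added here): since netcdf-cxx4 does not expose `nc_open_par`, the wrapper drops down to the netCDF C API for the collective open and keeps the id around for subsequent calls.

```cpp
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h> // nc_open_par; requires a parallel-enabled netCDF build
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper; the real classes in this PR are modelled on
// netcdf-cxx4's ncFile instead.
class ParallelNcFile {
public:
    ParallelNcFile(const std::string& path, MPI_Comm comm)
    {
        // Collective open across all ranks in comm.
        int status = nc_open_par(path.c_str(), NC_NOWRITE, comm, MPI_INFO_NULL, &m_ncid);
        if (status != NC_NOERR) {
            throw std::runtime_error(nc_strerror(status));
        }
    }
    ~ParallelNcFile() { nc_close(m_ncid); }

    int ncid() const { return m_ncid; }

private:
    int m_ncid = -1;
};
```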
They are needed to write to the correct location in parallel mode.
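As an illustration (the helper, its argument names, and the y-before-x dimension order are assumptions, not the actual nextsimDG code), each rank writes its local block into the global variable by passing its start offsets and local extents to the netCDF put call:

```cpp
#include <netcdf.h>
#include <netcdf_par.h>
#include <vector>

// Hypothetical helper: write this rank's local patch of a 2D field into the
// global variable, using the offsets and extents from the partition metadata.
void writeLocalPatch(int ncid, int varid,
                     size_t globalX, size_t globalY,           // this rank's offset
                     size_t localExtentX, size_t localExtentY, // this rank's extent
                     const std::vector<double>& localData)
{
    // Collective access so all ranks participate in the parallel write.
    nc_var_par_access(ncid, varid, NC_COLLECTIVE);

    const size_t start[2] = { globalY, globalX };
    const size_t count[2] = { localExtentY, localExtentX };
    nc_put_vara_double(ncid, varid, start, count, localData.data());
}
```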
Array indices were re-ordered in the serial version of the code. I have made the corresponding changes in the MPI branch, which fixes `testRectGrid_MPI`. I have also tested the changes with `run_simple_example.sh`, which produces the same result as the serial code.
Disabled the following tests when MPI is enabled:
* `testRectGrid`
* `testParaGrid`
* `testConfigOutput`

The latter two have only been disabled temporarily. I have raised issue #454 explaining why the tests are disabled. They will be re-enabled after ParaGrid is parallelized with MPI. `testRectGrid` has a serial version and an MPI version (`testRectGrid_MPI`). We run the MPI version when MPI is enabled and the serial version otherwise.
# Add Tests for MPI version of code

### Task List
- [x] add tests
- [x] rebase to develop
- [x] fix array ordering
- [x] ~update the readthedocs / readme with install instructions~ (moved to #455)

---

# Change Description
I rebased @draenog's `mpi_local` and `mpi_local_tests` branches. This introduced a bug due to the array reordering that occurred as part of #426. I have modified the MPI parts of `./core/src/RectGridIO.cpp` to reorder the arrays so that they now match the order in the serial version of the code.

---

# Test Description
To run `testRectGrid_MPI` you will need the parallel netcdf config file `partition_metadata_1.nc`. I don't want to put it in the github repo for now. As discussed with @timspainNERSC, soon we will hopefully have an ftp server for netcdf files. You can generate the input using the following netcdf definition file `partition_metadata_1.cdl`.

```
// partition_metadata_1.cdl
netcdf partition_metadata_1 {
dimensions:
    P = 1 ;
    L = 1 ;
    globalX = 25 ;
    globalY = 15 ;

group: bounding_boxes {
  variables:
    int global_x(P) ;
    int global_y(P) ;
    int local_extent_x(P) ;
    int local_extent_y(P) ;
  data:
    global_x = 0 ;
    global_y = 0 ;
    local_extent_x = 25 ;
    local_extent_y = 15 ;
  } // group bounding_boxes
}
```

Then use `ncgen` to make the netcdf file `partition_metadata_1.nc`

```
ncgen partition_metadata_1.cdl -o partition_metadata_1.nc
```
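A possible way to run the test afterwards (the launcher and binary location are assumptions; a single rank matches the `P = 1` partition in the metadata file):

```
mpirun -n 1 ./testRectGrid_MPI
```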
# Add MPI support to ParaGrid

closes #534
closes #454
closes #448

---

# Task List
- [x] modify `make_init_para24x30.py` to use new DG data layout and remove CG vars
- [x] add MPI doctest support to `ParaGrid_test.cpp`
- [x] move `extent` and `start` variables into `ModelArray`'s `DimensionSpec`
- [x] add special MPI test case to `ParaGrid_test.cpp`
- [x] add MPI support to `getModelState`
- [x] add MPI support to `dumpModelState`
- [x] add MPI support to `writeDiagnosticTime`
- [x] reinstate test `ConfigOutput_test.cpp` for MPI builds
- [x] add MPI support to `ERA5`/`TOPAZ` tests

---

# Change Description
After #331 added MPI parallelisation for thermodynamics on the RectGrid, this PR does the same for the Parametric grid. This should then check off the second task in #120 (_"MPI parallelization of thermodynamics where all operations, except for I/O, are local to an MPI rank"_).

---

# Test Description
- `ParaGrid_test.cpp` tests core functionality of ParaGrid (serial and MPI)
- `./nextsim --config-file config_para24x30.cfg` should provide an integration test (serial and MPI) (based on #506)

---

# Further work (for a future PR)
- add MPI to dynamics (to close PR #120)
- implement halo comms
- implement boundary conditions (as part of MPI)
- tidy naming of variables in Domain Decomp tool
Background
This PR is part of #120, where the basis of the strategy for MPI parallelisation is described.
Change description
As all thermodynamics operations are local to a grid cell, all required MPI communication is handled by the NetCDF4 library. Therefore, the only required steps are:
NOTE: PR #432 should be merged into this branch before it is merged into `develop`.
To run `run_simple_example.sh` you will need to generate the following netcdf file