Merge remote-tracking branch 'origin/main' into dev
rfj82982 committed Nov 9, 2023
2 parents 3bd638d + 686c3e4 commit 1afab4b
Showing 23 changed files with 1,360 additions and 136 deletions.
147 changes: 147 additions & 0 deletions HOWTO.md
@@ -0,0 +1,147 @@
# How to use 2DECOMP&FFT
This document presents the main features of the 2DECOMP&FFT library.
Detailed instructions on how to use the 2DECOMP&FFT library can be found
[here](https://2decomp-fft.github.io/).

The 2D pencil decomposition API is defined by three Fortran modules, which applications should use as follows:
```
use decomp_2d_constants
use decomp_2d_mpi
use decomp_2d
```
where ``use decomp_2d_constants`` defines all the parameters, ``use decomp_2d_mpi`` introduces all the MPI-related
interfaces and ``use decomp_2d`` contains the main decomposition and transposition APIs. The library is initialized using
```
call decomp_2d_init(nx, ny, nz, p_row, p_col)
```
where ``nx``, ``ny`` and ``nz`` are the spatial dimensions of the problem, to be distributed over
a 2D processor grid ``p_row`` × ``p_col``.
Note that none of the dimensions need to be divisible by ``p_row`` or ``p_col``; however, a load imbalance will occur if they are not.
If ``p_row=p_col=0``, an automatic decomposition is selected from all the available combinations.
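To make the load-imbalance remark concrete, here is a small stand-alone sketch (no MPI, no library calls) that splits ``n`` points over ``p`` ranks as evenly as possible. The rule used here, where the trailing ranks absorb the remainder, is an assumption chosen for illustration; the library's internal distribution may differ in detail.

```fortran
program block_split_sketch
   ! Conceptual sketch only: an even-as-possible block split of n points
   ! over p ranks. The "trailing ranks take one extra point" rule is an
   ! assumption made for illustration, not a statement about the library.
   implicit none
   integer, parameter :: n = 17, p = 4
   integer :: rank, base, extra, first, last, npts

   base = n/p            ! points that every rank receives
   extra = mod(n, p)     ! leftover points, one each for `extra` ranks
   last = 0
   do rank = 0, p - 1
      npts = base
      if (rank >= p - extra) npts = npts + 1
      first = last + 1
      last = first + npts - 1
      print '(a,i0,a,i0,a,i0,a,i0,a)', 'rank ', rank, ': indices ', &
         first, '-', last, ' (', npts, ' points)'
   end do
end program block_split_sketch
```

With 17 points and 4 ranks, one rank ends up with five points while the others hold four, which is the kind of imbalance the note above refers to.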
A key element of this library is a set of communication routines that actually perform the data transpositions.
Going through all 3 pencil orientations and back requires 4 global transpositions
(i.e. from x-pencils to y-pencils to z-pencils to y-pencils to x-pencils).
Correspondingly, the library provides 4 communication subroutines:
```
call transpose_x_to_y(var_in,var_out)
call transpose_y_to_z(var_in,var_out)
call transpose_z_to_y(var_in,var_out)
call transpose_y_to_x(var_in,var_out)
```
The input array ``var_in`` and output array ``var_out`` are defined by the code using the library
and contain distributed data for the correct pencil orientations.

Note that the library is written using Fortran's generic interfaces, so different data types are supported
without user input. That means ``var_in`` and ``var_out`` above can be either real or complex arrays,
the latter being useful for applications involving 3D Fast Fourier Transforms.
Finally, before exiting, applications should release the memory with:
```
call decomp_2d_finalize
```
Detailed information about the decomposition API is available [here](https://2decomp-fft.github.io/pages/api_domain.html).
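Putting the calls above together, a minimal sketch of a complete program could look as follows. The grid size is arbitrary, and the use of ``mytype`` (working precision) and of ``xsize``, ``ysize`` and ``zsize`` (local pencil extents) is an assumption about what the modules export rather than something stated in this document.

```fortran
program transpose_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   implicit none

   integer, parameter :: nx = 16, ny = 16, nz = 16   ! illustrative sizes
   integer :: p_row = 0, p_col = 0                   ! 0,0 requests an automatic grid
   integer :: ierror
   real(mytype), allocatable, dimension(:, :, :) :: ux, uy, uz

   call MPI_INIT(ierror)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)

   ! Assumption: xsize, ysize and zsize hold the local extents of each pencil
   allocate (ux(xsize(1), xsize(2), xsize(3)))
   allocate (uy(ysize(1), ysize(2), ysize(3)))
   allocate (uz(zsize(1), zsize(2), zsize(3)))
   ux = 1.0_mytype

   ! Full round trip: x -> y -> z -> y -> x
   call transpose_x_to_y(ux, uy)
   call transpose_y_to_z(uy, uz)
   call transpose_z_to_y(uz, uy)
   call transpose_y_to_x(uy, ux)

   deallocate (ux, uy, uz)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierror)
end program transpose_sketch
```

After the round trip ``ux`` should hold the same values it started with, which makes this a convenient first check of a new installation.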
### Use of the FFT module
To use the FFT programming interface, first of all, one additional Fortran module is needed:
```
use decomp_2d_fft
```
Then one needs to initialise the FFT interface using
```
call decomp_2d_fft_init(pencil, n1, n2, n3)
```
where ``pencil`` is ``PHYSICAL_IN_X`` or ``PHYSICAL_IN_Z`` and ``n1, n2, n3`` is an arbitrary problem size
that can be different from ``nx`` × ``ny`` × ``nz``.
For complex-to-complex (c2c) FFTs, the user interface is:
```
call decomp_2d_fft_3d(input, output, direction)
```
where ``direction`` can be either ``DECOMP_2D_FFT_FORWARD == -1`` for forward transforms,
or ``DECOMP_2D_FFT_BACKWARD == 1`` for backward transforms.
We recommend using the ``DECOMP_2D_FFT_XXX`` variables, rather than literal ``1`` or ``-1``,
to avoid potential issues if these values change in future versions.
The input array (``input``) and the output array (``output``) are both complex
and must form either an X-pencil/Z-pencil pair or a Z-pencil/X-pencil pair.
The interface for real-to-complex and complex-to-real transforms is:
```
call decomp_2d_fft_3d(input, output)
```
If ``input`` is a real array, a forward transform is assumed and ``output`` is a complex array.
Similarly, a backward FFT is computed if ``input`` is a complex array and ``output`` is a real array.
Finally, to release the memory used by the FFT interface:
```
call decomp_2d_fft_finalize
```
Detailed information about the FFT API is available [here](https://2decomp-fft.github.io/pages/api_fft.html).
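As an illustration of the c2c interface, the sketch below performs a forward and a backward transform starting from X-pencils. It assumes that ``xsize`` and ``zsize`` give the local array extents when the FFT grid matches the decomposition grid, and that the transforms are unnormalised, so the result is rescaled by the number of grid points.

```fortran
program fft_c2c_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   use decomp_2d_fft
   implicit none

   integer, parameter :: nx = 16, ny = 16, nz = 16   ! illustrative sizes
   integer :: p_row = 0, p_col = 0
   integer :: ierror
   complex(mytype), allocatable, dimension(:, :, :) :: phys, spec

   call MPI_INIT(ierror)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)
   call decomp_2d_fft_init(PHYSICAL_IN_X, nx, ny, nz)

   ! Physical space is stored in X-pencils, spectral space in Z-pencils
   allocate (phys(xsize(1), xsize(2), xsize(3)))
   allocate (spec(zsize(1), zsize(2), zsize(3)))
   phys = cmplx(1.0_mytype, 0.0_mytype, kind=mytype)

   call decomp_2d_fft_3d(phys, spec, DECOMP_2D_FFT_FORWARD)
   call decomp_2d_fft_3d(spec, phys, DECOMP_2D_FFT_BACKWARD)

   ! Assumption: transforms are unnormalised, so divide by nx*ny*nz
   phys = phys/real(nx, mytype)/real(ny, mytype)/real(nz, mytype)

   deallocate (phys, spec)
   call decomp_2d_fft_finalize
   call decomp_2d_finalize
   call MPI_FINALIZE(ierror)
end program fft_c2c_sketch
```

After the rescaling, ``phys`` should match its initial value up to round-off.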
### Use of the IO module
All the I/O functions have been packed in a Fortran module:
```
use decomp_2d_io
```
To write a single three-dimensional array to a file, the following call to a subroutine can be used:
```
call decomp_2d_write_one(ipencil,var,directory,filename,icoarse,io_name)
```
where ``ipencil`` describes how the data is distributed (valid values are: 1 for X-pencil; 2 for
Y-pencil and 3 for Z-pencil); ``var`` is the data array to be written on disk, which can be either real or
complex; ``directory`` is the path to where I/O should be written; ``filename`` is the name of the
file to be written; ``icoarse`` indicates whether the I/O should be coarsened (valid values are: 0
for no; 1 for the ``nstat`` and 2 for the ``nvisu`` coarsenings); ``io_name`` is the name of the I/O
group to be used. When using ADIOS2, write operations are deferred by default, which means that the data
stored in ``var`` may not have been written before the end of the step. Overwriting ``var``, for example
when it is used as a temporary variable, would corrupt the output data. For such situations
``decomp_2d_write_one`` accepts an optional argument ``opt_deferred_writes`` (default ``.true.``) which,
when set to ``.false.``, causes the data to be flushed immediately.
The last argument ``io_name`` is a string used to group I/O operations together. For the ADIOS2 backend it
allows runtime control of I/O through the file ``adios2_config.xml``, which must contain a matching IO
handle to specify the I/O engine; see the examples under ``examples/io_test/`` for how this works.

There are two ways of writing multiple variables to a single file, which may
be used for check-pointing purposes, for example. The newer interface is described first and allows
codes to use the ADIOS2 and MPI-IO backends; the original interface is supported for backwards
compatibility.

When ``decomp_2d_write_one`` is called, the ``directory`` and ``io_name`` are combined to check
whether a particular output location is already open. If it is not, a new file is opened and
written to; this is the "standard" use. If, however, a file is opened first, the call to
``decomp_2d_write_one`` appends to the current file, resulting in a single file with multiple
fields. Once the check-pointing is complete the file can be closed.

The original interface for writing multiple variables to a file, only
supported by the MPI-IO backend, takes the following form:
```
call decomp_2d_write_var(fh,disp,ipencil,var)
```
where ``fh`` is an MPI-IO file handle provided by the application (file opened using ``MPI_FILE_OPEN``);
``disp`` (meaning displacement) is a variable of kind ``MPI_OFFSET_KIND`` and of intent ``INOUT``;
``ipencil`` describes the distribution of the input data (valid values are: 1 for X-pencil; 2 for
Y-pencil and 3 for Z-pencil); and ``var`` is the data array to be written.
Detailed information about the IO API is available [here](https://2decomp-fft.github.io/pages/api_io.html).
#### Initialising the IO module
Before you can perform I/O you must initialise the IO module by calling
```
call decomp_2d_io_init()
```
somewhere near the start of the program (and before beginning any I/O operations).
The IO module supports multiple handles (see the discussion of ``io_name`` above); these are initialised by calling
```
call decomp_2d_init_io(io_name)
...
```
#### Registering variables for ADIOS2
The ADIOS2 backend needs information about the variables it is going to write. During initialisation of I/O operations,
each variable must be registered by calling
```
call decomp_2d_register_variable(io_name,filename,ipencil,icoarse,iplanes,kind)
```
where ``io_name``, ``filename``, ``ipencil`` and ``icoarse`` are as described above.
The argument ``iplanes`` is used to declare writing planes from a field (pass ``0`` for 3-D field output), and the ``kind``
argument is the kind used when declaring the variable, e.g. ``real(kind)``.
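The I/O calls described in this section can be combined as in the sketch below, which writes a single X-pencil field. It uses only the routines named above, with the argument order given in this document; the group name ``sketch-io``, the file name ``u.dat`` and the output directory ``.`` are hypothetical. It is written with the MPI-IO backend in mind: with ADIOS2, a matching handle in ``adios2_config.xml`` is needed and further steps (such as opening and closing the output location) may be required that are not shown here.

```fortran
program io_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   use decomp_2d_io
   implicit none

   integer, parameter :: nx = 16, ny = 16, nz = 16        ! illustrative sizes
   character(len=*), parameter :: io_name = "sketch-io"   ! hypothetical I/O group
   integer :: p_row = 0, p_col = 0
   integer :: ierror
   real(mytype), allocatable, dimension(:, :, :) :: u

   call MPI_INIT(ierror)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)

   ! Initialise the IO module, then the handle used by this sketch
   call decomp_2d_io_init()
   call decomp_2d_init_io(io_name)
   ! X-pencil (1), no coarsening (0), full 3-D field (0), declared as real(mytype)
   call decomp_2d_register_variable(io_name, "u.dat", 1, 0, 0, mytype)

   ! Assumption: xsize holds the local X-pencil extents
   allocate (u(xsize(1), xsize(2), xsize(3)))
   u = 1.0_mytype

   ! Flush immediately so that u may safely be reused afterwards
   call decomp_2d_write_one(1, u, ".", "u.dat", 0, io_name, opt_deferred_writes=.false.)

   deallocate (u)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierror)
end program io_sketch
```

The tests under ``examples/io_test/`` remain the reference for the complete sequence of calls expected by each backend.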
### Use of the halo exchange
The halo-cell support API provides data structures and nearest-neighbour communication routines
that support explicit message passing between neighbouring pencils.
```
call update_halo(var, var_halo, level)
```
Here the first parameter ``var``, a 3D input array, contains the normal pencil-distributed data as defined by the decomposition library.
After the subroutine call, the second parameter ``var_halo`` returns all the original data plus the halo data from the neighbouring processes.
The third parameter ``level`` defines how many layers of overlapping cells are required.
Detailed information about the halo module is available [here](https://2decomp-fft.github.io/pages/api_halo.html).
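A minimal sketch of a halo exchange on an X-pencil array is given below. It assumes that ``var_halo`` is an allocatable array that ``update_halo`` allocates itself, which is why ``u_halo`` is left unallocated before the call; the names ``u`` and ``u_halo`` and the grid size are illustrative.

```fortran
program halo_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   implicit none

   integer, parameter :: nx = 16, ny = 16, nz = 16   ! illustrative sizes
   integer, parameter :: halo_level = 1              ! one layer of halo cells
   integer :: p_row = 0, p_col = 0
   integer :: ierror
   real(mytype), allocatable, dimension(:, :, :) :: u, u_halo

   call MPI_INIT(ierror)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)

   ! Assumption: xsize holds the local X-pencil extents
   allocate (u(xsize(1), xsize(2), xsize(3)))
   u = 1.0_mytype

   ! Assumption: u_halo is allocated inside update_halo
   call update_halo(u, u_halo, halo_level)

   ! Every rank reports how much larger the halo-padded array is
   print *, 'local shape:', shape(u), ' with halo:', shape(u_halo)

   deallocate (u)
   if (allocated(u_halo)) deallocate (u_halo)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierror)
end program halo_sketch
```

Comparing the two printed shapes shows which directions have been padded with neighbouring data.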
2 changes: 1 addition & 1 deletion INSTALL.md
@@ -1,4 +1,4 @@
-# Bulding and installing 2DECOMP&FFT
+# Building, installing and linking 2DECOMP&FFT

The library 2decomp is a Fortran library compatible with the Fortran 2008 standard.
It requires a MPI library compatible with MPI-2.0 with extended Fortran support.
9 changes: 8 additions & 1 deletion README.md
@@ -1,7 +1,9 @@
# 2DECOMP&FFT

This README contains basic instructions for building and installing the 2DECOMP&FFT library, more
-detailed instructions can be found in [INSTALL.md](INSTALL.md).
+detailed instructions about installation and linking to the library within an external project
+can be found in the [install section](INSTALL.md).
+Please have a look at [HOWTO.md](HOWTO.md) and at the [examples](examples/README.md) for how to use the library with your application.

## Building

@@ -151,3 +153,8 @@ is ready.
For example, starting from `v2.0.0` the `main` branch will only be updated to receive fixes giving
`v2.0.1`, etc. until the next release (either `v2.1.0` or `v3.0.0` depending on the magnitude of the
change is ready).

### Contributing

If you would like to contribute to the development of the 2DECOMP&FFT library or report a bug, please refer to
the [Contributing section](Contribute.md).
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -18,6 +18,7 @@ add_subdirectory(fft_physical_x)
add_subdirectory(fft_physical_z)
add_subdirectory(halo_test)
add_subdirectory(io_test)
add_subdirectory(grad3d)

# Set real/complex tests
#set(COMPLEX_TESTS "OFF" CACHE STRING "Enables complex numbers for tests that support it")
24 changes: 9 additions & 15 deletions examples/README.md
@@ -1,17 +1,11 @@
-Examples
-========
-
-* init_test - to test the initialisation of the DECOMP2D&FFT library
-
-* test2d - various tests for the 2D pencil decomposition module and timing
-
-* fft_physical_x - various tests for the FFT starting from the ``X`` direction
-
-* fft_physical_z - various tests for the FFT starting from the ``Z`` direction
-
-* halo_test - to test the halo-cell exchange code
-
-* io_test - various tests for the IO module
-
+# Examples
+
+- [Initialization test](init_test) - to test the initialisation of the 2DECOMP&FFT library
+- [Test decomposition](test2d) - various tests for the 2D pencil decomposition module and timing
+- [Test FFT from X pencil](fft_physical_x) - various tests for the FFT starting from the ``X`` direction
+- [Test FFT from Z pencil](fft_physical_z) - various tests for the FFT starting from the ``Z`` direction
+- [Test HALO exchange](halo_test) - to test the halo-cell exchange capability
+- [Test I/O features](io_test) - various tests for the I/O module
+- [Example gradient of a scalar](grad3d) - example of how to compute the gradient of a scalar field

Refer to the README files for each example for details.
25 changes: 0 additions & 25 deletions examples/fft_physical_x/README

This file was deleted.

33 changes: 33 additions & 0 deletions examples/fft_physical_x/README.md
@@ -0,0 +1,33 @@
# Test FFT for X pencil decomposition

List of the tests:
- [fft_c2c_x](fft_c2c_x.f90): Test Complex to Complex FFT transform;
- [fft_r2c_x](fft_r2c_x.f90): Test Real to Complex FFT transform;
- [fft_grid_x](fft_grid_x.f90): Test Real to Complex transform of a different grid than the one used
for the initialization.


These programs can be used to test the FFT transform using X-pencils as the starting domain decomposition.
Both c2c (fft_c2c_x) and r2c/c2r (fft_r2c_x) transforms are tested.
The case fft_grid_x uses a different resolution from the one used for the initialization.
The results should recover the input data up to machine accuracy
after a forward and a backward transform and appropriate normalisation.
The tests automatically resize the problem depending on the number of MPI processes in use.

What to input: the programs take at most 6 inputs:

1. p_row [optional]
1. p_col [optional]
1. nx [optional]
1. ny [optional]
1. nz [optional]
1. nt [optional]

If the decomposition is imposed, both (1) and (2) are necessary.
If the resolution is imposed, (1-5) are necessary.

What to expect:
- The timing results
- The error reported should be around machine accuracy (~ 10^-6 for single
precision and 10^-15 for double)
- In case of the GENERIC FFT expect an increase in the order of the error
24 changes: 0 additions & 24 deletions examples/fft_physical_z/README

This file was deleted.

30 changes: 30 additions & 0 deletions examples/fft_physical_z/README.md
@@ -0,0 +1,30 @@
# Test FFT for Z pencil decomposition

List of the tests:
- [fft_c2c_z](fft_c2c_z.f90): Test Complex to Complex FFT transform;
- [fft_r2c_z](fft_r2c_z.f90): Test Real to Complex FFT transform;


These programs can be used to test the FFT transform using Z-pencils as the starting domain decomposition.
Both c2c (fft_c2c_z) and r2c/c2r (fft_r2c_z) transforms are tested.
The results should recover the input data up to machine accuracy
after a forward and a backward transform and appropriate normalisation.
The tests automatically resize the problem depending on the number of MPI processes in use.

What to input: the programs take at most 6 inputs:

1. p_row [optional]
1. p_col [optional]
1. nx [optional]
1. ny [optional]
1. nz [optional]
1. nt [optional]

If the decomposition is imposed, both (1) and (2) are necessary.
If the resolution is imposed, (1-5) are necessary.

What to expect:
- The timing results
- The error reported should be around machine accuracy (~ 10^-6 for single
precision and 10^-15 for double)
- In case of the GENERIC FFT expect an increase in the order of the error
17 changes: 17 additions & 0 deletions examples/grad3d/CMakeLists.txt
@@ -0,0 +1,17 @@
file(GLOB files_test grad3d.f90)
include_directories(${CMAKE_SOURCE_DIR}/src)

add_executable(grad3d ${files_test})
target_link_libraries(grad3d PRIVATE decomp2d examples_utils)

# Run the test(s)
set(run_dir "${test_dir}/grad3d")
file (COPY "${CMAKE_SOURCE_DIR}/examples/grad3d/adios2_config.xml" DESTINATION ${run_dir})
message(STATUS "Example dir ${run_dir}")
file(MAKE_DIRECTORY ${run_dir})
if (BUILD_TARGET MATCHES "gpu")
file(COPY bind.sh DESTINATION ${run_dir})
add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} ./bind.sh $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir})
else ()
add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir})
endif ()
24 changes: 24 additions & 0 deletions examples/grad3d/README.md
@@ -0,0 +1,24 @@
# Gradient example

List of the tests:
- [grad3d](grad3d.f90): Example to compute the gradient of a field.

This example demonstrates the use of the 2DECOMP&FFT library to compute the gradient
of a field using an explicit second-order finite difference scheme.
The purpose is to show how to use the transpose operations to allow explicit calculation
of the gradient in all 3 directions. The results are written to a file and the function
is periodic over the interval [0, 1].

What to input: the program takes at most 5 inputs:
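
For readers unfamiliar with the scheme, the stand-alone sketch below (no MPI, no library calls, and not taken from ``grad3d.f90``) applies a second-order central difference with periodic wrap-around to a 1D sine wave on [0, 1) and reports the L2 error against the analytical derivative. The test function and the grid size are assumptions chosen for illustration.

```fortran
program fd_gradient_sketch
   ! Conceptual 1D sketch: second-order central difference of a periodic
   ! function on [0, 1), compared with the analytical derivative.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 64                  ! illustrative grid size
   real(dp), parameter :: twopi = 8.0_dp*atan(1.0_dp)
   real(dp) :: f(n), dfdx(n), dx, x, err
   integer :: i, ip, im

   dx = 1.0_dp/real(n, dp)
   do i = 1, n
      f(i) = sin(twopi*real(i - 1, dp)*dx)
   end do

   do i = 1, n
      ip = mod(i, n) + 1            ! periodic right neighbour
      im = mod(i - 2 + n, n) + 1    ! periodic left neighbour
      dfdx(i) = (f(ip) - f(im))/(2.0_dp*dx)
   end do

   ! Root-mean-square (discrete L2) error against d/dx sin(2*pi*x)
   err = 0.0_dp
   do i = 1, n
      x = real(i - 1, dp)*dx
      err = err + (dfdx(i) - twopi*cos(twopi*x))**2
   end do
   err = sqrt(err/real(n, dp))
   print *, 'L2 error:', err
end program fd_gradient_sketch
```

Because the scheme is second order, doubling ``n`` should reduce the reported error by roughly a factor of four. In the distributed 3D case the same stencil is applied along whichever direction is local to the current pencil, which is why the transposes are needed.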

1. p_row [optional]
1. p_col [optional]
1. nx [optional]
1. ny [optional]
1. nz [optional]

If the decomposition is imposed, both (1) and (2) are necessary.
If the resolution is imposed, (1-5) are necessary.

What to expect: the output is the original function and the gradient in the 3 directions.
The program will also give the total error in the L2 norm compared with the analytical solution.
11 changes: 11 additions & 0 deletions examples/grad3d/adios2_config.xml
@@ -0,0 +1,11 @@
<?xml version="1.0"?>
<adios-config>
<io name="grad-io">
<engine type="BP4">
</engine>
<transport type="File">
<parameter key="Library" value="fstream"/>
<parameter key="ProfileUnits" value="Milliseconds"/>
</transport>
</io>
</adios-config>
10 changes: 10 additions & 0 deletions examples/grad3d/bind.sh
@@ -0,0 +1,10 @@
#!/bin/bash

export LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK}
export CUDA_VISIBLE_DEVICES=${LOCAL_RANK}

echo "[LOG] local rank $LOCAL_RANK: bind to $CUDA_VISIBLE_DEVICES"
echo ""

# Run the wrapped command, preserving arguments that contain spaces
"$@"
