forked from 2decomp-fft/2decomp-fft
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge remote-tracking branch 'origin/main' into dev
Showing 23 changed files with 1,360 additions and 136 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
# How to use 2DECOMP&FFT | ||
This document presents the main features of the 2DECOMP&FFT library. | ||
Detailed instructions on how to use the 2DECOMP&FFT library can be found | ||
[here](https://2decomp-fft.github.io/). | ||
|
||
The 2D Pencil Decomposition API is defined by three Fortran modules, which applications should use as follows: | ||
``` | ||
use decomp_2d_constants | ||
use decomp_2d_mpi | ||
use decomp_2d | ||
``` | ||
where ``decomp_2d_constants`` defines all the parameters, ``decomp_2d_mpi`` introduces all the MPI-related | ||
interfaces and ``decomp_2d`` contains the main decomposition and transposition APIs. The library is initialized using | ||
``` | ||
call decomp_2d_init(nx, ny, nz, p_row, p_col) | ||
``` | ||
where ``nx``, ``ny`` and ``nz`` are the spatial dimensions of the problem, to be distributed over | ||
a 2D processor grid of size ``p_row`` × ``p_col``. | ||
None of the dimensions needs to be divisible by ``p_row`` or ``p_col``; however, a load imbalance will occur if they are not. | ||
If ``p_row=p_col=0``, an automatic decomposition is selected from all available combinations. | ||
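The load imbalance mentioned above can be illustrated without the library. The following is a minimal, self-contained sketch that assumes a simple block distribution in which the remainder is spread one point at a time over some ranks; the values ``n`` and ``p`` are hypothetical, and the distribution actually used by 2DECOMP&FFT may assign the extra points differently.

```fortran
program imbalance_sketch
   implicit none
   ! Hypothetical values: 17 grid points split over 4 processes
   integer, parameter :: n = 17, p = 4
   integer :: base, extra, rank, local_n

   base = n/p            ! points every rank receives
   extra = mod(n, p)     ! leftover points when n is not divisible by p

   do rank = 0, p - 1
      local_n = base
      ! Assumption: the first "extra" ranks take one additional point each
      if (rank < extra) local_n = local_n + 1
      print '(a,i0,a,i0,a)', 'rank ', rank, ' holds ', local_n, ' points'
   end do
end program imbalance_sketch
```

Whenever ``mod(n, p)`` is non-zero, some ranks hold one more point than others, which is the imbalance referred to above.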
A key element of this library is a set of communication routines that actually perform the data transpositions. | ||
Four global transpositions are needed to go through all 3 pencil orientations | ||
(i.e. from x-pencils to y-pencils to z-pencils to y-pencils and back to x-pencils). | ||
Correspondingly, the library provides 4 communication subroutines: | ||
``` | ||
call transpose_x_to_y(var_in,var_out) | ||
call transpose_y_to_z(var_in,var_out) | ||
call transpose_z_to_y(var_in,var_out) | ||
call transpose_y_to_x(var_in,var_out) | ||
``` | ||
The input array ``var_in`` and output array ``var_out`` are defined by the code using the library | ||
and contain distributed data for the correct pencil orientations. | ||
|
||
Note that the library is written using Fortran's generic interfaces, so different data types are supported | ||
without user input. This means ``var_in`` and ``var_out`` above can be either real or complex arrays, | ||
the latter being useful for applications involving 3D Fast Fourier Transforms. | ||
Finally, before exiting, applications should free the library's memory by calling: | ||
``` | ||
call decomp_2d_finalize | ||
``` | ||
Detailed information about the decomposition API is available [here](https://2decomp-fft.github.io/pages/api_domain.html). | ||
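The calls described in this section can be combined into a short program. This is a sketch under stated assumptions rather than a reference implementation: the problem size is hypothetical, and ``mytype``, ``xsize``, ``ysize`` and ``zsize`` are assumed to be provided by the library modules (they are not described in this document).

```fortran
program transpose_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d

   implicit none

   ! Hypothetical problem size; p_row = p_col = 0 requests the automatic decomposition
   integer, parameter :: nx = 16, ny = 16, nz = 16
   integer :: p_row = 0, p_col = 0
   integer :: ierr
   real(mytype), allocatable, dimension(:, :, :) :: ux, uy, uz

   call MPI_INIT(ierr)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)

   ! Assumption: xsize, ysize and zsize hold the local extents of each pencil
   allocate (ux(xsize(1), xsize(2), xsize(3)))
   allocate (uy(ysize(1), ysize(2), ysize(3)))
   allocate (uz(zsize(1), zsize(2), zsize(3)))

   ux = 1.0_mytype

   ! Full cycle through the three pencil orientations and back
   call transpose_x_to_y(ux, uy)
   call transpose_y_to_z(uy, uz)
   call transpose_z_to_y(uz, uy)
   call transpose_y_to_x(uy, ux)

   deallocate (ux, uy, uz)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierr)
end program transpose_sketch
```

After the four transpositions ``ux`` should again hold the data it started with, which makes this cycle a convenient sanity check.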
### Use of the FFT module | ||
To use the FFT programming interface, first of all, one additional Fortran module is needed: | ||
``` | ||
use decomp_2d_fft | ||
``` | ||
Then one needs to initialise the FFT interface using | ||
``` | ||
call decomp_2d_fft_init(pencil, n1, n2, n3) | ||
``` | ||
where ``pencil=PHYSICAL_IN_X`` or ``PHYSICAL_IN_Z`` and ``n1, n2, n3`` is an arbitrary problem size | ||
that can be different from ``nx`` × ``ny`` × ``nz``. | ||
For complex-to-complex (c2c) FFTs, the user interface is: | ||
``` | ||
call decomp_2d_fft_3d(input, output, direction) | ||
``` | ||
where ``direction`` can be either ``DECOMP_2D_FFT_FORWARD == -1`` for forward transforms, | ||
or ``DECOMP_2D_FFT_BACKWARD == 1`` for backward transforms. | ||
We recommend using the ``DECOMP_2D_FFT_XXX`` variables, rather than literal ``1`` or ``-1``, | ||
to avoid potential issues if these values change in future versions. | ||
The input array (``input``) and the output array (``output``) are both complex | ||
and must be either an X-pencil/Z-pencil combination or vice versa. | ||
The interface for real-to-complex and complex-to-real transforms is: | ||
``` | ||
call decomp_2d_fft_3d(input, output) | ||
``` | ||
If ``input`` is a real array and ``output`` a complex array, a forward transform is computed. | ||
Similarly, a backward FFT is computed if ``input`` is a complex array and ``output`` a real array. | ||
Finally, to release the memory used by the FFT interface: | ||
``` | ||
call decomp_2d_fft_finalize | ||
``` | ||
Detailed information about the FFT API is available [here](https://2decomp-fft.github.io/pages/api_fft.html). | ||
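A forward and backward c2c transform can be sketched as follows. The problem size is hypothetical; ``mytype``, ``xsize`` and ``zsize`` are assumed to come from the decomposition modules, and the final division by ``nx*ny*nz`` assumes that the transforms are unnormalised, which is the usual convention but is not stated in this document.

```fortran
program fft_c2c_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   use decomp_2d_fft

   implicit none

   ! Hypothetical problem size; automatic processor grid
   integer, parameter :: nx = 16, ny = 16, nz = 16
   integer :: p_row = 0, p_col = 0
   integer :: ierr
   complex(mytype), allocatable, dimension(:, :, :) :: u, u_hat

   call MPI_INIT(ierr)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)
   call decomp_2d_fft_init(PHYSICAL_IN_X, nx, ny, nz)

   ! Physical space in X-pencils, spectral space in Z-pencils
   allocate (u(xsize(1), xsize(2), xsize(3)))
   allocate (u_hat(zsize(1), zsize(2), zsize(3)))

   u = cmplx(1.0_mytype, 0.0_mytype, kind=mytype)

   call decomp_2d_fft_3d(u, u_hat, DECOMP_2D_FFT_FORWARD)
   call decomp_2d_fft_3d(u_hat, u, DECOMP_2D_FFT_BACKWARD)

   ! Assumed normalisation after a forward + backward pair
   u = u/real(nx, mytype)/real(ny, mytype)/real(nz, mytype)

   deallocate (u, u_hat)
   call decomp_2d_fft_finalize
   call decomp_2d_finalize
   call MPI_FINALIZE(ierr)
end program fft_c2c_sketch
```

The named constants ``DECOMP_2D_FFT_FORWARD`` and ``DECOMP_2D_FFT_BACKWARD`` are used instead of literal ``-1`` and ``1``, as recommended above.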
### Use of the IO module | ||
All the I/O functions have been packed in a Fortran module: | ||
``` | ||
use decomp_2d_io | ||
``` | ||
To write a single three-dimensional array to a file, the following call to a subroutine can be used: | ||
``` | ||
call decomp_2d_write_one(ipencil,var,directory,filename,icoarse,io_name) | ||
``` | ||
where ``ipencil`` describes how the data is distributed (valid values are: 1 for X-pencil; 2 for | ||
Y-pencil and 3 for Z-pencil); ``var`` is the data array to be written on disk, which can be either real or | ||
complex; ``directory`` is the path to where I/O should be written; ``filename`` is the name of the | ||
file to be written; ``icoarse`` indicates whether the I/O should be coarsened (valid values are: 0 | ||
for no; 1 for the ``nstat`` and 2 for the ``nvisu`` coarsenings); ``io_name`` is the name of the I/O | ||
group to be used. When using ADIOS2, write operations are deferred by default: the data stored in ``var`` | ||
may not have been written before the end of the step. Overwriting ``var`` in the meantime, for example | ||
when it is used as a temporary variable, would corrupt the output data. For such situations, | ||
``decomp_2d_write_one`` accepts an optional argument ``opt_deferred_writes`` (default ``.true.``) which, | ||
when set to ``.false.``, causes the data to be flushed immediately. | ||
The last argument, ``io_name``, is a string used to group I/O operations together. For the ADIOS2 backend it | ||
allows runtime control of I/O through the file ``adios2_config.xml``, which must contain a matching IO | ||
handle specifying the I/O engine. See the examples under ``examples/io_test/`` for how this works. | ||
|
||
There are two ways of writing multiple variables to a single file, which may | ||
be used for check-pointing purposes, for example. The newer interface is described first and allows | ||
codes to use the ADIOS2 and MPI-IO backends; the original interface is supported for backwards | ||
compatibility. | ||
|
||
When ``decomp_2d_write_one`` is called, the ``directory`` and ``io_name`` are combined to check | ||
whether a particular output location is already open. If it is not, a new file is opened and | ||
written to: this is the "standard" use. If, however, a file is opened first, then the call to | ||
``decomp_2d_write_one`` appends to the current file, resulting in a single file with multiple | ||
fields. Once the check-pointing is complete, the file can be closed. | ||
|
||
The original interface for writing multiple variables to a file, only | ||
supported by the MPI-IO backend, takes the following form: | ||
``` | ||
call decomp_2d_write_var(fh,disp,ipencil,var) | ||
``` | ||
where ``fh`` is an MPI-IO file handle provided by the application (file opened using ``MPI_FILE_OPEN``); | ||
``disp`` (meaning displacement) is a variable of ``kind MPI_OFFSET_KIND`` and of ``intent INOUT``; | ||
``ipencil`` describes the distribution of the input data (valid values are: 1 for X-pencil; 2 for | ||
Y-pencil and 3 for Z-pencil); and ``var`` is the data array to be written. | ||
Detailed information about the IO API is available [here](https://2decomp-fft.github.io/pages/api_io.html). | ||
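The original multi-variable interface might be used along the following lines. This is a sketch only: the file name ``checkpoint.dat`` and the problem size are hypothetical, ``mytype`` and ``xsize`` are assumed to come from the decomposition modules, and ``disp`` is presumed to be advanced past the data written by each call, as its ``INOUT`` intent suggests. The IO module is initialised with ``decomp_2d_io_init()`` before any I/O; any matching clean-up call for the IO module is omitted here.

```fortran
program write_var_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d
   use decomp_2d_io

   implicit none

   ! Hypothetical problem size; automatic processor grid
   integer, parameter :: nx = 16, ny = 16, nz = 16
   integer :: p_row = 0, p_col = 0
   integer :: ierr, fh
   integer(kind=MPI_OFFSET_KIND) :: disp
   real(mytype), allocatable, dimension(:, :, :) :: u, v

   call MPI_INIT(ierr)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)
   call decomp_2d_io_init()

   ! Two X-pencil fields to be stored in the same file
   allocate (u(xsize(1), xsize(2), xsize(3)))
   allocate (v(xsize(1), xsize(2), xsize(3)))
   u = 1.0_mytype
   v = 2.0_mytype

   ! The application opens the file itself with standard MPI-IO
   call MPI_FILE_OPEN(MPI_COMM_WORLD, 'checkpoint.dat', &
                      MPI_MODE_CREATE + MPI_MODE_WRONLY, &
                      MPI_INFO_NULL, fh, ierr)

   disp = 0_MPI_OFFSET_KIND
   call decomp_2d_write_var(fh, disp, 1, u)   ! ipencil = 1: X-pencil
   call decomp_2d_write_var(fh, disp, 1, v)   ! written after u (assumed)

   call MPI_FILE_CLOSE(fh, ierr)

   deallocate (u, v)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierr)
end program write_var_sketch
```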
#### Initialising the IO module | ||
Before you can perform I/O you must initialise the IO module by calling | ||
``` | ||
call decomp_2d_io_init() | ||
``` | ||
somewhere near the start of the program (and before beginning any I/O operations). | ||
The IO module supports multiple handles (see the discussion of ``io_name`` above); these are initialised by calling | ||
``` | ||
call decomp_2d_init_io(io_name) | ||
... | ||
``` | ||
#### Registering variables for ADIOS2 | ||
The ADIOS2 backend needs information about the variables it is going to write. During initialisation of I/O operations, | ||
each variable must be registered by calling | ||
``` | ||
call decomp_2d_register_variable(io_name,filename,ipencil,icoarse,iplanes,kind) | ||
``` | ||
where ``io_name``, ``filename``, ``ipencil`` and ``icoarse`` are as described above. | ||
The argument ``iplanes`` declares the writing of planes from a field (pass ``0`` for 3-D field output), and the ``kind`` | ||
argument is the kind used when declaring the variable, e.g. ``real(kind)``. | ||
### Use of the halo exchange | ||
The halo-cell support API provides data structures and nearest-neighbour communication routines | ||
that support explicit message passing between neighbouring pencils. | ||
``` | ||
call update_halo(var, var_halo, level) | ||
``` | ||
Here the first parameter ``var``, a 3D input array, contains the normal pencil-distributed data as defined by the decomposition library. | ||
After the subroutine call, the second parameter ``var_halo`` returns all original data plus halo data from the neighbouring processes. | ||
The third parameter ``level`` defines how many layers of overlap are required. | ||
Detailed information about the halo module is available [here](https://2decomp-fft.github.io/pages/api_halo.html). |
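A halo exchange with one layer of overlap might look like the sketch below. It assumes that ``var_halo`` is an allocatable array allocated inside ``update_halo`` and that the three-argument form shown above is sufficient; ``mytype``, ``xsize`` and ``nrank`` are assumed to be provided by the library modules, and the problem size is hypothetical.

```fortran
program halo_sketch
   use mpi
   use decomp_2d_constants
   use decomp_2d_mpi
   use decomp_2d

   implicit none

   ! Hypothetical problem size; automatic processor grid
   integer, parameter :: nx = 16, ny = 16, nz = 16
   integer :: p_row = 0, p_col = 0
   integer :: ierr
   real(mytype), allocatable, dimension(:, :, :) :: u, u_halo

   call MPI_INIT(ierr)
   call decomp_2d_init(nx, ny, nz, p_row, p_col)

   ! X-pencil data owned by this process
   allocate (u(xsize(1), xsize(2), xsize(3)))
   u = 1.0_mytype

   ! Assumption: u_halo is allocated by the library and returns
   ! the original data plus one layer of halo cells
   call update_halo(u, u_halo, 1)

   if (nrank == 0) then
      print *, 'local shape:    ', shape(u)
      print *, 'shape with halo:', shape(u_halo)
   end if

   deallocate (u)
   if (allocated(u_halo)) deallocate (u_halo)
   call decomp_2d_finalize
   call MPI_FINALIZE(ierr)
end program halo_sketch
```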
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,11 @@ | ||
Examples | ||
======== | ||
|
||
* init_test - to test the initialisation of the DECOMP2D&FFT library | ||
|
||
* test2d - various tests for the 2D pencil decomposition module and timing | ||
|
||
* fft_physical_x - various tests for the FFT starting from the ``X`` direction | ||
|
||
* fft_physical_z - various tests for the FFT starting from the ``Z`` direction | ||
|
||
* halo_test - to test the halo-cell exchange code | ||
|
||
* io_test - various tests for the IO module | ||
|
||
# Examples | ||
|
||
- [Initialization test](init_test) - to test the initialisation of the 2DECOMP&FFT library | ||
- [Test decomposition](test2d) - various tests for the 2D pencil decomposition module and timing | ||
- [Test FFT from X pencil](fft_physical_x) - various tests for the FFT starting from the ``X`` direction | ||
- [Test FFT from Z pencil](fft_physical_z) - various tests for the FFT starting from the ``Z`` direction | ||
- [Test HALO exchange](halo_test) - to test the halo-cell exchange capability | ||
- [Test I/O features](io_test) - various tests for the I/O module | ||
- [Example gradient of a scalar](grad3d) - example of how to compute the gradient of a scalar field | ||
|
||
Refer to the README files for each example for details. |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Test FFT for X pencil decomposition | ||
|
||
List of the tests: | ||
- [fft_c2c_x](fft_c2c_x.f90): Test Complex to Complex FFT transform; | ||
- [fft_r2c_x](fft_r2c_x.f90): Test Real to Complex FFT transform; | ||
- [fft_grid_x](fft_grid_x.f90): Test Real to Complex FFT transform on a grid different from the one used | ||
for the initialization. | ||
|
||
|
||
These programs test the FFT using X-pencils as the starting domain decomposition. | ||
Both c2c (fft_c2c_x) and r2c/c2r (fft_r2c_x) transforms are tested. | ||
The fft_grid_x case uses a different resolution from the one used for the initialization. | ||
The results should recover the input data up to machine accuracy | ||
after a forward and a backward transform and appropriate normalisation. | ||
The tests automatically resize the problem depending on the number of MPI processes in use. | ||
|
||
What to input: each program takes at most 6 inputs: | ||
|
||
1. p_row [optional] | ||
1. p_col [optional] | ||
1. nx [optional] | ||
1. ny [optional] | ||
1. nz [optional] | ||
1. nt [optional] | ||
|
||
If the decomposition is imposed, both (1) and (2) are necessary. | ||
If the resolution is imposed, (1-5) are necessary. | ||
|
||
What to expect: | ||
- The timing results | ||
- The error reported should be around machine accuracy (~ 10^-6 for single | ||
precision and 10^-15 for double) | ||
- In the case of the GENERIC FFT, expect the error to be of a higher order of magnitude |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Test FFT for Z pencil decomposition | ||
|
||
List of the tests: | ||
- [fft_c2c_z](fft_c2c_z.f90): Test Complex to Complex FFT transform; | ||
- [fft_r2c_z](fft_r2c_z.f90): Test Real to Complex FFT transform; | ||
|
||
|
||
These programs test the FFT using Z-pencils as the starting domain decomposition. | ||
Both c2c (fft_c2c_z) and r2c/c2r (fft_r2c_z) transforms are tested. | ||
The results should recover the input data up to machine accuracy | ||
after a forward and a backward transform and appropriate normalisation. | ||
The tests automatically resize the problem depending on the number of MPI processes in use. | ||
|
||
What to input: each program takes at most 6 inputs: | ||
|
||
1. p_row [optional] | ||
1. p_col [optional] | ||
1. nx [optional] | ||
1. ny [optional] | ||
1. nz [optional] | ||
1. nt [optional] | ||
|
||
If the decomposition is imposed, both (1) and (2) are necessary. | ||
If the resolution is imposed, (1-5) are necessary. | ||
|
||
What to expect: | ||
- The timing results | ||
- The error reported should be around machine accuracy (~ 10^-6 for single | ||
precision and 10^-15 for double) | ||
- In the case of the GENERIC FFT, expect the error to be of a higher order of magnitude |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
file(GLOB files_test grad3d.f90) | ||
include_directories(${CMAKE_SOURCE_DIR}/src) | ||
|
||
add_executable(grad3d ${files_test}) | ||
target_link_libraries(grad3d PRIVATE decomp2d examples_utils) | ||
|
||
# Run the test(s) | ||
set(run_dir "${test_dir}/grad3d") | ||
file (COPY "${CMAKE_SOURCE_DIR}/examples/grad3d/adios2_config.xml" DESTINATION ${run_dir}) | ||
message(STATUS "Example dir ${run_dir}") | ||
file(MAKE_DIRECTORY ${run_dir}) | ||
if (BUILD_TARGET MATCHES "gpu") | ||
file(COPY bind.sh DESTINATION ${run_dir}) | ||
add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} ./bind.sh $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir}) | ||
else () | ||
add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir}) | ||
endif () |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Gradient example | ||
|
||
List of the tests: | ||
- [grad3d](grad3d.f90): Example to compute the gradient of a field. | ||
|
||
This example demonstrates the use of the 2DECOMP&FFT library to compute the gradient | ||
of a field using an explicit second-order finite difference scheme. | ||
The purpose is to show how to use the transpose operations to allow explicit calculation | ||
of the gradient in all 3 directions. The results are written to a file, and the function | ||
is periodic over the interval [0, 1]. | ||
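The finite difference idea can be shown in one dimension without the library. The following self-contained sketch applies a second-order central difference with periodic wrap-around to a hypothetical test function, ``sin(2*pi*x)`` on [0, 1], and compares the result with its analytical derivative; it is an illustration of the scheme, not the code of the grad3d example.

```fortran
program periodic_gradient_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 32                  ! hypothetical number of grid points
   real(dp), parameter :: pi = 3.141592653589793_dp
   real(dp) :: f(n), dfdx(n), x, dx, err
   integer :: i, ip, im

   ! Periodic grid on [0, 1): the point x = 1 coincides with x = 0
   dx = 1.0_dp/real(n, dp)
   do i = 1, n
      x = real(i - 1, dp)*dx
      f(i) = sin(2.0_dp*pi*x)
   end do

   err = 0.0_dp
   do i = 1, n
      ip = mod(i, n) + 1            ! right neighbour, wraps n -> 1
      im = mod(i - 2 + n, n) + 1    ! left neighbour, wraps 1 -> n
      dfdx(i) = (f(ip) - f(im))/(2.0_dp*dx)
      x = real(i - 1, dp)*dx
      err = max(err, abs(dfdx(i) - 2.0_dp*pi*cos(2.0_dp*pi*x)))
   end do

   print *, 'max abs error:', err
end program periodic_gradient_sketch
```

In the distributed case the same stencil is applied along whichever direction is fully local to the pencil, which is why the transpose operations are needed to cover all 3 directions.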
|
||
What to input: the program takes at most 5 inputs: | ||
|
||
1. p_row [optional] | ||
1. p_col [optional] | ||
1. nx [optional] | ||
1. ny [optional] | ||
1. nz [optional] | ||
|
||
If the decomposition is imposed, both (1) and (2) are necessary. | ||
If the resolution is imposed, (1-5) are necessary. | ||
|
||
What to expect: the output is the original function and its gradient in the 3 directions. | ||
The program also reports the total error in the L2 norm compared with the analytical solution. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
<?xml version="1.0"?> | ||
<adios-config> | ||
<io name="grad-io"> | ||
<engine type="BP4"> | ||
</engine> | ||
<transport type="File"> | ||
<parameter key="Library" value="fstream"/> | ||
<parameter key="ProfileUnits" value="Milliseconds"/> | ||
</transport> | ||
</io> | ||
</adios-config> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
#!/bin/bash | ||
|
||
export LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK} | ||
export CUDA_VISIBLE_DEVICES=${LOCAL_RANK} | ||
|
||
echo "[LOG] local rank $LOCAL_RANK: bind to $CUDA_VISIBLE_DEVICES" | ||
echo "" | ||
|
||
"$@" | ||
|