Merge remote-tracking branch 'origin/main' into dev

rfj82982 · Nov 9, 2023 · 1afab4b
2 parents 3bd638d + 686c3e4
commit 1afab4b

Showing 23 changed files with 1,360 additions and 136 deletions.
diff --git a/HOWTO.md b/HOWTO.md
@@ -0,0 +1,147 @@
+# How to use 2DECOMP&FFT
+This document presents the main features of the 2DECOMP&FFT library.
+Detailed instructions on how to use the 2DECOMP&FFT library can be found
+[here](https://2decomp-fft.github.io/).
+
+The 2D Pencil Decomposition API is defined in three Fortran modules, which applications should use as follows:
+```
+  use decomp_2d_constants
+  use decomp_2d_mpi
+  use decomp_2d
+```
+where ``decomp_2d_constants`` defines all the parameters, ``decomp_2d_mpi`` provides all the MPI-related
+interfaces, and ``decomp_2d`` contains the main decomposition and transposition APIs. The library is initialized using
+```
+  call decomp_2d_init(nx, ny, nz, p_row, p_col)
+```
+where ``nx``, ``ny`` and ``nz`` are the spatial dimensions of the problem, to be distributed over
+a ``p_row`` × ``p_col`` 2D processor grid.
+Note that the dimensions do not need to be divisible by ``p_row`` or ``p_col``; however, a load imbalance will occur if they are not.
+If ``p_row=p_col=0``, an automatic decomposition is selected among all the available combinations.
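+
+The load imbalance mentioned above comes from splitting each dimension into contiguous blocks whose sizes differ by at most one point. The standalone program below illustrates only this arithmetic and does not call the library. ``distribute`` is a hypothetical helper written for this sketch, and it assumes that the ``mod(n, p)`` leftover points go one each to the last ranks (the library's internal ordering may differ).
+
+```fortran
+! Illustration only (not the library's code): split n points over p ranks
+! in contiguous blocks, giving one extra point to the last mod(n, p) ranks.
+program block_distribution
+  implicit none
+  integer, parameter :: n = 10, p = 4
+  integer :: st(0:p-1), en(0:p-1), sz(0:p-1)
+  integer :: i
+
+  call distribute(n, p, st, en, sz)
+  do i = 0, p - 1
+    print '(a,i0,a,i0,a,i0,a,i0)', 'rank ', i, ': start=', st(i), ' end=', en(i), ' size=', sz(i)
+  end do
+
+  ! every point is owned exactly once and block sizes differ by at most one
+  if (sum(sz) /= n) error stop 'points lost or duplicated'
+  if (maxval(sz) - minval(sz) > 1) error stop 'imbalance larger than one point'
+
+contains
+
+  ! hypothetical helper: 1-based start/end indices and sizes for each rank
+  subroutine distribute(n, p, st, en, sz)
+    integer, intent(in) :: n, p
+    integer, intent(out) :: st(0:p-1), en(0:p-1), sz(0:p-1)
+    integer :: i, nextra
+
+    nextra = mod(n, p)                  ! number of ranks receiving one extra point
+    do i = 0, p - 1
+      sz(i) = n / p
+      if (i >= p - nextra) sz(i) = sz(i) + 1
+      if (i == 0) then
+        st(i) = 1
+      else
+        st(i) = en(i - 1) + 1           ! block starts right after the previous one
+      end if
+      en(i) = st(i) + sz(i) - 1
+    end do
+  end subroutine distribute
+
+end program block_distribution
+```
+
+For ``n = 10`` and ``p = 4`` this prints blocks of 2, 2, 3 and 3 points, so the ranks holding 3 points process 50% more data than the others.
+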
+
+A key element of this library is a set of communication routines that actually perform the data transpositions.
+Going through all three pencil orientations requires four global transpositions
+(i.e. from x-pencils to y-pencils to z-pencils, then back to y-pencils and x-pencils).
+Correspondingly, the library provides four communication subroutines:
+```
+  call transpose_x_to_y(var_in,var_out)
+  call transpose_y_to_z(var_in,var_out)
+  call transpose_z_to_y(var_in,var_out)
+  call transpose_y_to_x(var_in,var_out)
+```
+The input array ``var_in`` and output array ``var_out`` are defined by the code using the library
+and contain distributed data for the correct pencil orientations.
+
+Note that the library is written using Fortran generic interfaces, so different data types are supported
+without any user intervention. This means that ``var_in`` and ``var_out`` above can be either real or complex arrays,
+the latter being useful for applications involving 3D Fast Fourier Transforms.
+Finally, before exiting, applications should release the memory by calling:
+```
+  call decomp_2d_finalize
+```
+Detailed information about the decomposition API is available [here](https://2decomp-fft.github.io/pages/api_domain.html).
+### Use of the FFT module
+To use the FFT programming interface, one additional Fortran module is needed:
+```
+  use decomp_2d_fft
+```
+Then one needs to initialise the FFT interface using
+```
+  call decomp_2d_fft_init(pencil, n1, n2, n3)
+```
+where ``pencil`` is either ``PHYSICAL_IN_X`` or ``PHYSICAL_IN_Z``, and ``n1, n2, n3`` is an arbitrary problem size
+that can differ from ``nx × ny × nz``.
+For complex-to-complex (c2c) FFTs, the user interface is:
+```
+  call decomp_2d_fft_3d(input, output, direction)
+```
+where ``direction`` can be either ``DECOMP_2D_FFT_FORWARD == -1`` for forward transforms, 
+or ``DECOMP_2D_FFT_BACKWARD == 1`` for backward transforms.
+We recommend using the ``DECOMP_2D_FFT_XXX`` variables, rather than literal ``1`` or ``-1``,
+to avoid potential issues if these values change in future versions.
+The ``input`` and ``output`` arrays are both complex and must form an X-pencil/Z-pencil pair
+(X-pencil input with Z-pencil output, or vice versa).
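+
+The direction convention can be checked on a small scale with a naive one-dimensional DFT. This standalone sketch does not call the library. It assumes that neither the forward nor the backward transform is scaled, so a forward/backward round trip has to be divided by the number of points (``nx*ny*nz`` for a 3D transform).
+
+```fortran
+! Illustration only (not the library's code): a naive 1D DFT using the sign
+! convention -1 = forward, +1 = backward, as in DECOMP_2D_FFT_FORWARD/BACKWARD.
+program dft_round_trip
+  use, intrinsic :: iso_fortran_env, only: dp => real64
+  implicit none
+  integer, parameter :: n = 8
+  integer, parameter :: forward = -1, backward = 1
+  complex(dp) :: x(n), xhat(n), y(n)
+  integer :: j
+
+  do j = 1, n
+    x(j) = cmplx(real(j, dp), -0.5_dp * j, kind=dp)   ! arbitrary test data
+  end do
+
+  xhat = dft(x, forward)
+  ! neither direction is scaled, so divide by n once after the round trip
+  y = dft(xhat, backward) / real(n, dp)
+
+  print *, 'max round-trip error:', maxval(abs(y - x))
+  if (maxval(abs(y - x)) > 1.0e-12_dp) error stop 'round trip failed'
+
+contains
+
+  ! O(n**2) DFT: b(k) = sum_m a(m) * exp(sgn * 2*pi*i * k*m / n)
+  function dft(a, sgn) result(b)
+    complex(dp), intent(in) :: a(:)
+    integer, intent(in) :: sgn
+    complex(dp) :: b(size(a))
+    real(dp), parameter :: pi = acos(-1.0_dp)
+    integer :: k, m, nn
+
+    nn = size(a)
+    do k = 0, nn - 1
+      b(k + 1) = (0.0_dp, 0.0_dp)
+      do m = 0, nn - 1
+        b(k + 1) = b(k + 1) + a(m + 1) * exp(cmplx(0.0_dp, sgn * 2.0_dp * pi * k * m / nn, kind=dp))
+      end do
+    end do
+  end function dft
+
+end program dft_round_trip
+```
+
+Using the named constants rather than literal ``-1``/``1`` keeps such code correct even if the values change, as recommended above.
+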
+The interface for real-to-complex and complex-to-real transforms is
+```
+  call decomp_2d_fft_3d(input, output)
+```
+If ``input`` is real, a forward transform is performed and ``output`` is complex.
+Similarly, a backward FFT is computed if ``input`` is a complex array and ``output`` a real array.
+Finally, to release the memory used by the FFT interface:
+```
+  call decomp_2d_fft_finalize
+```
+Detailed information about the FFT API is available [here](https://2decomp-fft.github.io/pages/api_fft.html).
+### Use of the IO module
+All the I/O functions have been packed in a Fortran module:
+```
+  use decomp_2d_io
+```
+To write a single three-dimensional array to a file, use the following subroutine call:
+```
+  call decomp_2d_write_one(ipencil,var,directory,filename,icoarse,io_name)
+```
+where ``ipencil`` describes how the data is distributed (valid values are: 1 for X-pencil, 2 for
+Y-pencil and 3 for Z-pencil); ``var`` is the data array to be written to disk, which can be either real or
+complex; ``directory`` is the path where the output should be written; ``filename`` is the name of the
+file to be written; ``icoarse`` indicates whether the I/O should be coarsened (valid values are: 0
+for no coarsening, 1 for the ``nstat`` and 2 for the ``nvisu`` coarsening); ``io_name`` is the name of the I/O
+group to be used.
+
+When using ADIOS2, write operations are deferred by default: the data stored in ``var`` may not have been
+written until the end of the step. Overwriting ``var`` before then, for example
+when it is used as a temporary variable, would corrupt the output data. In such situations
+``decomp_2d_write_one`` accepts an optional argument ``opt_deferred_writes`` (default ``.true.``), which,
+when set to ``.false.``, causes the data to be flushed immediately.
+
+The argument ``io_name`` is a string used to group I/O operations together. For the ADIOS2 backend it
+also allows runtime control of I/O through the file ``adios2_config.xml``, which must contain a matching IO
+handle specifying the I/O engine; see the examples under ``examples/io_test/`` for how this works.
+
+There are two ways of writing multiple variables to a single file, which is useful, for example, for
+check-pointing. The newer interface, described first, supports both the ADIOS2 and MPI-IO backends;
+the original interface is retained for backwards compatibility.
+
+When ``decomp_2d_write_one`` is called, the ``directory`` and ``io_name`` are combined to check
+whether a particular output location is already open; if not, a new file is opened and
+written to. This is the "standard" use. If, however, a file is opened first, the call to
+``decomp_2d_write_one`` will append to the current file, resulting in a single file with multiple
+fields. Once the check-pointing is complete, the file can be closed.
+
+The original interface for writing multiple variables to a file, supported only
+by the MPI-IO backend, takes the following form:
+```
+   call decomp_2d_write_var(fh,disp,ipencil,var)
+```
+where ``fh`` is an MPI-IO file handle provided by the application (file opened using ``MPI_FILE_OPEN``);
+``ipencil`` describes the distribution of the input data (valid values are: 1 for X-pencil, 2 for
+Y-pencil and 3 for Z-pencil); ``disp`` (meaning displacement) is an ``integer(kind=MPI_OFFSET_KIND)``
+variable with ``intent(inout)``; ``var`` is the data array to be written.
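+
+Because ``disp`` is ``intent(inout)``, successive calls can place several variables one after another in the same file. The sketch below only reproduces the offset arithmetic under an assumption: that each call advances ``disp`` by the size in bytes of one whole global array. It uses ``int64`` instead of ``MPI_OFFSET_KIND`` so that it runs without MPI, and the sizes are arbitrary.
+
+```fortran
+! Illustration only (not the library's code): how a file displacement
+! advances when several global nx*ny*nz arrays are written back to back.
+program displacement_arithmetic
+  use, intrinsic :: iso_fortran_env, only: int64, real64
+  implicit none
+  integer, parameter :: nx = 64, ny = 32, nz = 16, nvars = 3
+  integer(int64) :: disp, bytes_per_value
+  integer :: ivar
+
+  bytes_per_value = storage_size(1.0_real64) / 8   ! 8 bytes per double-precision value
+  disp = 0_int64                                    ! first variable starts at the beginning of the file
+  do ivar = 1, nvars
+    print '(a,i0,a,i0)', 'variable ', ivar, ' starts at byte ', disp
+    ! assumed behaviour: each write moves disp past one whole global array
+    disp = disp + int(nx, int64) * ny * nz * bytes_per_value
+  end do
+
+  if (disp /= int(nvars, int64) * nx * ny * nz * bytes_per_value) error stop 'unexpected final offset'
+end program displacement_arithmetic
+```
+
+With these sizes each variable occupies 262144 bytes, so the three variables start at bytes 0, 262144 and 524288.
+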
+Detailed information about the IO API is available [here](https://2decomp-fft.github.io/pages/api_io.html).
+#### Initialising the IO module
+Before you can perform I/O you must initialise the IO module by calling
+```
+call decomp_2d_io_init()
+```
+somewhere near the start of the program (and before beginning any I/O operations).
+The IO module supports multiple handles (see the discussion of ``io_name`` above); each handle is initialised by calling
+```
+call decomp_2d_init_io(io_name)
+...
+```
+#### Registering variables for ADIOS2
+The ADIOS2 backend needs information about the variables it is going to write: during the initialisation of I/O operations,
+each variable must be registered by calling
+```
+call decomp_2d_register_variable(io_name,filename,ipencil,icoarse,iplanes,kind) 
+```
+where the arguments are as described above.
+The argument ``iplanes`` is used to declare the writing of planes from a field; pass ``0`` for 3-D field output. The ``kind``
+argument is the kind parameter used when declaring the variable, e.g. ``real(kind)``.
+### Use of the halo exchange
+The halo-cell support API provides data structures and nearest-neighbour communication routines
+that support explicit message passing between neighbouring pencils. The halo exchange is performed by:
+```
+  call update_halo(var, var_halo, level)
+```
+Here the first parameter, ``var``, is a 3D input array containing the normal pencil-distributed data as defined by the decomposition library.
+After the subroutine call, the second parameter, ``var_halo``, returns all the original data plus halo data from the neighbouring processes.
+The third parameter, ``level``, defines how many layers of overlap (halo cells) are required.
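+
+To make the meaning of ``var_halo`` concrete, the serial sketch below fills the halo of a single block from a periodic copy of itself, standing in for the data that neighbouring processes would supply in a parallel run. It does not call ``update_halo``. The array bounds (halo layers of width ``level`` in y and z only for an X-pencil, with x kept fully local) are an assumption made for this illustration.
+
+```fortran
+! Illustration only (not the library's code): serial, periodic stand-in for
+! a halo update on one X-pencil; halos of width level are added in y and z.
+program serial_halo
+  implicit none
+  integer, parameter :: nx = 4, ny = 5, nz = 6, level = 1
+  real :: var(nx, ny, nz)
+  real, allocatable :: var_halo(:, :, :)
+  integer :: i, j, k
+
+  do k = 1, nz
+    do j = 1, ny
+      do i = 1, nx
+        var(i, j, k) = i + 10 * j + 100 * k        ! recognisable test values
+      end do
+    end do
+  end do
+
+  allocate(var_halo(1:nx, 1-level:ny+level, 1-level:nz+level))
+  do k = 1 - level, nz + level
+    do j = 1 - level, ny + level
+      ! wrap each index back into 1..n: the "neighbour" is the opposite face
+      var_halo(:, j, k) = var(:, modulo(j - 1, ny) + 1, modulo(k - 1, nz) + 1)
+    end do
+  end do
+
+  ! the interior is unchanged and the halo at j = 0 holds the data of j = ny
+  if (any(var_halo(:, 1:ny, 1:nz) /= var)) error stop 'interior changed'
+  if (any(var_halo(:, 0, 1:nz) /= var(:, ny, :))) error stop 'wrong halo'
+  print *, 'halo bounds in y:', lbound(var_halo, 2), ubound(var_halo, 2)
+end program serial_halo
+```
+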
+Detailed information about the halo module is available [here](https://2decomp-fft.github.io/pages/api_halo.html).
diff --git a/INSTALL.md b/INSTALL.md
@@ -1,4 +1,4 @@
-# Bulding and installing 2DECOMP&FFT
+# Building, installing and linking 2DECOMP&FFT
 
 The library 2decomp is a Fortran library compatible with the Fortran 2008 standard.
 It requires an MPI library compatible with MPI-2.0 with extended Fortran support.

diff --git a/README.md b/README.md
@@ -1,7 +1,9 @@
 # 2DECOMP&FFT
 
 This README contains basic instructions for building and installing the 2DECOMP&FFT library, more
-detailed instructions can be found in [INSTALL.md](INSTALL.md).
+detailed instructions about installation and linking to the library within an external project 
+can be found in the [install section](INSTALL.md).
+Please have a look at [HOWTO.md](HOWTO.md) and at the [examples](examples/README.md) to see how to use the library in your application.
 
 ## Building
 
@@ -151,3 +153,8 @@ is ready.
 For example, starting from `v2.0.0` the `main` branch will only be updated to receive fixes giving
 `v2.0.1`, etc. until the next release (either `v2.1.0` or `v3.0.0`, depending on the magnitude of the
 change) is ready.
+
+### Contributing
+
+If you would like to contribute to the development of the 2DECOMP&FFT library or report a bug, please refer to
+the [Contributing section](Contribute.md).
diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt
@@ -18,6 +18,7 @@ add_subdirectory(fft_physical_x)
 add_subdirectory(fft_physical_z)
 add_subdirectory(halo_test)
 add_subdirectory(io_test)
+add_subdirectory(grad3d)
 
 # Set real/complex tests
 #set(COMPLEX_TESTS "OFF" CACHE STRING "Enables complex numbers for tests that support it")
diff --git a/examples/README.md b/examples/README.md
@@ -1,17 +1,11 @@
-Examples
-========
-
-* init_test      - to test the initialisation of the DECOMP2D&FFT library
-
-* test2d         - various tests for the 2D pencil decomposition module and timing 
-
-* fft_physical_x - various tests for the FFT starting from the ``X`` direction 
-
-* fft_physical_z - various tests for the FFT starting from the ``Z`` direction 
-
-* halo_test      - to test the halo-cell exchange code
-
-* io_test        - various tests for the IO module
-
+# Examples
+
+- [Initialization test](init_test)         - to test the initialisation of the 2DECOMP&FFT library
+- [Test decomposition](test2d)             - various tests for the 2D pencil decomposition module and timing 
+- [Test FFT from X pencil](fft_physical_x) - various tests for the FFT starting from the ``X`` direction 
+- [Test FFT from Z pencil](fft_physical_z) - various tests for the FFT starting from the ``Z`` direction 
+- [Test HALO exchange](halo_test)          - to test the halo-cell exchange capability
+- [Test I/O features](io_test)             - various tests for the I/O module
+- [Example gradient of a scalar](grad3d)   - example of how to compute the gradient of a scalar field
 
 Refer to the README files for each example for details.
diff --git a/examples/fft_physical_x/README b/examples/fft_physical_x/README
diff --git a/examples/fft_physical_x/README.md b/examples/fft_physical_x/README.md
@@ -0,0 +1,33 @@
+# Test FFT for X pencil decomposition 
+
+List of the tests:
+- [fft_c2c_x](fft_c2c_x.f90): Test Complex to Complex FFT transform; 
+- [fft_r2c_x](fft_r2c_x.f90): Test Real to Complex FFT transform;
+- [fft_grid_x](fft_grid_x.f90): Test Real to Complex transform of a different grid than the one used 
+                                for the initialization. 
+
+
+These programs can be used to test the FFT using X-pencils as the starting domain decomposition.
+Both c2c (fft_c2c_x) and r2c/c2r (fft_r2c_x) transforms are tested.
+The case fft_grid_x uses a different resolution from the one used for the initialization.
+The results should recover the input data up to machine accuracy
+after a forward and a backward transform and appropriate normalisation.
+The tests automatically resize the problem depending on the number of MPI processes in use.
+
+What to input: the program takes up to 6 inputs:
+
+1. p_row [optional]
+1. p_col [optional] 
+1. nx    [optional]
+1. ny    [optional]
+1. nz    [optional]
+1. nt    [optional]
+
+If the decomposition is imposed, both (1) and (2) are necessary.
+If the resolution is imposed, (1)-(5) are necessary.
+
+What to expect:
+- The timing results 
+- The error reported should be around machine accuracy (~10^-6 for single
+  precision and ~10^-15 for double precision)
+- With the GENERIC FFT backend, expect the error to be larger in magnitude
diff --git a/examples/fft_physical_z/README b/examples/fft_physical_z/README
diff --git a/examples/fft_physical_z/README.md b/examples/fft_physical_z/README.md
@@ -0,0 +1,30 @@
+# Test FFT for Z pencil decomposition 
+
+List of the tests:
+- [fft_c2c_z](fft_c2c_z.f90): Test Complex to Complex FFT transform; 
+- [fft_r2c_z](fft_r2c_z.f90): Test Real to Complex FFT transform;
+
+
+These programs can be used to test the FFT using Z-pencils as the starting domain decomposition.
+Both c2c (fft_c2c_z) and r2c/c2r (fft_r2c_z) transforms are tested.
+The results should recover the input data up to machine accuracy
+after a forward and a backward transform and appropriate normalisation.
+The tests automatically resize the problem depending on the number of MPI processes in use.
+
+What to input: the program takes up to 6 inputs:
+
+1. p_row [optional]
+1. p_col [optional] 
+1. nx    [optional]
+1. ny    [optional]
+1. nz    [optional]
+1. nt    [optional]
+
+If the decomposition is imposed, both (1) and (2) are necessary.
+If the resolution is imposed, (1)-(5) are necessary.
+
+What to expect:
+- The timing results 
+- The error reported should be around machine accuracy (~10^-6 for single
+  precision and ~10^-15 for double precision)
+- With the GENERIC FFT backend, expect the error to be larger in magnitude
diff --git a/examples/grad3d/CMakeLists.txt b/examples/grad3d/CMakeLists.txt
@@ -0,0 +1,17 @@
+file(GLOB files_test grad3d.f90)
+include_directories(${CMAKE_SOURCE_DIR}/src)
+
+add_executable(grad3d ${files_test})
+target_link_libraries(grad3d PRIVATE decomp2d examples_utils)
+
+# Run the test(s)
+set(run_dir "${test_dir}/grad3d")
+message(STATUS "Example dir ${run_dir}")
+file(MAKE_DIRECTORY ${run_dir})
+file(COPY "${CMAKE_SOURCE_DIR}/examples/grad3d/adios2_config.xml" DESTINATION ${run_dir})
+if (BUILD_TARGET MATCHES "gpu")
+  file(COPY bind.sh DESTINATION ${run_dir})
+  add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} ./bind.sh $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir})
+else ()  
+  add_test(NAME grad3d COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS} $<TARGET_FILE:grad3d> ${TEST_ARGUMENTS} WORKING_DIRECTORY ${run_dir})
+endif ()
diff --git a/examples/grad3d/README.md b/examples/grad3d/README.md
@@ -0,0 +1,24 @@
+# Gradient example
+
+List of the tests:
+- [grad3d](grad3d.f90): Example to compute the gradient of a field. 
+
+This example demonstrates the use of the 2DECOMP&FFT library to compute the gradient
+of a field using an explicit second-order finite difference scheme.
+The purpose is to show how to use the transpose operations to allow the explicit calculation
+of the gradient in all 3 directions. The function is periodic over the interval [0, 1]
+and the results are written to a file.
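+
+The stencil itself can be illustrated in one dimension. This standalone sketch is not the code in ``grad3d.f90``: it assumes the test function ``sin(2*pi*x)`` (the example's actual field may differ) and a uniform periodic grid on which ``x = 1`` coincides with ``x = 0``. In the library example, the transposes make each direction local in turn, so the same one-dimensional stencil can be applied along x, y and z.
+
+```fortran
+! Illustration only (not grad3d.f90): second-order central differences of a
+! periodic function on [0, 1), with the error measured in a discrete L2 norm.
+program periodic_gradient_1d
+  use, intrinsic :: iso_fortran_env, only: dp => real64
+  implicit none
+  integer, parameter :: n = 64
+  real(dp), parameter :: pi = acos(-1.0_dp)
+  real(dp) :: x(n), f(n), dfdx(n), h, err
+  integer :: i, ip, im
+
+  h = 1.0_dp / n                        ! periodic grid: x = 1 is the same point as x = 0
+  do i = 1, n
+    x(i) = (i - 1) * h
+    f(i) = sin(2.0_dp * pi * x(i))
+  end do
+
+  do i = 1, n
+    ip = modulo(i, n) + 1               ! right neighbour, wraps n -> 1
+    im = modulo(i - 2, n) + 1           ! left neighbour, wraps 1 -> n
+    dfdx(i) = (f(ip) - f(im)) / (2.0_dp * h)
+  end do
+
+  ! compare with the analytical derivative 2*pi*cos(2*pi*x)
+  err = sqrt(sum((dfdx - 2.0_dp * pi * cos(2.0_dp * pi * x))**2) / n)
+  print *, 'L2 error:', err
+  if (err > 1.0e-2_dp) error stop 'error larger than expected for n = 64'
+end program periodic_gradient_1d
+```
+
+Because the scheme is second order, doubling ``n`` should reduce the reported error by roughly a factor of four.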
+
+What to input: the program takes up to 5 inputs:
+
+1. p_row [optional]
+1. p_col [optional] 
+1. nx    [optional]
+1. ny    [optional]
+1. nz    [optional]
+
+If the decomposition is imposed, both (1) and (2) are necessary.
+If the resolution is imposed, (1)-(5) are necessary.
+
+What to expect: the output consists of the original function and its gradient in the 3 directions.
+                The program also reports the total error in the L2 norm with respect to the analytical solution.
diff --git a/examples/grad3d/adios2_config.xml b/examples/grad3d/adios2_config.xml
@@ -0,0 +1,11 @@
+<?xml version="1.0"?>
+<adios-config>
+  <io name="grad-io">
+    <engine type="BP4">
+    </engine>
+    <transport type="File">
+      <parameter key="Library" value="fstream"/>
+      <parameter key="ProfileUnits" value="Milliseconds"/>
+    </transport>
+  </io>
+</adios-config>
diff --git a/examples/grad3d/bind.sh b/examples/grad3d/bind.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
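+# OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI launchers; other MPI implementations use different variables
+export LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK}
+export CUDA_VISIBLE_DEVICES=${LOCAL_RANK}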
+
+echo "[LOG] local rank $LOCAL_RANK: bind to $CUDA_VISIBLE_DEVICES"
+echo ""
+
+"$@"
+