diff --git a/CHANGELOG.md b/CHANGELOG.md
index f8e9540b..1f01898d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [2.1.01] 2024-06-20
+### Changed
+- Fix a bug that could result in too restrictive timesteps when resistivity is enabled (#244)
+- Fix documentation for reflective boundary conditions (#246)
+- Changed performance metric: the performance is now measured per MPI process (and not globally) (#249)
+- Remove documentation for replace_idefix_source, as this can't work for .hpp file (#248)
+
+### Added
+- Kokkos execution space configuration is now shown on startup (#248)
+- Add CUDA_MALLOC_ASYNC flags in Jean Zay documentation to deal with MPI issues when using Kokkos 4.3 (#248)
+- Add a description and link to documentation in readme (#248)
+- Add indicative expected performances in documentation (#249)
+
 ## [2.1.0] 2024-05-10
 ### Changed
 - VTK slices are automatically produced along with standard VTK when an emergency abort is triggered.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index ea2fe74d..b7b1d88a 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -4,7 +4,7 @@ set (CMAKE_CXX_STANDARD 17)
 set(Idefix_VERSION_MAJOR 2)
 set(Idefix_VERSION_MINOR 1)
-set(Idefix_VERSION_PATCH 00)
+set(Idefix_VERSION_PATCH 01)
 project (idefix VERSION 2.1.00)
 option(Idefix_MHD "enable MHD" OFF)
diff --git a/README.md b/README.md
index 52eff077..92e3e79d 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,8 @@
+- [What is Idefix?](#what-is-idefix)
+- [Documentation](#documentation)
 - [Download:](#download)
 - [Installation:](#installation)
 - [Compile an example:](#compile-an-example)
@@ -17,6 +19,33 @@
+What is Idefix?
+---------------
+Idefix is a computational fluid dynamics code based on a finite-volume high-order Godunov method, originally designed for astrophysical fluid dynamics applications. Idefix is designed to be performance-portable, and uses the [Kokkos](https://github.com/kokkos/kokkos) framework to achieve this goal. This means that it can run both on your laptop's CPU and on the largest GPU Exascale clusters. More technically, Idefix can run in serial, use OpenMP and/or MPI (message passing interface) for parallelization, and use GPU acceleration when available (based on NVIDIA CUDA, AMD HIP, etc...). All these capabilities are embedded within one single code, so the code relies on relatively abstracted classes and objects available in C++17, which are not necessarily
+familiar to astrophysicists. A large effort has been devoted to simplifying this level of abstraction so that the code can be modified by researchers and students familiar with C and who are aware of basic object-oriented concepts.
+
+
+Idefix currently supports the following physics:
+
+* Compressible hydrodynamics in 1D, 2D, 3D
+* Compressible magnetohydrodynamics using constrained transport in 1D, 2D, 3D
+* Multiple geometries (Cartesian, polar, spherical)
+* Variable mesh spacing
+* Multiple parallelisation strategies (OpenMP, MPI, GPU offloading, etc...)
+* Full non-ideal MHD (Ohmic, ambipolar, Hall)
+* Viscosity and thermal diffusion
+* Super-timestepping for all parabolic terms
+* Orbital advection (Fargo-like)
+* Self-gravity
+* Multiple dust species modelled as pressureless fluids
+* Multiple-planet interaction
+
+Documentation
+-------------
+
+Full online documentation is available on [Read the Docs](https://idefix.readthedocs.io/latest/).
+
+
 Download:
 ---------
@@ -56,10 +85,8 @@ Configure the code launching cmake (version >= 3.16) in the example directory:
 cmake $IDEFIX_DIR
 ```
-Several options can be enabled from the command line (a complete list is available with `cmake $IDEFIX_DIR -LH`). For instance: `-DIdefix_RECONSTRUCTION=Parabolic` (enable PPM reconstruction), `-DIdefix_MPI=ON` (enable mpi), `-DKokkos_ENABLE_OPENMP=ON` (enable openmp parallelisation), etc... For more complex target architectures, it is recommended to use cmake GUI launching `ccmake $IDEFIX_DIR` in place of `cmake` and then switching on the required options.
+Several options can be enabled from the command line (a complete list is available with `cmake $IDEFIX_DIR -LH`). For instance: `-DIdefix_RECONSTRUCTION=Parabolic` (enable PPM reconstruction), `-DIdefix_MPI=ON` (enable mpi), `-DKokkos_ENABLE_OPENMP=ON` (enable openmp parallelisation), etc... For more complex target architectures, it is recommended to use cmake GUI launching `ccmake $IDEFIX_DIR` in place of `cmake` and then switching on the required options. See the [online documentation](https://idefix.readthedocs.io/latest/) for details.
-Optional xdmf(hdf5+xmf) file dumping feature has been added to `Idefix`. This uses either serial or parallel implementation of `hdf5` library which needs to be made available. These xdmf file pairs can be easily visualized in `ParaView` or `VisIt` by loading the `xmf` files. The `hdf5` files can also be loaded easily in `python` (using `h5py`) for post-processing and post-run analysis. One can turn on `xdmf` data dumps by using `-DIdefix_HDF5=ON`. The `[Output]` block of `.ini` file is checked during runtime for a `xdmf` entry whih controls the frequency of xdmf file dumps during code execution.
-
 One can then compile the code:
diff --git a/doc/source/conf.py b/doc/source/conf.py
index dc1a4d68..9ba65d75 100644
--- a/doc/source/conf.py
+++ b/doc/source/conf.py
@@ -23,7 +23,7 @@ author = 'Geoffroy Lesur'
 # The full version, including alpha/beta/rc tags
-release = '2.1.00'
+release = '2.1.01'
diff --git a/doc/source/index.rst b/doc/source/index.rst
index c0d96f12..fa8dadb3 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -55,7 +55,7 @@ Terms and condition of Use
 ===========================
 *Idefix* is distributed freely under the `CeCILL license `_, a free software license adapted to both international and French legal matters, in the spirit of and retaining compatibility with the GNU General Public License (GPL). We expect *Idefix* to be referenced and acknowledeged by authors in their publications. At the minimum, the authors
-should cite the *Idefix* `method paper `_.
+should cite the *Idefix* `method paper `_.
 *Idefix* data structure and algorithm are derived from Andrea Mignone's `PLUTO code `_, released under the GPL license. *Idefix* also relies on the `Kokkos `_ performance portability programming ecosystem released under the terms
@@ -74,6 +74,9 @@ Soufiane Baghdadi
 Gaylor Wafflard-Fernandez
  planet-disc interaction
+Jonah Mauxion
+ self-gravity module
+
 Clément Robert
  gitlab integration, linter
@@ -96,8 +99,12 @@ This documentation has automatically been generated on |today| from the followin
 Acknowledgements
 ===================
-The developement of *Idefix* is supported by the European Research Council (ERC)
-under the European Union Horizon 2020 research and innovation programme (Grant agreement No. 815559 (MHDiscs))
+The development of *Idefix* was supported by the European Research Council (ERC)
+under the European Union Horizon 2020 research and innovation programme (Grant agreement No. 815559 (MHDiscs)).
+The Idefix development team is partly funded by the `PEPR Origins `_ through the project "MHD@Exascale".
+The Idefix collaboration benefited from funding from the “Programme National de Physique Stellaire” (PNPS),
+“Programme National Soleil-Terre” (PNST), “Programme National de Hautes Energies” (PNHE) and
+“Programme National de Planétologie” (PNP) of CNRS/INSU co-funded by CEA and CNES.
 .. toctree::
@@ -108,6 +115,7 @@ under the European Union Horizon 2020 research and innovation programme (Grant a
  reference
  modules
  programmingguide
+ performances
  kokkos
  contributing
  faq
diff --git a/doc/source/performances.rst b/doc/source/performances.rst
new file mode 100644
index 00000000..2222c2e1
--- /dev/null
+++ b/doc/source/performances.rst
@@ -0,0 +1,48 @@
+======================
+Performances
+======================
+
+We report below the performances obtained on various architectures using Idefix. The reference test
+is the 3D MHD Orszag-Tang test problem with 2nd order reconstruction and uct_contact EMFS bundled in
+the Idefix test suite, computed with a 128\ :sup:`3` resolution per MPI sub-domain on GPUs or 32\ :sup:`3`
+per MPI sub-domain on CPUs. All of the performance measurements have been obtained with MPI enabled on
+*one full node*, but we report here the performance *per GPU*
+(i.e. with 2 GCDs on AMD Mi250) or *per core* (on CPU), i.e. dividing the node performance by the number of GPUs/cores
+to simplify the comparison with other clusters.
+
+The complete scalability tests are available in the Idefix `method paper `_.
+The performances mentioned below are updated for each major revision of Idefix, so they might slightly differ from the method paper.
+
+.. note::
+
+ You might expect
+ lower performance at lower resolution when using GPUs. The overall performance also depends on
+ the physical modules activated, the reconstruction scheme, and the efficiency of the parallel network
+ on which you are running. The performances reported below are therefore purely indicative. We encourage
+ you to use the embedded profiler (see :ref:`commandLine`) when performance is lower than expected.
+
+
+CPU performances
+================
+
++---------------------+--------------------+----------------------------------------------------+
+| Cluster name        | Processor          | Performances (in 10\ :sup:`6` cell/s/core)         |
++=====================+====================+====================================================+
+| TGCC/Irene Rome     | AMD EPYC Rome      | 0.29                                               |
++---------------------+--------------------+----------------------------------------------------+
+| IDRIS/Jean Zay      | Intel Cascade Lake | 0.62                                               |
++---------------------+--------------------+----------------------------------------------------+
+
+
+GPU performances
+================
+
++----------------------+--------------------+----------------------------------------------------+
+| Cluster name         | GPU                | Performances (in 10\ :sup:`6` cell/s/GPU)          |
++======================+====================+====================================================+
+| IDRIS/Jean Zay       | NVIDIA V100        | 110                                                |
++----------------------+--------------------+----------------------------------------------------+
+| IDRIS/Jean Zay       | NVIDIA A100        | 194                                                |
++----------------------+--------------------+----------------------------------------------------+
+| CINES/Adastra        | AMD Mi250          | 250                                                |
++----------------------+--------------------+----------------------------------------------------+
diff --git a/doc/source/reference/idefix.ini.rst b/doc/source/reference/idefix.ini.rst
index ae2b8e5a..066db997 100644
--- a/doc/source/reference/idefix.ini.rst
+++ b/doc/source/reference/idefix.ini.rst
@@ -343,7 +343,8 @@ and ``X1-end``, ``X2-end``, ``X3-end`` for the right boundaries. Each boundary c
 +----------------+------------------------------------------------------------------------------------------------------------------+
 | periodic | Periodic boundary conditions. Each field is copied between beg and end sides of the boundary. |
 +----------------+------------------------------------------------------------------------------------------------------------------+
-| reflective | The normal component of the velocity is systematically reversed. Otherwise identical to ``outflow``. |
+| reflective | | Mirror the normal component of the velocity field and the tangential components of the magnetic field. |
+| | | Zero gradient on the other components (tangential velocity and normal field). |
 +----------------+------------------------------------------------------------------------------------------------------------------+
 | shearingbox | Shearing-box boudary conditions. |
 +----------------+------------------------------------------------------------------------------------------------------------------+
diff --git a/doc/source/reference/makefile.rst b/doc/source/reference/makefile.rst
index 8e53c224..38e90fbf 100644
--- a/doc/source/reference/makefile.rst
+++ b/doc/source/reference/makefile.rst
@@ -108,15 +108,17 @@ We recommend the following modules and environement variables on AdAstra:
 .. code-block:: bash
- module load PrgEnv-cray-amd
- module load cray-mpich
- module load craype-network-ofi
- module load cce
- module load cpe
- module load rocm/5.2.0
- export LDFLAGS="-L${ROCM_PATH}/lib -lamdhip64 -lstdc++fs"
-
-The last line being there to guarantee the link to the HIP library and the access to specific
+ module load cpe/23.12
+ module load craype-accel-amd-gfx90a craype-x86-trento
+ module load PrgEnv-cray
+ module load amd-mixed/5.7.1
+ module load rocm/5.7.1 # required because of a path bug that has not been fixed yet
+ export HIPCC_COMPILE_FLAGS_APPEND="-isystem ${CRAY_MPICH_PREFIX}/include"
+ export HIPCC_LINK_FLAGS_APPEND="-L${CRAY_MPICH_PREFIX}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a} -lstdc++fs"
+ export CXX=hipcc
+ export CC=hipcc
+
+The `-lstdc++fs` option is there to guarantee the link to the HIP library and the access to specific
 C++17 functions. Finally, *Idefix* can be configured to run on Mi250 by enabling HIP and the desired architecture with the following options to ccmake:
@@ -144,15 +146,16 @@ We recommend the following modules and environement variables on Jean Zay:
 .. code-block:: bash
- -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_VOLTA70=ON
+ -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_VOLTA70=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
 While Ampere A100 GPUs are enabled with
 .. code-block:: bash
- -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_AMPERE80=ON
+ -DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_AMPERE80=ON -DKokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF
-MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual.
+MPI (multi-GPU) can be enabled by adding ``-DIdefix_MPI=ON`` as usual. The malloc async option is disabled here to work around a bug when using PSM2 with asynchronous
+CUDA malloc, which can lead to OpenMPI crashes or hangs on the Jean Zay machine.
 .. _setupSpecificOptions:
@@ -174,7 +177,7 @@ explicitely the options as they are required, using the functions ``set_idefix_p
 .. _customSourceFiles:
-Add/replace custom source files
+Add custom source files
 +++++++++++++++++++++++++++++++
 It is possible to add custom source files to be compiled and linked against *Idefix*. This can be useful
@@ -189,21 +192,6 @@ say you want to add source files for an analysis, your ``CMakeLists.txt`` should
 add_idefix_source(analysis.hpp)
-*Idefix* also allows one to replace a source file in `$IDEFIX_DIR` by your own implementation. This is useful when developping new functionnalities without touching
-the main directory of your *Idefix* repository. For instance, say one wants to replace the implementation of viscosity in `$IDEFIX_SRC/src/hydro/viscosity.cpp`,
-with a customised `myviscosity.cpp` in the problem directory, one should add a ``CMakeLists.txt`` in the problem directory reading
-
-.. code-block::
- :caption: CMakeLists.txt
-
- replace_idefix_source(hydro/viscosity.cpp myviscosity.cpp)
-
-
-Note that the first parameter of ``replace_idefix_source`` is used as a search pattern in `$IDEFIX_DIR`. Hence it is possible to ommit the parent directory
-of the file being replaced if there is only one file with that name in the *Idefix* source directory, which is not guaranteed (some classes may implement
-methods with the same name). It is therefore recommended to add the parent directory in the first argument of ``replace_idefix_source``.
-
-
 .. tip::
 Don't forget to delete `CMakeCache.txt` before attempting to reconfigure the code when adding a problem-specific
diff --git a/src/fluid/addNonIdealMHDFlux.hpp b/src/fluid/addNonIdealMHDFlux.hpp
index 4ea98c76..52c281d2 100644
--- a/src/fluid/addNonIdealMHDFlux.hpp
+++ b/src/fluid/addNonIdealMHDFlux.hpp
@@ -258,7 +258,7 @@ void Fluid::AddNonIdealMHDFlux(const real t) {
 #if HAVE_ENERGY
 Flux(ENG,k,j,i) += - Bx1 * eta * Jx2 + Bx2 * eta * Jx1;
 #endif
- dMax(k,j,i) += eta;
+ locdmax += eta;
 }
 if(haveAmbipolar) {
diff --git a/src/input.cpp b/src/input.cpp
index 22006a59..33641495 100644
--- a/src/input.cpp
+++ b/src/input.cpp
@@ -226,6 +226,12 @@ void Input::ShowConfig() {
 idfx::cout << "-----------------------------------------------------------------------------"
 << std::endl;
+ std::stringstream os;
+ Kokkos::DefaultExecutionSpace().print_configuration(os, true);
+ idfx::cout << "Input: Kokkos configuration" << std::endl << os.str();
+ idfx::cout << "-----------------------------------------------------------------------------"
+ << std::endl;
+
 #ifdef SINGLE_PRECISION
 idfx::cout << "Input: Compiled with SINGLE PRECISION arithmetic." << std::endl;
 #else
@@ -237,15 +243,6 @@ void Input::ShowConfig() {
 #ifdef WITH_MPI
 idfx::cout << "Input: MPI ENABLED." << std::endl;
 #endif
- #ifdef KOKKOS_ENABLE_HIP
- idfx::cout << "Input: Kokkos HIP target ENABLED." << std::endl;
- #endif
- #ifdef KOKKOS_ENABLE_CUDA
- idfx::cout << "Input: Kokkos CUDA target ENABLED." << std::endl;
- #endif
- #ifdef KOKKOS_ENABLE_OPENMP
- idfx::cout << "Input: Kokkos OpenMP ENABLED." << std::endl;
- #endif
 }
 // This routine is called whenever a specific OS signal is caught
diff --git a/src/main.cpp b/src/main.cpp
index 0346b1c6..5a7f6f73 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -200,7 +200,7 @@ int main( int argc, char* argv[] ) {
 n_seconds = divres.rem;
 double perfs = timer.seconds() / grid.np_int[IDIR] / grid.np_int[JDIR]
- / grid.np_int[KDIR] / Tint.GetNCycles();
+ / grid.np_int[KDIR] / Tint.GetNCycles() * idfx::psize;
 idfx::cout << "Main: Reached t=" << data.t << std::endl;
 idfx::cout << "Main: Completed in ";
diff --git a/src/timeIntegrator.cpp b/src/timeIntegrator.cpp
index de9c9e9b..6ffb6acf 100644
--- a/src/timeIntegrator.cpp
+++ b/src/timeIntegrator.cpp
@@ -88,7 +88,7 @@ TimeIntegrator::TimeIntegrator(Input & input, DataBlock & data) {
 void TimeIntegrator::ShowLog(DataBlock &data) {
 if(isSilent) return;
 double rawperf = (timer.seconds()-lastLog)/data.mygrid->np_int[IDIR]/data.mygrid->np_int[JDIR]
- /data.mygrid->np_int[KDIR]/cyclePeriod;
+ /data.mygrid->np_int[KDIR]/cyclePeriod * idfx::psize;
 #ifdef WITH_MPI
 // measure time spent in expensive MPI calls
 double mpiCycleTime = idfx::mpiCallsTimer - lastMpiLog;