rakau is a C++17 library for the computation of accelerations and potentials in gravitational N-body simulations.
The core of the library is a high-performance implementation of the Barnes-Hut tree algorithm, capable of taking advantage of modern heterogeneous hardware architectures. Specifically, rakau can run on:
- multicore CPUs, where it takes advantage of both multithreading and SIMD instructions,
- AMD GPUs, via ROCm,
- Nvidia GPUs, via CUDA.
On CPUs, multithreaded parallelism is implemented on top of the TBB library. Vectorisation is achieved via the xsimd intrinsics wrapper library (which means that, e.g., on x86-64 CPUs rakau can use all the available vector extensions, from SSE2 to AVX-512).
At present, rakau can offload the tree traversal part of the Barnes-Hut algorithm to GPUs. During tree traversal, the CPU and the GPU can be used concurrently, and the user is free to choose (depending on the available hardware) how to split the computation between them.
Work is ongoing to move other parts of the Barnes-Hut algorithm (most notably, tree construction) to the GPU as well.
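For readers unfamiliar with the traversal step, the sketch below shows the classic geometric Barnes-Hut acceptance test, in which a tree node is treated as a single point mass when the ratio of its size to its distance from the target particle is smaller than theta. The function `bh_accept` is a hypothetical helper written for illustration. It is not part of rakau's API, and rakau's own acceptance criteria may differ in detail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of rakau): geometric Barnes-Hut acceptance test.
// A node of side length `node_size`, with centre of mass `com`, is accepted as a
// single point mass for the particle at `pos` if node_size / distance < theta.
template <typename F, std::size_t D>
bool bh_accept(const std::array<F, D> &pos, const std::array<F, D> &com, F node_size, F theta)
{
    assert(node_size >= 0 && theta >= 0);

    // Squared distance between the particle and the node's centre of mass.
    F dist2 = 0;
    for (std::size_t i = 0; i < D; ++i) {
        const F diff = com[i] - pos[i];
        dist2 += diff * diff;
    }

    // Compare squared quantities to avoid a square root:
    // node_size / dist < theta  <=>  node_size^2 < theta^2 * dist^2.
    return node_size * node_size < theta * theta * dist2;
}
```

When the test fails, the traversal descends into the node's children instead. Smaller values of theta therefore open more nodes, trading runtime for accuracy.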
The following table lists the runtime for the computation of the gravitational accelerations (i.e., the tree traversal part of the Barnes-Hut algorithm) in a system of 4 million particles distributed according to the Plummer model, on various hardware configurations. The theta parameter is set to 0.75, and the computation is done in single precision.
Hardware | Type | Compiler | Runtime |
---|---|---|---|
2 x Intel Xeon Gold 6148 | CPU (AVX-512, 40 cores + SMT) | GCC 8 | 82 ms |
2 x Intel Xeon E5-2698 | CPU (AVX2, 40 cores) | GCC 7 | 133 ms |
AMD Ryzen 1700 | CPU (AVX2, 8 cores + SMT) | GCC 8 | 580 ms |
AMD Ryzen 1700 | CPU (AVX2, 8 cores + SMT) | HCC 1.9 | 688 ms |
AMD Radeon RX 570 | GPU (Polaris) | HCC 1.9 | 256 ms |
Ryzen 1700 + RX 570 | CPU+GPU | HCC 1.9 | 195 ms |
Nvidia GeForce GTX 1080 Ti | GPU (Pascal) | NVCC | 140 ms |
Nvidia V100 | GPU (Volta) | NVCC | 95 ms |
Intel Core i7-3610QM | CPU (AVX, 4 cores + SMT) | GCC 8 | 1510 ms |
Nvidia GeForce GT 650M | GPU (Kepler) | NVCC | 2818 ms |
i7-3610QM + GT 650M | CPU+GPU | GCC 8 + NVCC | 1080 ms |
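To set up a comparable input, particle positions can be drawn from a Plummer sphere by inverting its cumulative mass profile. The sketch below is one possible way to do this. `make_plummer`, its parameters, and the choice of unit total mass are assumptions made for illustration, and the generator used for the benchmarks above may differ (for instance in how far-out particles are handled).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of rakau): sample n equal-mass particles from a
// Plummer sphere of scale radius a and total mass 1.
// Returns {x coordinates, y coordinates, z coordinates, masses}.
inline std::array<std::vector<float>, 4> make_plummer(std::size_t n, float a = 1.f, unsigned seed = 42)
{
    assert(n > 0 && a > 0);

    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> udist(0., 1.);
    const double two_pi = 2. * std::acos(-1.);

    std::array<std::vector<float>, 4> out;
    for (auto &v : out) {
        v.reserve(n);
    }

    while (out[0].size() < n) {
        // Invert the enclosed mass fraction u = r^3 / (r^2 + a^2)^(3/2),
        // which gives r = a / sqrt(u^(-2/3) - 1).
        const double u = udist(rng);
        if (u <= 0.) {
            continue;
        }
        const double r = a / std::sqrt(std::pow(u, -2. / 3.) - 1.);
        if (!std::isfinite(r)) {
            // Reject the (extremely rare) samples that overflow.
            continue;
        }

        // Pick an isotropic direction on the unit sphere.
        const double cos_t = 1. - 2. * udist(rng);
        const double sin_t = std::sqrt(1. - cos_t * cos_t);
        const double phi = two_pi * udist(rng);

        out[0].push_back(static_cast<float>(r * sin_t * std::cos(phi)));
        out[1].push_back(static_cast<float>(r * sin_t * std::sin(phi)));
        out[2].push_back(static_cast<float>(r * cos_t));
        out[3].push_back(1.f / static_cast<float>(n));
    }

    return out;
}
```

The four returned vectors hold the coordinates and masses that a tree would be built from. A quick sanity check is that roughly 35% of the particles should lie within the scale radius a.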
Current:
- single and double precision<sup>1</sup>,
- 2D and 3D<sup>2</sup>,
- computation of accelerations and/or potentials,
- support for multiple MACs (multipole acceptance criteria),
- highly configurable tree structure,
- ergonomic API based on modern C++ idioms.
Planned:
- higher multipole moments,
- support for integration schemes based on hierarchical timesteps,
- better support for multi-GPU setups<sup>3</sup>,
- Python interface.
<sup>1</sup> long double is supported as well, but it is available only on the CPU and there's no SIMD support for extended precision on any architecture at this time.
<sup>2</sup> The 2D CPU codepaths have not been SIMDified yet.
<sup>3</sup> Multi-GPU support is available on CUDA (and potentially ROCm, if I can get my hands on a multi-GPU ROCm machine), but it currently exhibits poor scaling properties.
rakau has the following mandatory dependencies:
- the TBB library,
- the xsimd library,
- the Boost libraries (the header-only parts are sufficient, apart from the benchmark suite which needs the compiled Boost.Program_options library).
In order to run on AMD GPUs, rakau must be compiled with the HCC compiler from the ROCm toolchain. Support for Nvidia GPUs requires the CUDA software stack.
rakau is written in C++17, thus a reasonably recent (and conforming) C++ compiler is required. GCC 7/8 and clang 6/7 are the main compilers used during development.
rakau uses the CMake build system. The main configuration variables are:
- RAKAU_BUILD_BENCHMARKS: build the benchmark suite,
- RAKAU_BUILD_TESTS: build the test suite,
- RAKAU_WITH_ROCM: enable support for AMD GPUs via ROCm,
- RAKAU_WITH_CUDA: enable support for Nvidia GPUs via CUDA.
If no GPU support is enabled, rakau is a header-only library. If support for AMD or Nvidia GPUs is enabled, a dynamic library will be built and installed in addition to the header files.
rakau's build system installs a CMake config-file package, which makes it easy to find and use rakau from other CMake-based projects. A minimal example:
# Locate rakau on the system.
find_package(rakau)
# Link rakau (and its dependencies) to an executable.
target_link_libraries(my_executable rakau::rakau)
rakau's usual workflow involves two basic operations:
- the construction of the tree structure from a distribution of particles in space,
- the traversal of the tree structure for the computation of the gravitational accelerations/potentials on the particles.
A minimal example:
#include <array>
#include <initializer_list>
#include <vector>
#include <rakau/tree.hpp>
using namespace rakau;
using namespace rakau::kwargs;
int main()
{
    // Create an octree from a set of particle coordinates and masses.
    octree<float> t{x_coords = {1, 2, 3}, y_coords = {4, 5, 6}, z_coords = {7, 8, 9}, masses = {1, 1, 1}};
    // Prepare output vectors for the accelerations.
    std::array<std::vector<float>, 3> accs;
    // Compute the accelerations with a theta parameter of 0.4.
    t.accs_u(accs, 0.4f);
}
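When experimenting with small systems such as the one above, it can be handy to have a brute-force reference to compare against. The sketch below computes accelerations by direct O(N^2) summation. `direct_accs` is a hypothetical helper that does not depend on rakau, and it assumes a gravitational constant of 1 and no softening, which may not match the settings of a given tree.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of rakau): direct-summation
// accelerations, assuming G = 1 and no softening.
// The result is indexed as accs[component][particle].
inline std::array<std::vector<float>, 3> direct_accs(const std::vector<float> &x, const std::vector<float> &y,
                                                     const std::vector<float> &z, const std::vector<float> &m)
{
    const std::size_t n = x.size();
    assert(y.size() == n && z.size() == n && m.size() == n);

    std::array<std::vector<float>, 3> accs;
    for (auto &v : accs) {
        v.assign(n, 0.f);
    }

    for (std::size_t i = 0; i < n; ++i) {
        // Accumulate in double precision to limit rounding error.
        double ax = 0, ay = 0, az = 0;
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) {
                continue;
            }
            const double dx = double(x[j]) - x[i];
            const double dy = double(y[j]) - y[i];
            const double dz = double(z[j]) - z[i];
            const double r2 = dx * dx + dy * dy + dz * dz;
            // Contribution of particle j: m_j * (r_j - r_i) / |r_j - r_i|^3.
            const double m_r3 = m[j] / (r2 * std::sqrt(r2));
            ax += m_r3 * dx;
            ay += m_r3 * dy;
            az += m_r3 * dz;
        }
        accs[0][i] = static_cast<float>(ax);
        accs[1][i] = static_cast<float>(ay);
        accs[2][i] = static_cast<float>(az);
    }

    return accs;
}
```

Calling `direct_accs({1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {1, 1, 1})` reproduces the three-particle configuration of the minimal example. By symmetry, the middle particle feels no net acceleration, while the two outer particles are pulled towards it with equal and opposite accelerations. Tree results are approximate for larger systems, so any comparison should use a tolerance.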
More examples and details are available in the user guide (TODO).
The development of rakau was in part supported by the German Deutsche Forschungsgemeinschaft (DFG) priority program 1833, "Building a Habitable Earth".