GPU
To enable GPU:
- Edit the Makefile and recompile the code (see Installation for details)
- Query the GPUs on your system [optional]

Related options: GPU, GPU_ARCH
Parameters described on this page: OPT__GPUID_SELECT, FLU_GPU_NPGROUP, POT_GPU_NPGROUP, CHE_GPU_NPGROUP, GPU_NSTREAM
Other related parameters: none
Parameters below are shown in the format: Name
(Valid Values) [Default Value]
- OPT__GPUID_SELECT
  - Description: See Set and Validate GPU IDs.
  - Restriction: Must be smaller than the total number of GPUs in a node.
- FLU_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU fluid solvers at a single time. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of GPU_NSTREAM.
- POT_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU Poisson solvers at a single time. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of GPU_NSTREAM.
- CHE_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU GRACKLE solvers at a single time. See also Performance Optimizations: GPU. The GPU version is currently not supported.
  - Restriction: None.
- GPU_NSTREAM
  - Description: Number of CUDA streams for the asynchronous memory copy between CPU and GPU. See also Performance Optimizations: GPU.
  - Restriction: See the restrictions on FLU_GPU_NPGROUP and POT_GPU_NPGROUP.
To query all GPUs in a node, use the command
> nvidia-smi
Here is an example on a node with 2 Tesla K40m GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:05:00.0 Off | 0 |
| N/A 28C P0 72W / 235W | 1071MiB / 11439MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:42:00.0 Off | 0 |
| N/A 26C P0 75W / 235W | 1071MiB / 11439MiB | 36% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 35286 C ./gamer 1067MiB |
| 1 35287 C ./gamer 1067MiB |
+-----------------------------------------------------------------------------+
It shows that the CUDA device compute mode of both GPUs is set to Default (corresponding to cudaComputeModeDefault), and there are currently two running jobs using GPU 0 and GPU 1, respectively.
On a node with NGPU GPUs, each GPU has a unique ID in the range 0 to NGPU-1. GAMER uses the runtime parameter OPT__GPUID_SELECT to set the GPU ID associated with each MPI process.
- OPT__GPUID_SELECT = -2: set by the CUDA runtime. Typically, this option should work together with the cudaComputeModeExclusive CUDA device compute mode, by which different MPI ranks in the same node will be assigned different GPUs automatically. Otherwise, all MPI ranks will use GPU 0, which is likely undesirable. The cudaComputeModeExclusive compute mode can be set by nvidia-smi -c 1, which requires root privileges.
- OPT__GPUID_SELECT = -1: set by MPI ranks. Specifically, it will set the GPU ID to MPI_Rank % NGPU, where % is the integer modulus operator. This is the recommended method when running on a system with multiple GPUs on each node. However, one must be careful about the order of MPI ranks among different nodes to ensure full utilization of all GPUs. For example, if two MPI ranks with MPI_Rank = 0 and 2 run on a node with NGPU = 2, both ranks will access GPU 0 (since both 0%2 and 2%2 equal 0) and GPU 1 will sit idle, which is undesirable. One straightforward approach is to adopt an "SMP-style" rank ordering, by which ranks are placed consecutively until a node is filled before moving on to the next node. A more detailed illustration can be found in the Blue Waters User Guide. Please also consult your system documentation.
- OPT__GPUID_SELECT >= 0: simply set the GPU ID to OPT__GPUID_SELECT. Valid inputs are 0 to NGPU-1.
See also Hybrid MPI/OpenMP/GPU.
To validate the ID and configuration of the GPU adopted by each MPI process, search for the keyword "Device Diagnosis" in the log file Record__Note generated during the initialization of GAMER. You should see something like
Device Diagnosis
***********************************************************************************
MPI_Rank = 0, hostname = golub123, PID = 47842
CPU Info :
CPU Type : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU MHz : 2499.982
Cache Size : 25600 KB
CPU Cores : 10
Total Memory : 63.0 GB
GPU Info :
Number of GPUs : 2
GPU ID : 0
GPU Name : Tesla K40m
CUDA Driver Version : 8.0
CUDA Runtime Version : 7.0
CUDA Major Revision Number : 3
CUDA Minor Revision Number : 5
Clock Rate : 0.745000 GHz
Global Memory Size : 11439 MB
Constant Memory Size : 64 KB
Shared Memory Size per Block : 48 KB
Number of Registers per Block : 65536
Warp Size : 32
Number of Multiprocessors : 15
Number of Cores per Multiprocessor : 192
Total Number of Cores : 2880
Max Number of Threads per Block : 1024
Max Size of the Block X-Dimension : 1024
Max Size of the Grid X-Dimension : 2147483647
Concurrent Copy and Execution : Yes
Concurrent Up/Downstream Copies : Yes
Concurrent Kernel Execution : Yes
GPU has ECC Support Enabled : Yes
***********************************************************************************
This example shows that MPI rank 0 is using GPU 0 on the node "golub123", which has 2 GPUs in total.
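The per-device fields in this report can be obtained through the standard CUDA runtime API. The sketch below is an illustration only (it is not GAMER's actual diagnosis routine) and prints just a few of the fields shown above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal device query mirroring a few fields of the "Device Diagnosis"
// report above. Compile with nvcc; requires a CUDA-capable system.
int main()
{
   int NGPU = 0;
   if ( cudaGetDeviceCount( &NGPU ) != cudaSuccess )  return 1;
   printf( "Number of GPUs : %d\n", NGPU );

   for (int d = 0; d < NGPU; d++)
   {
      cudaDeviceProp prop;
      cudaGetDeviceProperties( &prop, d );
      printf( "GPU ID %d : %s, compute capability %d.%d, "
              "%.0f MB global memory, %d multiprocessors\n",
              d, prop.name, prop.major, prop.minor,
              prop.totalGlobalMem/1048576.0, prop.multiProcessorCount );
   }
   return 0;
}
```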