GPU
To enable GPU:
- Edit the Makefile and recompile the code (see Installation for details)
- Query the GPUs on your system [optional]

Related options: GPU, GPU_ARCH
Parameters described on this page: OPT__GPUID_SELECT, FLU_GPU_NPGROUP, POT_GPU_NPGROUP, CHE_GPU_NPGROUP, GPU_NSTREAM
Other related parameters: none
Parameters below are shown in the format: Name
(Valid Values) [Default Value]
- OPT__GPUID_SELECT
  - Description: See Set and Validate GPU IDs.
  - Restriction: Must be smaller than the total number of GPUs in a node.
- FLU_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU fluid solvers at a single time. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of GPU_NSTREAM.
- POT_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU Poisson solvers at a single time. See also Performance Optimizations: GPU.
  - Restriction: Must be a multiple of GPU_NSTREAM.
- CHE_GPU_NPGROUP
  - Description: Number of patch groups updated by the GPU/CPU GRACKLE solvers at a single time. See also Performance Optimizations: GPU. The GPU version is currently not supported.
  - Restriction: None.
- GPU_NSTREAM
  - Description: Number of CUDA streams for the asynchronous memory copy between CPU and GPU. See also Performance Optimizations: GPU.
  - Restriction: See the restrictions on FLU_GPU_NPGROUP and POT_GPU_NPGROUP.
To query all GPUs in a node, use the command
> nvidia-smi
Here is an example on a node with 2 Tesla K40m GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40m Off | 0000:05:00.0 Off | 0 |
| N/A 28C P0 72W / 235W | 1071MiB / 11439MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40m Off | 0000:42:00.0 Off | 0 |
| N/A 26C P0 75W / 235W | 1071MiB / 11439MiB | 36% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 35286 C ./gamer 1067MiB |
| 1 35287 C ./gamer 1067MiB |
+-----------------------------------------------------------------------------+
It shows that the CUDA device compute mode of both GPUs is set to Default (corresponding to cudaComputeModeDefault), and there are currently two running jobs using GPU 0 and GPU 1, respectively.
On a node with NGPU GPUs, each GPU has a unique ID in the range 0 to NGPU-1. GAMER uses the runtime parameter OPT__GPUID_SELECT to set the GPU ID associated with each MPI process.
- OPT__GPUID_SELECT = -2: set by the CUDA runtime. Typically, this option should work together with the cudaComputeModeExclusive CUDA device compute mode, by which different MPI ranks in the same node will be assigned different GPUs automatically. Otherwise, all MPI ranks will use GPU 0, which is likely undesirable. The cudaComputeModeExclusive compute mode can be set by nvidia-smi -c 1, which requires root privileges.
- OPT__GPUID_SELECT = -1: set by MPI ranks. Specifically, it will set the GPU ID to MPI_Rank % NGPU, where % is the integer modulus operator. This is the recommended method when running on a system with multiple GPUs on each node. However, one must be careful about the order of MPI ranks among different nodes to ensure full utilization of all GPUs. For example, if two MPI ranks with MPI_Rank = 0 and 2 run on a node with NGPU = 2, both ranks will access GPU 0 (since both 0%2 and 2%2 equal 0) and GPU 1 will sit idle, which is undesirable. One straightforward approach is to adopt an "SMP-style" rank ordering, by which ranks are placed consecutively until a node is filled before moving on to the next node. A more detailed illustration can be found in the Blue Waters User Guide. Please also consult your system documentation.
- OPT__GPUID_SELECT >= 0: simply set the GPU ID to OPT__GPUID_SELECT. Valid inputs are 0 to NGPU-1.
See also Hybrid MPI/OpenMP/GPU.
To validate the ID and configuration of the GPU adopted by each MPI process, search for the keyword "Device Diagnosis" in the log file Record__Note generated during the initialization of GAMER. You should see something like
Device Diagnosis
***********************************************************************************
MPI_Rank = 0, hostname = golub123, PID = 47842
CPU Info :
CPU Type : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU MHz : 2499.982
Cache Size : 25600 KB
CPU Cores : 10
Total Memory : 63.0 GB
GPU Info :
Number of GPUs : 2
GPU ID : 0
GPU Name : Tesla K40m
CUDA Driver Version : 8.0
CUDA Runtime Version : 7.0
CUDA Major Revision Number : 3
CUDA Minor Revision Number : 5
Clock Rate : 0.745000 GHz
Global Memory Size : 11439 MB
Constant Memory Size : 64 KB
Shared Memory Size per Block : 48 KB
Number of Registers per Block : 65536
Warp Size : 32
Number of Multiprocessors : 15
Number of Cores per Multiprocessor : 192
Total Number of Cores : 2880
Max Number of Threads per Block : 1024
Max Size of the Block X-Dimension : 1024
Max Size of the Grid X-Dimension : 2147483647
Concurrent Copy and Execution : Yes
Concurrent Up/Downstream Copies : Yes
Concurrent Kernel Execution : Yes
GPU has ECC Support Enabled : Yes
***********************************************************************************
This example shows that MPI rank 0 is using GPU 0 on the node "golub123", which has 2 GPUs in total.
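The per-device fields in this report can be obtained through the standard CUDA runtime API. The sketch below is an illustration only (it is not GAMER's actual diagnosis routine) and prints just a few of the fields shown above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal device query mirroring a few fields of the "Device Diagnosis"
// report above. Compile with nvcc; requires a CUDA-capable system.
int main()
{
   int NGPU = 0;
   if ( cudaGetDeviceCount( &NGPU ) != cudaSuccess )  return 1;
   printf( "Number of GPUs : %d\n", NGPU );

   for (int d = 0; d < NGPU; d++)
   {
      cudaDeviceProp prop;
      cudaGetDeviceProperties( &prop, d );
      printf( "GPU ID %d : %s, compute capability %d.%d, "
              "%.0f MB global memory, %d multiprocessors\n",
              d, prop.name, prop.major, prop.minor,
              prop.totalGlobalMem/1048576.0, prop.multiProcessorCount );
   }
   return 0;
}
```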