GPU
Hsi-Yu Schive edited this page Dec 19, 2017
On a node with NGPU GPUs, each GPU has a unique ID in the range 0 to NGPU-1. GAMER uses the runtime parameter OPT__GPUID_SELECT to set the GPU ID associated with each MPI process.
- OPT__GPUID_SELECT=-2: set automatically by the CUDA runtime. In most cases, this selects GPU 0 unless the CUDA device compute mode has been changed from the default value of 0 (cudaComputeModeDefault).
- OPT__GPUID_SELECT=-1: set by MPI rank. Specifically, the GPU ID is set to MPI_Rank % NGPU, where % is the integer modulus operator. This is the recommended method when running on a system with multiple GPUs per node. However, one must be careful about the order of MPI ranks among different nodes to ensure full utilization of all GPUs on each node. For example, if two MPI ranks with MPI_Rank=0 and 2 run on a node with NGPU=2, both ranks will access GPU 0 (since 0%2 and 2%2 both equal 0) and GPU 1 will sit idle, which is undesirable. One straightforward approach is to adopt an "SMP-style" rank ordering, in which ranks are placed consecutively until a node is filled up before moving on to the next node. Please consult your system documentation.
https://bluewaters.ncsa.illinois.edu/topology-considerations
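The rank-to-GPU mapping described for OPT__GPUID_SELECT=-1 can be checked by hand before submitting a job. The following is a minimal Python sketch, not part of GAMER; the helper names `gpu_id_for_rank` and `gpus_used_per_node` are hypothetical and only reproduce the MPI_Rank % NGPU rule stated above.

```python
def gpu_id_for_rank(mpi_rank: int, ngpu: int) -> int:
    """GPU ID implied by the rule MPI_Rank % NGPU (hypothetical helper)."""
    if ngpu <= 0:
        raise ValueError("ngpu must be positive")
    return mpi_rank % ngpu


def gpus_used_per_node(ranks_on_node, ngpu):
    """Return the sorted list of distinct GPU IDs used by the ranks on one node."""
    return sorted({gpu_id_for_rank(rank, ngpu) for rank in ranks_on_node})


if __name__ == "__main__":
    ngpu = 2
    # Ranks 0 and 2 placed on the same node: both map to GPU 0, GPU 1 is idle.
    print(gpus_used_per_node([0, 2], ngpu))
    # SMP-style placement: consecutive ranks 0 and 1 share the node, both GPUs are used.
    print(gpus_used_per_node([0, 1], ngpu))
```

If the list returned for a node is shorter than NGPU, some GPUs on that node would be left idle under that rank placement.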
3. Validate the GPU settings by searching for the keyword "Device Diagnosis" in the log file Record__Note. You should see something like
Device Diagnosis
***********************************************************************************
MPI_Rank = 0, hostname = golub123, PID = 47842
CPU Info :
CPU Type : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU MHz : 2499.982
Cache Size : 25600 KB
CPU Cores : 10
Total Memory : 63.0 GB
GPU Info :
Number of GPUs : 2
GPU ID : 0
GPU Name : Tesla K40m
CUDA Driver Version : 8.0
CUDA Runtime Version : 7.0
CUDA Major Revision Number : 3
CUDA Minor Revision Number : 5
Clock Rate : 0.745000 GHz
Global Memory Size : 11439 MB
Constant Memory Size : 64 KB
Shared Memory Size per Block : 48 KB
Number of Registers per Block : 65536
Warp Size : 32
Number of Multiprocessors: : 15
Number of Cores per Multiprocessor: 192
Total Number of Cores: : 2880
Max Number of Threads per Block : 1024
Max Size of the Block X-Dimension : 1024
Max Size of the Grid X-Dimension : 2147483647
Concurrent Copy and Execution : Yes
Concurrent Up/Downstream Copies : Yes
Concurrent Kernel Execution : Yes
GPU has ECC Support Enabled : Yes
***********************************************************************************
This example shows that we are running on the computing node "golub123" with 2 GPUs, and we are using the one with "GPU ID = 0", a "Tesla K40m" GPU.
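For runs with many MPI ranks, this check can be scripted. The sketch below is an illustration only and is not part of GAMER: it assumes that the log contains one "MPI_Rank = ..., hostname = ..., PID = ..." line followed by a "GPU ID : ..." line for each rank, in the layout of the excerpt above. The function names `parse_device_diagnosis` and `shared_gpus` are hypothetical.

```python
import re
from collections import defaultdict

# Patterns modeled on the log excerpt above; the exact layout is an assumption.
RANK_RE = re.compile(r"MPI_Rank\s*=\s*(\d+),\s*hostname\s*=\s*(\S+?),\s*PID\s*=\s*(\d+)")
GPU_ID_RE = re.compile(r"^\s*GPU ID\s*:\s*(\d+)\s*$")


def parse_device_diagnosis(text):
    """Return a list of (mpi_rank, hostname, gpu_id) tuples found in the log text."""
    records = []
    current = None  # (rank, hostname) of the block being read
    for line in text.splitlines():
        match = RANK_RE.search(line)
        if match:
            current = (int(match.group(1)), match.group(2))
            continue
        match = GPU_ID_RE.match(line)
        if match and current is not None:
            records.append((current[0], current[1], int(match.group(1))))
            current = None
    return records


def shared_gpus(records):
    """Map (hostname, gpu_id) to its ranks, keeping only GPUs used by several ranks."""
    usage = defaultdict(list)
    for rank, host, gpu in records:
        usage[(host, gpu)].append(rank)
    return {key: ranks for key, ranks in usage.items() if len(ranks) > 1}


if __name__ == "__main__":
    sample = (
        "MPI_Rank = 0, hostname = golub123, PID = 47842\n"
        "GPU ID : 0\n"
    )
    records = parse_device_diagnosis(sample)
    print(records)
    print(shared_gpus(records))
```

To apply it to a real run, read the contents of Record__Note into a string and pass it to `parse_device_diagnosis`. A non-empty result from `shared_gpus` means several ranks on the same host report the same GPU ID.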
Note: the CUDA device compute mode can be changed with nvidia-smi -c (requires root).