-
Notifications
You must be signed in to change notification settings - Fork 6
UK-MAC/CloverLeaf_CUDAFortran
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This directory contain a CUDA Fortran port of the serial Cloverleaf benchmark for a single GPU. Managed Memory -------------- This code utilizes the Managed Memory feature introduced in the CUDA 6.0 Toolkit and in the PGI 14.7 compilers. Managed Memory is a pool of memory accessible to both CPU and GPU using a single variable, which eliminates the need for separate host and device versions of data along with coding explicit data movement between host and device. In CUDA Fortran, allocation into this pool of memory is achieved using the "managed" variable attribute in declarations (see definitions.cuf). Managed Memory requires use of the 6.0 CUDA libraries and is available on devices with compute capability of 3.0 or higher. One caveat in using Managed Memory is that on multi-GPU systems where any two GPUs are NOT peer-to-peer capable, the system falls back to using zero-copy memory which will typically result in a large performance degradation. Since this code utilizes a single GPU, the zero-copy fallback can be avoided by setting the environment variable CUDA_VISIBLE_DEVICES to the appropriate GPU, or by setting the environment variable CUDA_MANAGED_FORCE_DEVICE_ALLOC to a non-zero value. (To determine the whether all pairs of GPUs on your system are peer-to-peer capable, compile and run the example code p2pAccess found in: /opt/pgi/*/2015/examples/CUDA-Fortran/CUDA-Fortran-Book/chapter4/P2P/p2pAccess/) Texture Cache ------------- On devices of compute capability 3.5 and higher, kernel arguments declared as "intent(in)" will be routed through the read-only texture cache via the LDG instruction. As a result of this feature, there are neither use of explicit textures nor shared memory in any kernels. On devices of compute capability 3.0, the code will run slower as the texture cache is not used. Test Cases ---------- Four different input files are included with the code are those recommended to test for correctness. Cloverleaf expects the input file to be named clover.in, so copy one of these files to clover.in before running a test. The kinetic energy, as displayed in the output file clover.out every 10 timesteps and at the end of the run, is usually the most sensitive state variable and can be used as a correctness test. The reference values for the kinetic energy for the last time step of these runs, along with approximate wall-clock times for a system with a K20 are listed below: Input Kinetic Energy Wall-clock time (s) clover_bm_short.in 0.1193E+01 1.13 clover_bm.in 0.2590E+01 39 clover_bm16_short.in 0.3075E+00 17 clover_bm16.in 0.4854E+01 580
About
CUDA Fortran port of CloverLeaf for 1 GPU
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published