This is a repository where I put simple examples showcasing the basic capabilities of MPI, OpenMP and CUDA.
I chose to make a repository out of them because some of those capabilities aren't particularly well documented, and these simple examples might save you the time it took me to make them work properly.
I demonstrate in uma_cuda_omp.cpp how you can allocate a buffer with CUDA using Unified Memory Access (or UMA), and then access it from both the CPU and an OpenMP target section.
The important section is the following:
```cpp
/*
We first allocate a UMA buffer using cudaMallocManaged,
and then make it an OpenMP device pointer using omp_target_associate_ptr.
*/
cudaMallocManaged((void**)&table, size*sizeof(int), cudaMemAttachGlobal);
omp_target_associate_ptr(table, table, size*sizeof(int), 0 /* device offset */, gpuid);
```
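Note that the host and device pointer arguments to omp_target_associate_ptr are the same address: a managed pointer is valid on both the CPU and the GPU, which is the whole point of the example. As a small sketch of the resulting usage (reusing table, size and the association from above):

```cpp
/* Fill the buffer on the CPU through the managed pointer... */
for (size_t i = 0; i < size; ++i)
    table[i] = 0;

/* ...then update the same buffer inside an OpenMP target region. */
#pragma omp target teams distribute parallel for
for (size_t i = 0; i < size; ++i)
    table[i] += 1;
```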
To then free the buffer, you can use:

```cpp
/* Disassociate the pointer from the OpenMP device, then free the memory */
omp_target_disassociate_ptr(table, gpuid);
cudaFree(table);
```
To compile this example, you can use nvcc, or g++ with:

```
g++ uma_cuda_omp.cpp -fopenmp -foffload=nvptx-none -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -fno-stack-protector -fcf-protection=none
```
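For reference, here is a minimal self-contained sketch of how the pieces fit together; it is not the repository file itself, and it assumes a single default OpenMP device:

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <omp.h>

int main() {
    const size_t size = 1024;
    int *table = nullptr;
    const int gpuid = omp_get_default_device();

    /* Allocate unified memory and expose it to the OpenMP device. */
    cudaMallocManaged((void**)&table, size*sizeof(int), cudaMemAttachGlobal);
    omp_target_associate_ptr(table, table, size*sizeof(int), 0, gpuid);

    /* Write on the GPU... */
    #pragma omp target teams distribute parallel for
    for (size_t i = 0; i < size; ++i)
        table[i] = (int)i;

    /* ...and read on the CPU through the same pointer. */
    printf("table[42] = %d\n", table[42]);

    omp_target_disassociate_ptr(table, gpuid);
    cudaFree(table);
    return 0;
}
```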
UMA is also possible using only OpenMP, as shown in omp_uma.cpp, using the following code:

```cpp
/* Declare that host and device share a single address space (OpenMP 5.0),
   then allocate a UMA buffer using malloc (or other) */
#pragma omp requires unified_shared_memory

int *table = (int*)malloc(size*sizeof(int));
```
table can then be freed as usual:

```cpp
/* Free memory */
free(table);
```
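A minimal self-contained sketch of what this can look like end to end (assuming a compiler and runtime that support OpenMP 5.0 unified shared memory):

```cpp
#include <cstdio>
#include <cstdlib>

#pragma omp requires unified_shared_memory

int main() {
    const size_t size = 1024;
    int *table = (int*)malloc(size*sizeof(int));

    /* Write on the device through the plain host pointer... */
    #pragma omp target teams distribute parallel for
    for (size_t i = 0; i < size; ++i)
        table[i] = 2*(int)i;

    /* ...and read it back on the host, with no explicit mapping. */
    printf("table[21] = %d\n", table[21]);

    free(table);
    return 0;
}
```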
mpi_shared_window.cpp shows an example of a shared window in MPI.
The important section is the following:
```cpp
/*
We first allocate a shared window using MPI_Win_allocate_shared.
Only rank 0 on a node actually allocates memory.
We then get the actual shared table pointer using MPI_Win_shared_query.
*/
MPI_Win_allocate_shared(noderank == 0 ? nodesize*sizeof(int) : 0,
                        sizeof(int), MPI_INFO_NULL, nodecomm, &table, &wintable);
if (noderank != 0) MPI_Win_shared_query(wintable, 0, &winsize, &windisp, &table);
```
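For context, nodecomm, noderank and nodesize are presumably obtained by splitting MPI_COMM_WORLD into per-node communicators, and once every rank has the pointer, the ranks on a node can read and write the same memory directly. A sketch of both parts (variable names match the excerpt above):

```cpp
/* Group the ranks that share a node into their own communicator. */
MPI_Comm nodecomm;
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                    MPI_INFO_NULL, &nodecomm);

int noderank, nodesize;
MPI_Comm_rank(nodecomm, &noderank);
MPI_Comm_size(nodecomm, &nodesize);

/* ... MPI_Win_allocate_shared / MPI_Win_shared_query as above ... */

/* Each rank fills its own slot in the shared table... */
table[noderank] = noderank;

/* ...and after a barrier, every rank on the node sees all the slots. */
MPI_Barrier(nodecomm);
```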
To then free the shared window you can use:

```cpp
/* free the memory */
MPI_Win_free(&wintable);
```
You can compile this example using mpic++ and run it using mpirun.
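For example, something like the following (assuming the default output name a.out and four ranks on one node):

```
mpic++ mpi_shared_window.cpp
mpirun -np 4 a.out
```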
I demonstrate in mpi_rma_cuda_uma.cpp how you can allocate a buffer with CUDA using Unified Memory Access (or UMA), and then access it from another node with MPI using Remote Memory Access (or RMA).
The important section is the following:
```cpp
if (rank == 0) {
    /*
    We first allocate a UMA buffer using cudaMallocManaged on the first rank,
    and then make it an OpenMP device pointer using omp_target_associate_ptr.
    */
    cudaMallocManaged((void**)&table, table_size*sizeof(int), cudaMemAttachGlobal);
    omp_target_associate_ptr(table, table, table_size*sizeof(int), 0 /* device offset */, gpuid);
} else {
    /* We do a normal alloc on the second node. */
    table = (int*)malloc(table_size*sizeof(int));
}

/* We then expose the buffer as a window using MPI_Win_create. */
MPI_Win_create(table, table_size*sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &wintable);
```
You can then use MPI_Get and MPI_Put (demonstrated in mpi_rma_cuda_uma.cpp) to remotely access table.
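For instance, rank 1 could read the beginning of rank 0's buffer with a fence-synchronized MPI_Get; this is a sketch of the pattern, and the actual access in mpi_rma_cuda_uma.cpp may differ:

```cpp
int local[4];

MPI_Win_fence(0, wintable);
if (rank == 1) {
    /* Read 4 ints starting at displacement 0 of rank 0's table. */
    MPI_Get(local, 4, MPI_INT, 0 /* target rank */,
            0 /* target displacement */, 4, MPI_INT, wintable);
}
MPI_Win_fence(0, wintable);
```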
To then free both the window and the buffers, you can use:

```cpp
/* free the window */
MPI_Win_free(&wintable);

if (rank == 0) {
    /* free the cuda buffer (after disassociating it from the OpenMP device) */
    omp_target_disassociate_ptr(table, gpuid);
    cudaFree(table);
} else {
    /* free the normal buffer */
    free(table);
}
```
To compile this example, you can use mpic++ with:

```
mpic++ mpi_rma_cuda_uma.cpp -fopenmp -foffload=nvptx-none -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -fno-stack-protector -fcf-protection=none
```
And you can run it using:

```
mpirun -np 2 a.out
```
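For reference, the pieces above might combine into something like the following; this is a sketch rather than the repository file, assuming exactly two ranks and a GPU visible to rank 0:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t table_size = 1024;
    const int gpuid = omp_get_default_device();
    int *table;

    if (rank == 0) {
        /* UMA buffer on the first rank, visible to CPU, GPU and MPI. */
        cudaMallocManaged((void**)&table, table_size*sizeof(int), cudaMemAttachGlobal);
        omp_target_associate_ptr(table, table, table_size*sizeof(int), 0, gpuid);
    } else {
        /* Plain host buffer on the other rank. */
        table = (int*)malloc(table_size*sizeof(int));
    }

    MPI_Win wintable;
    MPI_Win_create(table, table_size*sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &wintable);

    /* Rank 1 writes one value into rank 0's UMA buffer. */
    MPI_Win_fence(0, wintable);
    if (rank == 1) {
        int value = 42;
        MPI_Put(&value, 1, MPI_INT, 0, 0, 1, MPI_INT, wintable);
    }
    MPI_Win_fence(0, wintable);

    if (rank == 0)
        printf("table[0] = %d\n", table[0]);

    MPI_Win_free(&wintable);
    if (rank == 0) {
        omp_target_disassociate_ptr(table, gpuid);
        cudaFree(table);
    } else {
        free(table);
    }
    MPI_Finalize();
    return 0;
}
```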