A comparative analysis of parallel implementations of Conway's Game of Life (GoL), using a GPU-based toolkit (CUDA) and CPU-based toolkits (OpenMP and MPI), on INFN's Ocapie cluster for HPC.
Authors: F. Minutoli, M. Ghirardelli, and D. Surpanu.
- Parallel Programming Illustrated through Conway's Game of Life
- Parallelization: Conway's Game of Life
- BWPEP on Conway's Game of Life
- A Performance Analysis of GoL
- What is a Dwarf in HPC?
- The threshold that separates a small GoL grid from a big one has been set to 50x50 because of visualization constraints: a larger grid cannot be displayed properly in a terminal, so its evolution could not be followed in its entirety.
- Any custom C guard implemented to force some specific behaviour in the code is marked with a `GoL_` prefix.
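As an illustration of the naming convention above, here is a minimal sketch of how a `GoL_`-prefixed guard might switch a behaviour on or off at compile time. The guard name `GoL_DEBUG`, the `GOL_TRACE` macro, and the `count_alive` helper are hypothetical and chosen for this example only; they are not taken from the repository.

```c
#include <stdio.h>

/* GoL_DEBUG is a hypothetical guard name used for illustration only.
 * Defining it at compile time (e.g. -DGoL_DEBUG) enables tracing. */
#ifdef GoL_DEBUG
#define GOL_TRACE(msg) fprintf(stderr, "[GoL] %s\n", (msg))
static const int gol_debug_enabled = 1;
#else
#define GOL_TRACE(msg) ((void)0)
static const int gol_debug_enabled = 0;
#endif

/* Hypothetical helper: counts the non-zero (ALIVE) entries of a flat grid. */
int count_alive(const unsigned char *cells, int n) {
    int alive = 0;
    for (int i = 0; i < n; i++) {
        alive += (cells[i] != 0);
    }
    GOL_TRACE("counted alive cells"); /* compiled out unless GoL_DEBUG is set */
    return alive;
}
```

Because the guard is resolved by the preprocessor, the tracing call has no runtime cost in builds where the guard is not defined.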
Both the input and output files comply with the full-matrix (FM) format, that is:
- A single header row comprising the number of rows and columns in the grid, as two space-separated numbers.
- One line for each row in the grid comprising all of its values, expressed as an X character for ALIVE cells and an empty space (or any other non-X character) for DEAD cells.
Sample input files can be found in the `example` folder; one is also given here:
4 4
X X
XX
X X
Please note: if the (0, 0) cell is DEAD, so that the first grid row starts with an empty space, replace that character with any non-X character of your choice (e.g., A) before reading the GoL matrix from file. This works around an issue with the `getline()` function in C, due to which leading whitespace is skipped.
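The FM format described above can be parsed with a short routine. The following is a minimal sketch, not the repository's reader: the function name `fm_read` and the flat row-major `bool` layout are assumptions made for this example. It reads every line with `fgets`, which keeps leading whitespace, and treats any character other than X as DEAD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the repository): parses an FM-formatted
 * stream into a newly allocated rows*cols row-major array.
 * Returns NULL on a malformed header or allocation failure.
 * The caller owns the returned buffer and must free() it. */
bool *fm_read(FILE *fp, int *rows, int *cols) {
    char header[64];
    assert(fp != NULL && rows != NULL && cols != NULL);

    /* Read the header as a whole line, so that no whitespace belonging
     * to the first grid row is consumed while parsing the two numbers. */
    if (fgets(header, sizeof header, fp) == NULL) return NULL;
    if (sscanf(header, "%d %d", rows, cols) != 2) return NULL;
    if (*rows <= 0 || *cols <= 0) return NULL;

    size_t ncols = (size_t)*cols;
    bool *grid = calloc((size_t)*rows * ncols, sizeof *grid); /* all DEAD */
    if (grid == NULL) return NULL;

    size_t cap = ncols + 3; /* cols characters + "\r\n" + terminating '\0' */
    char *line = malloc(cap);
    if (line == NULL) { free(grid); return NULL; }

    for (int r = 0; r < *rows; r++) {
        if (fgets(line, (int)cap, fp) == NULL) break; /* missing rows stay DEAD */
        size_t len = strcspn(line, "\r\n");
        for (size_t c = 0; c < len && c < ncols; c++) {
            grid[(size_t)r * ncols + c] = (line[c] == 'X');
        }
        /* If the line was longer than expected, discard its remainder. */
        if (strchr(line, '\n') == NULL) {
            int ch;
            while ((ch = fgetc(fp)) != EOF && ch != '\n') { /* skip */ }
        }
    }
    free(line);
    return grid;
}
```

Rows that are shorter than the declared width (for instance, when trailing spaces were trimmed by an editor) are padded with DEAD cells in this sketch; a stricter reader could reject them instead.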
This repository contains the source code for both a GPU-based implementation of Conway's Game of Life, inside the `src\gpu` folder, and a CPU-based implementation, inside the `src\cpu` folder. The `include` folder, instead, contains header files that both implementations use interchangeably, i.e., the base structs `life_t` and `chunk_t`, with a few specific C guards wherever the functionalities have to differ.
The `bin` folder contains the binaries generated by both implementations via the `make` command. Each binary carries specific tags in its name that describe how it was compiled and, hence, its scope:
- `vec` stands for binaries optimized with vectorization at compile time;
- `omp` stands for binaries in which OpenMP support has been enabled;
- `mpi` stands for binaries in which MPI support has been enabled, thus they should be launched following the standard MPI command format, i.e., `mpirun` or `mpiexec`;
- `hybrid` stands for binaries in which hybrid MPI+OpenMP support has been enabled;
- `cuda` stands for binaries that should be run on a GPU-capable machine.
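To make the difference between a plain build and an OpenMP-enabled one concrete, here is a minimal sketch of a single GoL generation update whose outer loop carries an OpenMP pragma. It is not the repository's kernel: the names `alive_neighbours` and `gol_step`, the flat `unsigned char` grid, and the choice to treat cells outside the grid as DEAD are all assumptions made for this example. When OpenMP is not enabled at compile time, the pragma is ignored and the same source runs serially.

```c
/* Counts the ALIVE neighbours of cell (r, c) in a row-major 0/1 grid.
 * Assumption of this sketch: cells outside the grid count as DEAD. */
static int alive_neighbours(const unsigned char *g, int rows, int cols,
                            int r, int c) {
    int n = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;
            if ((dr != 0 || dc != 0) &&
                rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                n += g[rr * cols + cc];
            }
        }
    }
    return n;
}

/* Computes one generation from cur into next (both rows*cols, values 0/1).
 * cur is only read and each thread writes distinct rows of next, so the
 * outer loop can be split across threads without synchronization. */
void gol_step(const unsigned char *cur, unsigned char *next,
              int rows, int cols) {
    /* Ignored by the compiler when OpenMP support is not enabled. */
    #pragma omp parallel for
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int n = alive_neighbours(cur, rows, cols, r, c);
            int alive = cur[r * cols + c];
            /* Standard rules: birth on 3, survival on 2 or 3. */
            next[r * cols + c] = (unsigned char)(n == 3 || (alive && n == 2));
        }
    }
}
```

Using two separate buffers keeps every read on the previous generation, which is what makes the row-wise split safe; the caller swaps `cur` and `next` between generations.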
Last but not least, the `experiment` folder contains all the experiments that both implementations were run through.
Although the repo contains both CPU and GPU code, running the whole codebase requires a GPU-capable machine with OpenMP and MPI support. Otherwise, separate machines that provide either CPU or GPU capabilities should be used to test the two implementations independently.
Run any binary with the `-h` flag to learn its expected usage.