Cleanup CUDA, Reuse Memory, Add Serial Model, Cleanup Std Parallelism #202

Open · wants to merge 14 commits into develop
Conversation

@gonzalobg commented on Jun 3, 2024

Cleanup CUDA

  • Refactor all kernels into a generic "parallel for" algorithm that supports grid-stride and block-stride loops, configurable with a model flag (see the first sketch after this list).
  • Use the occupancy APIs to portably handle devices of all sizes.
  • Refactor the CUDA memory allocation APIs.
  • Print more GPU details, in particular the theoretical peak bandwidth in GB/s of the current device, using the NVML library, which is part of the CUDA Toolkit and always available (see the second sketch after this list).
  • Fix two bugs:
    • Print the "order" used to run the benchmarks (e.g. classic vs. isolated).
    • Fix a division-by-zero bug in the solution checking.
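
The PR text doesn't include the new kernel itself; a minimal sketch of the grid-stride + occupancy pattern it describes could look like the following (`parallel_for` and `parallel_for_kernel` are hypothetical names; a block-stride variant would instead give each block a contiguous chunk and stride threads within it):

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Grid-stride loop: each thread strides over the whole iteration space, so
// one launch configuration is correct for any problem size.
template <typename F>
__global__ void parallel_for_kernel(std::size_t n, F f) {
  for (std::size_t i = blockIdx.x * std::size_t(blockDim.x) + threadIdx.x;
       i < n; i += std::size_t(gridDim.x) * blockDim.x)
    f(i);
}

// Let the occupancy API choose a launch configuration that saturates the
// current device instead of hard-coding sizes for one particular GPU.
template <typename F>
void parallel_for(std::size_t n, F f) {
  int min_grid = 0, block = 0;
  cudaOccupancyMaxPotentialBlockSize(&min_grid, &block, parallel_for_kernel<F>);
  parallel_for_kernel<<<min_grid, block>>>(n, f);
}
```

Device lambdas passed to a helper like this need nvcc's --extended-lambda flag.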
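Likewise, a sketch of deriving the theoretical peak bandwidth from NVML plus the runtime API; the function name and the DDR factor of 2 are assumptions, and the binary must link against NVML (-lnvidia-ml):

```cpp
#include <cuda_runtime.h>
#include <nvml.h>

// Theoretical peak bandwidth = 2 (DDR) * max memory clock * bus width.
double peak_bandwidth_gbs(int device) {
  nvmlInit();
  nvmlDevice_t dev;
  nvmlDeviceGetHandleByIndex(static_cast<unsigned int>(device), &dev);
  unsigned int mem_clock_mhz = 0;
  nvmlDeviceGetMaxClockInfo(dev, NVML_CLOCK_MEM, &mem_clock_mhz);  // MHz
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, device);  // memoryBusWidth is in bits
  nvmlShutdown();
  return 2.0 * mem_clock_mhz * 1e6 * (prop.memoryBusWidth / 8.0) / 1e9;
}
```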

Add Serial

By @tom91136. A good thing to have when comparing against other parallel programming models, mostly for syntax.
This also makes us consistent with CloverLeaf, TeaLeaf, and miniBUDE.

Reuse Memory

This PR puts benchmarks in control of allocating the host
memory used for verifying the results.

This enables benchmarks that use Unified Memory for the device
allocations to skip the host-side allocation and instead pass
pointers to the device allocation to the benchmark driver; the sketch
below illustrates the idea.
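
A hedged sketch of what that interface shape could look like; `Stream`, `alloc_host`, and `ManagedStream` are illustrative names, not the PR's actual API:

```cpp
#include <cstddef>

// The benchmark implementation owns the host allocation used for
// verification, instead of the driver always allocating separate host arrays.
template <typename T>
struct Stream {
  virtual ~Stream() = default;
  // Called by the driver to obtain a buffer it reads results from.
  virtual T* alloc_host(std::size_t n) = 0;
  virtual void free_host(T* p) = 0;
};

// A Unified Memory backend can hand the driver its managed pointer directly:
// no separate host allocation, no device-to-host copy.
template <typename T>
struct ManagedStream final : Stream<T> {
  T* d_a = nullptr;  // allocated with cudaMallocManaged elsewhere
  T* alloc_host(std::size_t) override { return d_a; }
  void free_host(T*) override {}  // nothing to do; the stream still owns d_a
};
```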

Closes #128.

Cleanup C++ Standard Parallelism

Merge the 3 implementations into one, with flags selecting between the data (C++17), data (C++23), and indices variants (see the sketch below).
Annotate workarounds with a #define WORKAROUND and print a message when the current implementation is not conforming.
Add support for AdaptiveCpp (CI not added yet; it will be done later as part of removing hipSYCL).
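
A sketch, not the PR's actual code, of how a single source file can host the data and indices variants behind preprocessor flags (`triad` and the flag names are illustrative; a "data C++23" variant would zip the ranges with std::views::zip):

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
#ifdef INDICES
  // Indices variant: traverse an index range; raw pointers keep the lambda
  // trivially copyable, which matters for GPU offload (e.g. nvc++ -stdpar=gpu).
  std::vector<std::size_t> idx(a.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  double* pa = a.data();
  const double *pb = b.data(), *pc = c.data();
  std::for_each(std::execution::par_unseq, idx.begin(), idx.end(),
                [=](std::size_t i) { pa[i] = pb[i] + scalar * pc[i]; });
#else
  // Data (C++17) variant: traverse the arrays themselves.
  std::transform(std::execution::par_unseq, b.begin(), b.end(), c.begin(),
                 a.begin(),
                 [=](double bi, double ci) { return bi + scalar * ci; });
#endif
}
```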

@gonzalobg force-pushed the reuse_memory branch 7 times, most recently from 2b9129e to 6c83420 on June 4, 2024 at 16:31
@gonzalobg mentioned this pull request on Jun 5, 2024
@gonzalobg changed the title from "Reuse memory" to "Cleanup CUDA, Reuse Memory, Add Serial Model" on Jun 5, 2024
@gonzalobg changed the title from "Cleanup CUDA, Reuse Memory, Add Serial Model" to "Cleanup CUDA, Reuse Memory, Add Serial Model, Cleanup Std Parallelism" on Jun 5, 2024
```cpp
#ifdef INDICES
// NVHPC workaround: TODO: remove this eventually
#if defined(__NVCOMPILER) && defined(_NVHPC_STDPAR_GPU)
#define WORKAROUND
```
@gonzalobg (author) commented:
Have a pragma message to print that workarounds are enabled.
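
For example (a sketch; the exact wording and placement are up to the author):

```cpp
#if defined(__NVCOMPILER) && defined(_NVHPC_STDPAR_GPU)
  #define WORKAROUND
  #pragma message("stdpar: NVHPC workaround enabled; implementation may be non-conforming")
#endif
```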

```cpp
#else

// auto exe_policy = dpl::execution::seq;
// auto exe_policy = dpl::execution::par;
static constexpr auto exe_policy = dpl::execution::par_unseq;
#define USE_STD_PTR_ALLOC_DEALLOC
#define WORKAROUND
```
@gonzalobg (author) commented:
Add a pragma message to highlight that there is a workaround (as in the sketch above).

```diff
@@ -1,5 +1,5 @@
-// Copyright (c) 2015-23 Tom Deakin, Simon McIntosh-Smith, and Tom Lin
+// Copyright (c) 2015-16 Tom Deakin, Simon McIntosh-Smith,
```
@gonzalobg (author) commented:
Undo this change
