Adding Radeon support by controlling wave size #1729

Open
wants to merge 1 commit into
base: develop

@seanofthemillers commented Sep 3, 2024

Summary

This is a feature. It adds a configure-time control parameter for the default wave size. On AMD MI cards this is generally 64, whereas on Radeon (gaming) cards it is usually 32. With these changes, the user must know whether the card is set up for Wave32 or Wave64.
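
As a rough illustration of what a configure-time wave size could look like on the C++ side, here is a minimal sketch. The macro `EXAMPLE_WAVE_SIZE`, the `example` namespace, and the helper `waves_per_block` are hypothetical names chosen for this sketch; they are not taken from the PR or from RAJA.

```cpp
// Hypothetical macro: a build system could pass -DEXAMPLE_WAVE_SIZE=32
// for a Wave32 card. The default of 64 is an assumption for this sketch.
#ifndef EXAMPLE_WAVE_SIZE
#define EXAMPLE_WAVE_SIZE 64
#endif

// Reject any value other than the two wave sizes discussed above.
static_assert(EXAMPLE_WAVE_SIZE == 32 || EXAMPLE_WAVE_SIZE == 64,
              "wave size must be 32 or 64");

namespace example {

// Fixed at compile time, so dependent arithmetic can be constant-folded.
constexpr int wave_size = EXAMPLE_WAVE_SIZE;

// Number of waves needed to cover a thread block (rounds up).
constexpr int waves_per_block(int block_size) {
  return (block_size + wave_size - 1) / wave_size;
}

}  // namespace example
```

Because the value is fixed when the library is built, a binary configured for Wave64 would compute the wrong lane layout on a Wave32 card, which is why the user has to know the card's mode up front.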

I also added a fix for the dynamically sized memory allocation, which seems to trigger a lot of warnings with ROCm 6.2.

@seanofthemillers force-pushed the feature/seanofthemillers/adding_radeon_support branch from 480d181 to 95de1dc on September 3, 2024 15:23
@@ -780,8 +780,8 @@ namespace expt

// Third: mask off everything but output_segment
// this is because all output segments are valid at this point
// (5-segbits), the 5 is since the warp-width is 32 == 1<<5
int our_output_segment = get_lane()>>(6-segbits);
const int log2_warp_size = 32-1-__builtin_clz(warpSize);
Member

Using warpSize appears to make this a runtime value instead of a compile-time value. Do we need to worry about the performance implications of this?

Author

It will add a few instructions and possibly an SGPR load. It probably won't be noticeable, but we can make log2_warp_size a constexpr member of DeviceConstants, or just wrap these two places in ifdefs. If we keep it dynamic, it should work for NVIDIA/other as well; I'm not sure how important that is.
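
A minimal sketch of the constexpr-member option, assuming a simplified stand-in struct. `ExampleDeviceConstants`, its fields, and `output_segment` are hypothetical names for illustration; RAJA's actual DeviceConstants may be laid out differently.

```cpp
// Hypothetical stand-in for a DeviceConstants-style struct.
struct ExampleDeviceConstants {
  int warp_size;
  int log2_warp_size;

  // Only wave sizes 32 and 64 are considered, as in the discussion above.
  constexpr explicit ExampleDeviceConstants(int ws)
      : warp_size(ws), log2_warp_size(ws == 64 ? 6 : 5) {}
};

constexpr ExampleDeviceConstants example_wave64{64};
constexpr ExampleDeviceConstants example_wave32{32};

// Same shift as in the diff, with the literal 6 replaced by the constant.
constexpr int output_segment(int lane, int segbits,
                             ExampleDeviceConstants dc) {
  return lane >> (dc.log2_warp_size - segbits);
}
```

With the constants known at compile time, the subtraction inside the shift needs no runtime read of warpSize.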

Member

Given that we're already relying on getting the right warpSize at configure time, I think we should keep this a compile-time value if possible. There is a constexpr RAJA::log2 that we can use instead of clz here to keep it constexpr.
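
For illustration, a constexpr integer log2 can be written in a few lines. The helper `example_log2` below is a hypothetical stand-in and is not the RAJA::log2 implementation, whose exact signature is not shown in this thread.

```cpp
// Floor of log2 for positive integers, usable in constant expressions.
// Written as a single return statement so it is also valid C++11.
constexpr int example_log2(unsigned v) {
  return v <= 1u ? 0 : 1 + example_log2(v >> 1);
}

// The two wave sizes discussed in this PR.
static_assert(example_log2(32u) == 5, "Wave32 uses 5 lane-index bits");
static_assert(example_log2(64u) == 6, "Wave64 uses 6 lane-index bits");

#if defined(__GNUC__) || defined(__clang__)
// Cross-check against the clz form from the diff (assumes 32-bit unsigned).
static_assert(example_log2(64u) == 32 - 1 - __builtin_clz(64u),
              "matches the clz-based expression");
#endif
```

For a power-of-two wave size both forms give the same result; the difference is that this one only depends on a value fixed at configure time.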
