
Why is the block size in gpu-latency 64? #10

Open
yjl0101 opened this issue Aug 29, 2024 · 2 comments

Comments

yjl0101 commented Aug 29, 2024

Hi, I noticed that the block size in gpu-latency is 64 rather than 32 (the warp size) or a single thread. Is there a particular reason for choosing 64? Looking forward to your reply :)

te42kyfo (Owner) commented

A single thread would work just the same. Using a full warp just feels better, and a full warp is 64 threads on CDNA hardware. On NVIDIA, this actually runs two warps instead of just one, but from what I remember, the interference is low.

So no particularly strong reasoning.
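
For reference, a minimal sketch of such a latency measurement, assuming a pointer-chasing kernel launched as a single 64-thread block (one full CDNA wavefront, or two NVIDIA warps). The kernel name, chain layout, and sizes here are illustrative and not taken from the actual gpu-latency source:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical pointer-chasing latency kernel: each load depends on
// the previous one, so the memory latency cannot be hidden.
__global__ void latencyChase(const unsigned* __restrict__ next,
                             int iters, unsigned* out) {
    unsigned idx = threadIdx.x;   // all 64 threads chase in lockstep
    for (int i = 0; i < iters; ++i)
        idx = next[idx];          // serially dependent load
    out[threadIdx.x] = idx;       // keep the loop from being optimized away
}

int main() {
    const int n = 1 << 20;        // chain length (4 MiB of unsigned)
    const int iters = 1 << 20;    // dependent loads per thread

    unsigned* h = new unsigned[n];
    for (int i = 0; i < n; ++i)
        h[i] = (i + 64) % n;      // simple ring; real benchmarks randomize

    unsigned *dNext, *dOut;
    cudaMalloc(&dNext, n * sizeof(unsigned));
    cudaMalloc(&dOut, 64 * sizeof(unsigned));
    cudaMemcpy(dNext, h, n * sizeof(unsigned), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // One block of 64 threads: one full CDNA wavefront, two NVIDIA warps.
    cudaEventRecord(start);
    latencyChase<<<1, 64>>>(dNext, iters, dOut);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%.1f ns per dependent load\n", ms * 1e6 / iters);

    cudaFree(dNext);
    cudaFree(dOut);
    delete[] h;
    return 0;
}
```

With this launch configuration, all 64 threads walk the same dependent chain, so the time per iteration approximates the load-to-use latency of whichever cache level the working set lands in, regardless of whether the block maps to one wavefront or two warps.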

yjl0101 (Author) commented Aug 29, 2024

@te42kyfo Aha, 64 for AMD GPUs. Got it, thanks a lot!
