Fix permute_multi_embedding kernel #3227

Closed
wants to merge 1 commit into from

Commits on Oct 5, 2024

  1. Fix permute_multi_embedding kernel

    Summary:
    X-link: facebookresearch/FBGEMM#325
    
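    The kernel computes `permute_id = threadIdx.y + blockIdx.x * blockDim.x`, which looks like a typo: the block stride should be `blockDim.y`, i.e. `permute_id = threadIdx.y + blockIdx.x * blockDim.y`. This does not affect Nvidia GPUs because there blockDim.x and blockDim.y are both 32 (32 threads per warp, 32 warps per block), so the two expressions are equal. On AMD GPUs, blockDim.x is 64 and blockDim.y is 16, so consecutive blocks are offset by 64 permute ids while each block only spans 16 `threadIdx.y` rows. The resulting mismatch in the permute_id ranges causes numerical issues.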
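The effect of the wrong stride can be checked on the host with a small simulation. This is a sketch under stated assumptions, not FBGEMM code: it models only the `permute_id` arithmetic, assumes each `threadIdx.y` row of a block handles one permute id (with the `threadIdx.x` lanes sharing that row's id), and the helper names `block_permute_ids`, `covered` and `gaps` are hypothetical.

```python
def block_permute_ids(block_idx_x, block_dim_x, block_dim_y, fixed):
    # Assumption: one permute id per threadIdx.y row; threadIdx.x lanes share it.
    # Buggy stride is blockDim.x, fixed stride is blockDim.y.
    stride = block_dim_y if fixed else block_dim_x
    return [thread_idx_y + block_idx_x * stride for thread_idx_y in range(block_dim_y)]


def covered(num_blocks, block_dim_x, block_dim_y, fixed):
    # Union of the permute ids produced by the first num_blocks blocks.
    ids = set()
    for b in range(num_blocks):
        ids.update(block_permute_ids(b, block_dim_x, block_dim_y, fixed))
    return ids


def gaps(num_blocks, block_dim_x, block_dim_y, fixed):
    # Permute ids below the largest produced id that no thread ever computes.
    ids = covered(num_blocks, block_dim_x, block_dim_y, fixed)
    return sorted(set(range(max(ids) + 1)) - ids)


if __name__ == "__main__":
    print(gaps(2, 32, 32, fixed=False))  # Nvidia-like 32x32 block, buggy formula
    print(gaps(2, 64, 16, fixed=False))  # AMD-like 64x16 block, buggy formula
    print(gaps(2, 64, 16, fixed=True))   # AMD-like 64x16 block, fixed formula
```

With a 32x32 block the buggy and fixed formulas coincide and there are no gaps. With a 64x16 block the buggy formula maps block 0 to ids 0..15 and block 1 to ids 64..79, so ids 16..63 are never computed, while the fixed formula yields the contiguous range 0..31.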
    
    Reviewed By: leitian, jianyuh, joebos
    
    Differential Revision: D63936776
    xw285cornell authored and facebook-github-bot committed Oct 5, 2024
    657e566