[Performance] A@B.T - The GEMM performance with the column major B matrix is not as good as row major B matrix.
#2354
Comments
I think this issue is essential for GEMM perf. Very often weights are stored with the K dimension last. Even the PyTorch linear layer does that: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
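To make the layout concrete, here is a minimal pure-Python sketch (the helper names `linear_offset` and `gemm_strided` are hypothetical, and nothing here is Triton or oneDNN API). It assumes a weight `W` stored as (N, K) with K last, so `A @ W.T` reads the logical B operand with stride 1 along K, i.e. column-major. Both calls produce the same result; only the memory access pattern for B differs, which is the case this issue is about.

```python
# Illustrative sketch only: pure Python, hypothetical helper names.

def linear_offset(row, col, stride_row, stride_col):
    """Offset of element (row, col) in a flat buffer with the given strides."""
    return row * stride_row + col * stride_col

def gemm_strided(a, a_strides, b, b_strides, M, N, K):
    """C[m][n] = sum_k A[m, k] * B[k, n], with A and B read through strides."""
    c = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                acc += (a[linear_offset(m, k, *a_strides)]
                        * b[linear_offset(k, n, *b_strides)])
            c[m][n] = acc
    return c

M, N, K = 2, 3, 4
a = [float(i) for i in range(M * K)]      # A is (M, K), row-major
w = [float(i % 5) for i in range(N * K)]  # weight stored as (N, K), K last

# A @ W.T: the logical B = W.T is (K, N) with strides (1, K), i.e. column-major.
c_col_major = gemm_strided(a, (K, 1), w, (1, K), M, N, K)

# Same logical B copied into a row-major (K, N) buffer with strides (N, 1).
b_row = [w[n * K + k] for k in range(K) for n in range(N)]
c_row_major = gemm_strided(a, (K, 1), b_row, (N, 1), M, N, K)
```

In the column-major case consecutive `k` values of B are adjacent in memory, while in the row-major case consecutive `n` values are adjacent, so a tiling tuned for one layout does not automatically suit the other.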
Adding to this, if the A matrix is column-major we have similar problems.
We now have microbenchmarks to track this performance. Currently GeoMean for
So oneDNN is 1.5 times faster for
@alexbaden Should we change the title to reflect the issue with A.T as well, or create a separate issue for that case?
Current Triton tiling strategy for DPAS for
oneDNN tiling strategy mapped to Triton (thanks to @Jianhui-Li and @chengjunlu):
I plan to try to implement the oneDNN strategy in Triton.
…2956) Required for #2834 Two reasons to do this - one, it properly tags the layouts with their memory order very early in the TTGIR pipeline. And two, it moves our TTGIR pipeline closer to upstream. I am splitting the change to isolate any regressions or undesired behavior caused by this change vs changing the DPAS layouts in #2834. cc #2354
The performance gap was found in #2347.
We need to investigate the root cause of the performance drop in the column major B matrix case.
It is roughly 1.5x worse than the row major B matrix case.