[DPAS]: Use 2d-loads instruction to load the operand of tt.dot
#146
Comments
Liyang, please check how CUDA handles stride_xx.
The Triton CUDA pipeline lowers 2d-load to [...]. For the first stage, in the 2d-load conversion lowering pass, we will check the [...]. For the second stage, we will consider the dynamic [...].
@LiyangLingIntel, as per our discussion, could you please split this ticket by stage?
Sure, I have added two issues (#413 and #415) to split this ticket into two stages.
Helping with refactoring and code review.
The operands of Triton's `tt.dot` operation should be loaded using specialized instructions that load 2D blocks of the matrices. Loading the operands in blocks is more efficient than loading them with regular loads `@llvm.genx.GenISA.LSCPrefetch`.
We might need to leverage the semantic information associated with Triton's blocked pointers (https://triton-lang.org/main/getting-started/tutorials/08-experimental-block-pointer.html) in order to generate 2D block loads.
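To make the "semantic information" point concrete, here is a rough pure-Python model of what a block pointer carries (base buffer, parent shape, strides, offsets, block shape) and how a 2D block load could consume it. This is only a sketch: `BlockPtr`, `load_2d_block`, and `is_2d_load_candidate` are hypothetical names, not Triton or backend APIs, and the eligibility rule (innermost stride equal to 1) is an assumption made for illustration rather than the check the lowering pass actually performs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BlockPtr:
    """Hypothetical stand-in for the metadata a block pointer carries."""
    base: Sequence[float]          # flat buffer holding the parent matrix
    shape: Tuple[int, int]         # (rows, cols) of the parent matrix
    strides: Tuple[int, int]       # element strides for (row, col)
    offsets: Tuple[int, int]       # (row, col) of the block's top-left corner
    block_shape: Tuple[int, int]   # (rows, cols) of the block to load


def load_2d_block(ptr: BlockPtr, pad: float = 0.0) -> List[List[float]]:
    """Gather one 2D block; out-of-bounds elements are filled with `pad`."""
    rows, cols = ptr.shape
    row0, col0 = ptr.offsets
    block = []
    for i in range(ptr.block_shape[0]):
        line = []
        for j in range(ptr.block_shape[1]):
            r, c = row0 + i, col0 + j
            if 0 <= r < rows and 0 <= c < cols:
                line.append(ptr.base[r * ptr.strides[0] + c * ptr.strides[1]])
            else:
                line.append(pad)
        block.append(line)
    return block


def is_2d_load_candidate(ptr: BlockPtr) -> bool:
    """Assumed rule for this sketch only: the innermost dimension is contiguous."""
    return ptr.strides[1] == 1


# A 4x4 row-major matrix holding 0..15, with a 2x2 block starting at (1, 2).
matrix = list(range(16))
ptr = BlockPtr(matrix, shape=(4, 4), strides=(4, 1), offsets=(1, 2), block_shape=(2, 2))
block = load_2d_block(ptr)  # [[6, 7], [10, 11]]
```

The point of the model is that shape, strides, and offsets are all available up front, so a lowering pass could decide per load whether a 2D block instruction applies instead of reconstructing that information from per-element pointer arithmetic.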