2D block load lowering for tt.dot operands with no intermediate op #941
Conversation
Force-pushed from b8ba9b9 to bbc1576
The RewriteTensorPointer.cpp file is entirely new in this PR, which makes it hard to review which changes we actually made to it. Please first create a commit that copies the common TTIR RewriteTensorPointer pass into the TT Intel GPU IR repo. Then, based on that, we can review the Intel-specific customizations clearly.
Refine Intel RewriteTensorPtr pass
Except store in rewrite tensorptr
Force-pushed from 358743e to 03df366
Force-pushed from 03df366 to 9865fee
Rebased and adjusted the commit history. We can view commit c01f7ac#diff-43a0aeab44c0c355cbd24ce57853a07b38d96b98d4d5b10a8a2e3dfbf121fdc4 to see the changes relative to the Triton common pass.
We should have a PR that copies the RewriteTensorPointer.cpp file over to the Intel directory (an NFC PR), and then rebase this one on top of it.
Based on offline discussion, I split this pull request into two for easier review:
I have marked this as a draft for now and will close it once all conversations under this PR are resolved.
This is the first PR separated from #941. It focuses on rewriting the `RewriteTensorPointer` pass so that `tt.load` ops with the tensor-pointer pattern are allowed in our compilation pipeline rather than being rewritten to legacy loads (a sketch of this pattern follows the two PR descriptions below).
---------
Signed-off-by: Tiotto, Ettore <ettore.tiotto@intel.com>
Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
Co-authored-by: Tiotto, Ettore <ettore.tiotto@intel.com>
This is the second PR separated from #941. It focuses on lowering `tt.load` with a tensor pointer to `Triton::Matrix2DBlockLoad`.
---------
Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
Co-authored-by: Tiotto, Ettore <ettore.tiotto@intel.com>
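For context, here is a minimal sketch of the tensor-pointer load pattern these two PRs are about, written as a Triton Python kernel. The kernel name, shapes, and block sizes are illustrative and not taken from this PR; the sketch assumes the standard Triton block-pointer API (`tl.make_block_ptr`, `tl.advance`, `tl.load` with `boundary_check`) and `tl.dot`:

```python
import triton
import triton.language as tl


@triton.jit
def dot_tile_kernel(a_ptr, b_ptr, c_ptr,
                    M, N, K,
                    stride_am, stride_ak,
                    stride_bk, stride_bn,
                    stride_cm, stride_cn,
                    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                    BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    # Block (tensor) pointers: these become tt.make_tensor_ptr in TTIR.
    a_block = tl.make_block_ptr(base=a_ptr, shape=(M, K),
                                strides=(stride_am, stride_ak),
                                offsets=(pid_m * BLOCK_M, 0),
                                block_shape=(BLOCK_M, BLOCK_K), order=(1, 0))
    b_block = tl.make_block_ptr(base=b_ptr, shape=(K, N),
                                strides=(stride_bk, stride_bn),
                                offsets=(0, pid_n * BLOCK_N),
                                block_shape=(BLOCK_K, BLOCK_N), order=(1, 0))

    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        # tt.load with a tensor-pointer operand feeding tt.dot: the pattern
        # the rewritten pass keeps intact instead of decomposing it into a
        # legacy gather-style load.
        a = tl.load(a_block, boundary_check=(0, 1))
        b = tl.load(b_block, boundary_check=(0, 1))
        acc += tl.dot(a, b)
        a_block = tl.advance(a_block, (0, BLOCK_K))
        b_block = tl.advance(b_block, (BLOCK_K, 0))

    # Store through plain pointers; per the "Except store in rewrite
    # tensorptr" commit, tensor-pointer stores appear to still be rewritten.
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    tl.store(c_ptrs, acc, mask=mask)
```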
This PR is for #146.
Based on the discussion in #865, this is an alternative implementation that does not rely on the intermediate op `triton_intel_gpu.load_2d`. Instead, we make changes to the `RewriteTensorPointer` pass. The strategy is to copy the common `RewriteTensorPointer` pass into the Intel GPU passes and not rewrite `tt.load` ops that take a TensorPointer, as the previous NVIDIA pass did. This keeps the op available at a later stage so it can be lowered to `llvm.genx.GenISA.LSC2DBlockRead`.
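As a rough way to observe the effect of this strategy, one can dump the IR while running a block-pointer kernel and check that the tensor-pointer `tt.load` survives the Intel `RewriteTensorPointer` pass and eventually becomes a 2D block read. This is a sketch, not part of the PR; it assumes Triton's `MLIR_ENABLE_DUMP` environment variable, a PyTorch build with an Intel `xpu` device, and the hypothetical `dot_tile_kernel` sketched above:

```python
import os

# Assumption: MLIR_ENABLE_DUMP=1 makes Triton print the IR before each MLIR
# pass, so the dump can be searched for tt.load still taking a tensor
# pointer after RewriteTensorPointer and for the 2D block read later on.
os.environ["MLIR_ENABLE_DUMP"] = "1"

import torch
import triton

M = N = K = 256
BLOCK_M = BLOCK_N = 64
BLOCK_K = 32
# Assumption: an Intel GPU exposed to PyTorch as the "xpu" device.
a = torch.randn((M, K), device="xpu", dtype=torch.float16)
b = torch.randn((K, N), device="xpu", dtype=torch.float16)
c = torch.empty((M, N), device="xpu", dtype=torch.float32)

grid = (triton.cdiv(M, BLOCK_M), triton.cdiv(N, BLOCK_N))
dot_tile_kernel[grid](a, b, c, M, N, K,
                      a.stride(0), a.stride(1),
                      b.stride(0), b.stride(1),
                      c.stride(0), c.stride(1),
                      BLOCK_M=BLOCK_M, BLOCK_N=BLOCK_N, BLOCK_K=BLOCK_K)
```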