
2D block load lowering for tt.dot operands with no intermediate op #941

Closed
LiyangLingIntel wants to merge 7 commits from the liyang/2dload-no-intermediate-op branch

Conversation

LiyangLingIntel
Contributor

@LiyangLingIntel commented Apr 20, 2024

This PR is for #146.
Based on the discussion in #865, this is an alternative implementation that does not rely on the intermediate op triton_intel_gpu.load_2d. Instead, we make changes to the RewriteTensorPointer pass.
The strategy is to copy the common RewriteTensorPointer pass into the Intel GPU passes and to not rewrite tt.load ops that use a TensorPointer, as the previous NV pass did. That way the op survives to a later stage, where we can lower it to llvm.genx.GenISA.LSC2DBlockRead.
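For illustration, here is a rough sketch of the intended flow in Triton IR. The exact types, attributes, and intrinsic signature below are approximations for illustration only, not taken from this PR:

```mlir
// Before the Intel RewriteTensorPointer pass: a tt.load through a tensor
// (block) pointer. The Intel pass keeps this form intact instead of
// decomposing it into per-element pointers and masks.
%ptr = tt.make_tensor_ptr %base, [%M, %N], [%sM, %sN], [%offM, %offN]
         {order = array<i32: 1, 0>} : <tensor<64x32xf16>>
%tile = tt.load %ptr {boundaryCheck = array<i32: 0, 1>} : !tt.ptr<tensor<64x32xf16>>

// After the later lowering stage: the whole tile load maps onto a single
// 2D block read intrinsic (argument list elided, signature approximate).
%tile2 = llvm.call @llvm.genx.GenISA.LSC2DBlockRead(...) : (...) -> vector<32 x f16>
```

Keeping the block-structured access visible through the pipeline is what allows the load to be matched to the hardware 2D block-read instruction, instead of being scalarized early.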

@chengjunlu
Contributor

RewriteTensorPointer.cpp is entirely new in this PR, which makes it hard to review what changes were made to it.

Please first create a commit that copies the TTIR RewriteTensorPointer pass into the Triton Intel GPU repo. Then, based on that, we can review the Intel-specific changes clearly.

@LiyangLingIntel force-pushed the liyang/2dload-no-intermediate-op branch 2 times, most recently from 358743e to 03df366 on April 22, 2024 01:53
@LiyangLingIntel
Contributor Author

> RewriteTensorPointer.cpp is entirely new in this PR, which makes it hard to review what changes were made to it.
>
> Please first create a commit that copies the TTIR RewriteTensorPointer pass into the Triton Intel GPU repo. Then, based on that, we can review the Intel-specific changes clearly.

Rebased and adjusted the commit history. See commit c01f7ac#diff-43a0aeab44c0c355cbd24ce57853a07b38d96b98d4d5b10a8a2e3dfbf121fdc4 for the differences between the common Triton RewriteTensorPointer pass and the Intel RewriteTensorPointer pass.

Contributor

@etiotto left a comment


We should have a PR to copy the RewriteTensorPointer.cpp file over to the intel directory (an NFC PR), and then rebase this one on top of it.

@LiyangLingIntel
Contributor Author

Based on offline discussion, I have split this pull request into two for easier review.

I have marked this PR as a draft for now and will close it once all conversations under it are resolved.

@LiyangLingIntel LiyangLingIntel deleted the liyang/2dload-no-intermediate-op branch April 24, 2024 13:47
etiotto added a commit that referenced this pull request Apr 29, 2024
This is the first PR separated from #941.
This PR focuses on rewriting the `RewriteTensorPointer` pass so that
`tt.load` with the tensor-pointer pattern is allowed through our
compilation pipeline rather than being rewritten to a legacy load.

---------

Signed-off-by: Tiotto, Ettore <ettore.tiotto@intel.com>
Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
Co-authored-by: Tiotto, Ettore <ettore.tiotto@intel.com>
whitneywhtsang added a commit that referenced this pull request Apr 30, 2024
This is the second PR separated from #941.
This PR focuses on lowering `tt.load` with a tensor pointer to
`Triton::Matrix2DBlockLoad`.

---------

Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>
Co-authored-by: Tiotto, Ettore <ettore.tiotto@intel.com>
4 participants