[Draft] Get reasonable performance from the default Triton pass pipeline. #897

Closed
chengjunlu wants to merge 19 commits from the chengjun/llvm-target-dev branch

Conversation

@chengjunlu (Contributor) commented Apr 17, 2024

Draft opened only to run CI.

Added:

  • Intel rewrite tensor pointer pass.
  • Intel remove layout pass.
  • Intel accelerate matmul pass.
  • Intel materialize 2D load pass.
  • Intel loop pipelining with prefetching (see the pipelining sketch after the memos below).
  • Use the Intel-specific pass pipeline.
  • Use large 2D loads (see the kernel sketch below).
  • Lower the prefetch op to the 2D prefetch op.
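
Not part of this PR's diff, but for context: a minimal sketch of the kind of Triton kernel these passes target, using the public block-pointer API (`tl.make_block_ptr` / `tl.advance` / `tl.load` with `boundary_check`). The rewrite-tensor-pointer and materialize-2D-load passes consume loads of this form and can map them to Intel 2D block loads instead of scattered accesses; the kernel itself (names, tile sizes, fp32 output) is illustrative only.

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    # Block pointers describe a 2D tile (base, shape, strides, offsets,
    # block shape, order); loads through them are what the Intel passes
    # can lower to hardware 2D block-load instructions.
    a_block_ptr = tl.make_block_ptr(base=a_ptr, shape=(M, K),
                                    strides=(stride_am, stride_ak),
                                    offsets=(pid_m * BLOCK_M, 0),
                                    block_shape=(BLOCK_M, BLOCK_K), order=(1, 0))
    b_block_ptr = tl.make_block_ptr(base=b_ptr, shape=(K, N),
                                    strides=(stride_bk, stride_bn),
                                    offsets=(0, pid_n * BLOCK_N),
                                    block_shape=(BLOCK_K, BLOCK_N), order=(1, 0))
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_block_ptr, boundary_check=(0, 1))
        b = tl.load(b_block_ptr, boundary_check=(0, 1))
        acc += tl.dot(a, b)  # candidate for DPAS via the accelerate-matmul pass
        a_block_ptr = tl.advance(a_block_ptr, (0, BLOCK_K))
        b_block_ptr = tl.advance(b_block_ptr, (BLOCK_K, 0))
    # Assumes a float32 C tensor for simplicity.
    c_block_ptr = tl.make_block_ptr(base=c_ptr, shape=(M, N),
                                    strides=(stride_cm, stride_cn),
                                    offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
                                    block_shape=(BLOCK_M, BLOCK_N), order=(1, 0))
    tl.store(c_block_ptr, acc, boundary_check=(0, 1))
```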

TODO:

  • Benchmark.
  • Use the sub-group-size=32.
  • Use the 2D store.
  • Add double GRF as a compile configuration.
  • Change the convert-layout and emit-index logic to use dense strides for the dot operand and DPAS layouts.

Memo: sub-group-size=32 causes some DPAS unit tests to fail.
Memo: Need to upstream the nested layouts of the dot operand layout (slice-of-dot layout and dot-of-dot layout).
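
Also for context, a conceptual sketch of the loop structure the pipelining-with-prefetching pass produces: the tile for iteration k+1 is prefetched while iteration k computes, overlapping memory latency with DPAS work. This is not the pass implementation; `prefetch_tile`, `load_tile`, and `compute` are hypothetical placeholders.

```python
# Conceptual sketch of software pipelining with prefetching. The real
# pass performs this transformation on Triton IR; the helper names here
# (prefetch_tile, load_tile, compute) are hypothetical.

def pipelined_loop(load_tile, prefetch_tile, compute, num_tiles):
    prefetch_tile(0)                  # prologue: issue the first prefetch
    for k in range(num_tiles):
        if k + 1 < num_tiles:
            prefetch_tile(k + 1)      # next tile; lowered to 2D prefetch ops
        a, b = load_tile(k)           # current tile; should hit prefetched data
        compute(a, b)                 # DPAS dot on the current tile
```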

@chengjunlu force-pushed the chengjun/llvm-target-dev branch 13 times, most recently from 759e4e1 to d3c482b on Apr 24, 2024
@chengjunlu force-pushed the chengjun/llvm-target-dev branch 4 times, most recently from 75ed1be to ec24ea5 on Apr 29, 2024
@chengjunlu force-pushed the chengjun/llvm-target-dev branch 4 times, most recently from 7b22c45 to 0c25d6a on May 8, 2024
@whitneywhtsang marked this pull request as draft on May 11, 2024
@chengjunlu force-pushed the chengjun/llvm-target-dev branch 2 times, most recently from 22daa3f to bd400b0 on May 15, 2024
@chengjunlu closed this on May 15, 2024
… could be supported by Intel GPU hardware 2D memory accesses, to protect the block pointer from being rewritten in the RewriteTensorPointer pass.