how to understand "triton_gpu.slice" #28

Open
Shaquille-Wu opened this issue Dec 9, 2024 · 0 comments

Shaquille-Wu commented Dec 9, 2024

I cannot understand the meaning of triton_gpu.slice from the explanation in "SliceEncodingAttr" alone. That explanation reads as follows:

  let description = [{
    TODO: improve docs

    A = [x  x  x  x  x  x  x  x]

    parent = [0  1  2  3 ]
             [4  5  6  7 ]
             [8  9  10 11]
             [12 13 14 15]
    dim = 0

    Then the data of A would be distributed as follow between the 16 CUDA threads:
    L(A) = [ {0,4,8,12} , {1,5,9,13} , ... {3,7,11,15}, {0,4,8,12} , ..., {3,7,11,15} ]

    This is useful for constructing the inverse layout of an expand_dims operation during some optimization passes.

  }];
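To make the quoted example concrete, here is a small Python sketch of it. It assumes that the 4x4 `parent` grid lists CUDA thread ids by (row, column). It also assumes each thread holds one element per dimension, so the eight elements of A wrap around the four columns. `PARENT` and `slice_owners` are hypothetical names used only for illustration.

```python
# Sketch of the SliceEncodingAttr doc example (assumption: PARENT lists the
# 16 thread ids by (row, column); names here are illustrative, not Triton APIs).
PARENT = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]


def slice_owners(parent, dim, length):
    """For each element k of a 1-D tensor, return the set of threads holding it.

    Slicing removes parent dimension `dim`: k becomes the coordinate along the
    remaining dimension (wrapping around), and the element is replicated on
    every thread along the removed dimension.
    """
    rows, cols = len(parent), len(parent[0])
    owners = []
    for k in range(length):
        if dim == 0:
            # Rows are removed, so k selects a column; all rows share it.
            c = k % cols
            owners.append({parent[r][c] for r in range(rows)})
        else:
            # Columns are removed, so k selects a row; all columns share it.
            r = k % rows
            owners.append({parent[r][c] for c in range(cols)})
    return owners


layout = slice_owners(PARENT, dim=0, length=8)
for k, threads in enumerate(layout):
    print(k, sorted(threads))
# prints 0 [0, 4, 8, 12], 1 [1, 5, 9, 13], ..., 4 [0, 4, 8, 12], ...
```

This reproduces L(A) from the doc string. With `dim = 0` the row index is dropped, so each element of A is shared by the four threads in one column of `parent`.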

Now here is some .mlir code that uses triton_gpu.slice:

#blocked = #triton_gpu.blocked<{sizePerThread = [4, 1], threadsPerWarp = [16, 2], warpsPerCTA = [1, 4], order = [0, 1]}>
#blocked1 = #triton_gpu.blocked<{sizePerThread = [1, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>
module attributes {"triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 4 : i32} {
  tt.func @transpose(%arg0: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg1: i32 {tt.divisibility = 16 : i32},
                     %arg2: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg3: i32 {tt.divisibility = 16 : i32}) {
    %cst = arith.constant dense<true> : tensor<64x64xi1, #blocked>
    %cst_0 = arith.constant dense<0.000000e+00> : tensor<64x64xf32, #blocked1>
    %cst_1 = arith.constant dense<true> : tensor<64x64xi1, #blocked1>
    %0 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
    %1 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked}>>
    %2 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked1}>>
    %3 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked}>>
......
......
......
}
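Both slice layouts above derive from the two 2-D parents #blocked and #blocked1. The Python sketch below models which thread owns each element of a blocked layout. It follows my reading of the blocked-layout rules and is not Triton's own code:

- Along each dimension, `sizePerThread` contiguous elements go to a thread, then `threadsPerWarp` threads, then `warpsPerCTA` warps, and larger tensors wrap around.
- Lane and warp ids are flattened with `order[0]` as the fastest-varying dimension.

`Blocked`, `owner`, and `linearize` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Tuple


def linearize(coord, shape, order):
    """Flatten a multi-dim coordinate; order[0] is the fastest-varying dim."""
    flat, stride = 0, 1
    for d in order:
        flat += coord[d] * stride
        stride *= shape[d]
    return flat


@dataclass(frozen=True)
class Blocked:
    size_per_thread: Tuple[int, ...]
    threads_per_warp: Tuple[int, ...]
    warps_per_cta: Tuple[int, ...]
    order: Tuple[int, ...]

    def tile_shape(self):
        # Elements covered by one pass of every thread in the CTA, per dim;
        # larger tensors wrap around, so threads hold several elements.
        return tuple(s * t * w for s, t, w in zip(
            self.size_per_thread, self.threads_per_warp, self.warps_per_cta))

    def owner(self, idx):
        """Return (warp_id, lane_id) of the thread holding element idx."""
        lane, warp = [], []
        for d, i in enumerate(idx):
            chunk = i // self.size_per_thread[d]  # which per-thread chunk
            lane.append(chunk % self.threads_per_warp[d])
            warp.append((chunk // self.threads_per_warp[d]) % self.warps_per_cta[d])
        return (linearize(warp, self.warps_per_cta, self.order),
                linearize(lane, self.threads_per_warp, self.order))


# The two parent layouts from the .mlir snippet.
BLOCKED = Blocked((4, 1), (16, 2), (1, 4), (0, 1))
BLOCKED1 = Blocked((1, 4), (2, 16), (4, 1), (1, 0))

print(BLOCKED.tile_shape())    # (64, 8)
print(BLOCKED1.tile_shape())   # (8, 64)
print(BLOCKED.owner((5, 3)))   # (1, 17)
print(BLOCKED1.owner((3, 5)))  # (1, 17)
```

Under this model, #blocked covers the 64 rows in one pass and repeats 8 times along columns. #blocked1 is the same layout with the two dimensions swapped, so element (i, j) of #blocked and element (j, i) of #blocked1 land on the same thread. That fits a transpose kernel.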

I cannot figure out the layouts of "%0", "%1", "%2", and "%3".

Could anyone explain this to me? In particular, what are the tensor layouts of the values above?
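One way to answer this is to compute the layouts directly. The sketch below repeats the blocked-layout model from the previous example so that it runs on its own. It rests on the same assumptions about element-to-thread mapping and `order`, which are my reading rather than Triton's code.

For `slice<{dim = d, parent = P}>`, the sketch treats the 1-D index k as the coordinate along P's other dimension. The removed dimension d is broadcast, so every thread that owns any element of that parent row or column holds a copy. `slice_owners` and `elements_of` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Tuple


def linearize(coord, shape, order):
    """Flatten a multi-dim coordinate; order[0] is the fastest-varying dim."""
    flat, stride = 0, 1
    for d in order:
        flat += coord[d] * stride
        stride *= shape[d]
    return flat


@dataclass(frozen=True)
class Blocked:
    size_per_thread: Tuple[int, ...]
    threads_per_warp: Tuple[int, ...]
    warps_per_cta: Tuple[int, ...]
    order: Tuple[int, ...]

    def tile_shape(self):
        return tuple(s * t * w for s, t, w in zip(
            self.size_per_thread, self.threads_per_warp, self.warps_per_cta))

    def owner(self, idx):
        """Return (warp_id, lane_id) of the thread holding element idx."""
        lane, warp = [], []
        for d, i in enumerate(idx):
            chunk = i // self.size_per_thread[d]
            lane.append(chunk % self.threads_per_warp[d])
            warp.append((chunk // self.threads_per_warp[d]) % self.warps_per_cta[d])
        return (linearize(warp, self.warps_per_cta, self.order),
                linearize(lane, self.threads_per_warp, self.order))


def slice_owners(parent, dim, length):
    """Set of (warp, lane) threads holding each element of a 1-D slice tensor."""
    keep = 1 - dim  # the surviving dimension of a 2-D parent
    owners = []
    for k in range(length):
        threads = set()
        # Ownership repeats every tile, so one tile along `dim` is enough.
        for r in range(parent.tile_shape()[dim]):
            idx = [0, 0]
            idx[keep], idx[dim] = k, r
            threads.add(parent.owner(idx))
        owners.append(frozenset(threads))
    return owners


def elements_of(owners, warp, lane):
    """Indices of the 1-D tensor held by thread (warp, lane)."""
    return [k for k, ts in enumerate(owners) if (warp, lane) in ts]


BLOCKED = Blocked((4, 1), (16, 2), (1, 4), (0, 1))
BLOCKED1 = Blocked((1, 4), (2, 16), (4, 1), (1, 0))

v0 = slice_owners(BLOCKED1, dim=1, length=64)  # %0
v1 = slice_owners(BLOCKED, dim=1, length=64)   # %1
v2 = slice_owners(BLOCKED1, dim=0, length=64)  # %2
v3 = slice_owners(BLOCKED, dim=0, length=64)   # %3

print(elements_of(v1, 0, 0))   # [0, 1, 2, 3]
print(elements_of(v1, 0, 17))  # [4, 5, 6, 7]
print(elements_of(v0, 0, 0))   # [0, 8, 16, 24, 32, 40, 48, 56]
print(elements_of(v0, 1, 16))  # [3, 11, 19, 27, 35, 43, 51, 59]
print(v0 == v3, v1 == v2)      # True True
```

Under this model, %1 and %2 give each thread 4 contiguous elements, 4*(lane % 16) to 4*(lane % 16) + 3. Each element is replicated on 2 lanes in each of the 4 warps.

%0 and %3 give thread (warp w, lane l) the 8 strided elements l // 16 + 2*w + 8*m, for m = 0..7. Each element is replicated on 16 lanes of one warp.

So the four values use two distinct physical distributions, even though each is written against a different parent layout.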
