how to understand "triton_gpu.slice" #28

Shaquille-Wu · 2024-12-09T14:59:00Z

I cannot understand the meaning of triton_gpu.slice from the description of "SliceEncodingAttr" alone. The description reads as follows:

  let description = [{
    TODO: improve docs

    A = [x  x  x  x  x  x  x  x]

    parent = [0  1  2  3 ]
             [4  5  6  7 ]
             [8  9  10 11]
             [12 13 14 15]
    dim = 0

    Then the data of A would be distributed as follow between the 16 CUDA threads:
    L(A) = [ {0,4,8,12} , {1,5,9,13} , ... {3,7,11,15}, {0,4,8,12} , ..., {3,7,11,15} ]

    This is useful for constructing the inverse layout of an expand_dims operation during some optimization passes.

  }];
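
To check my reading of this description, here is a minimal Python sketch of the example. It assumes that slicing away `dim` means every element of the 1-D tensor is replicated across all threads that lie along `dim` in the parent, and that the thread pattern repeats when the tensor is longer than the parent. The helper name `slice_layout` is hypothetical and not part of Triton.

```python
# Parent layout from the description: a 4x4 grid of thread ids.
PARENT = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]


def slice_layout(parent, dim, n):
    """Return, for each of the n elements of a 1-D tensor, the threads holding it.

    Assumption: removing `dim` from the 2-D parent collapses all threads along
    that dimension into one group, and groups wrap around for longer tensors.
    """
    rows, cols = len(parent), len(parent[0])
    if dim == 0:
        # Collapse rows: one group per column of the parent.
        groups = [sorted(parent[r][c] for r in range(rows)) for c in range(cols)]
    else:
        # Collapse columns: one group per row of the parent.
        groups = [sorted(parent[r][c] for c in range(cols)) for r in range(rows)]
    return [groups[i % len(groups)] for i in range(n)]


if __name__ == "__main__":
    # A has 8 elements and dim = 0, as in the description.
    for i, threads in enumerate(slice_layout(PARENT, dim=0, n=8)):
        print(f"A[{i}] -> threads {threads}")
```

With `dim=0` this reproduces the groups `{0,4,8,12}`, `{1,5,9,13}`, ... listed in the description, and they repeat from `A[4]` onward.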

Now, here is some code that uses triton_gpu.slice in an .mlir file:

#blocked = #triton_gpu.blocked<{sizePerThread = [4, 1], threadsPerWarp = [16, 2], warpsPerCTA = [1, 4], order = [0, 1]}>
#blocked1 = #triton_gpu.blocked<{sizePerThread = [1, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>
module attributes {"triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 4 : i32} {
  tt.func @transpose(%arg0: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg1: i32 {tt.divisibility = 16 : i32},
                     %arg2: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg3: i32 {tt.divisibility = 16 : i32}) {
    %cst = arith.constant dense<true> : tensor<64x64xi1, #blocked>
    %cst_0 = arith.constant dense<0.000000e+00> : tensor<64x64xf32, #blocked1>
    %cst_1 = arith.constant dense<true> : tensor<64x64xi1, #blocked1>
    %0 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
    %1 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked}>>
    %2 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked1}>>
    %3 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked}>>
......
......
......
}

I cannot figure out the layouts of "%0", "%1", "%2", and "%3".

Could anyone explain how triton_gpu.slice works?
Could anyone tell me the tensor layouts of the values above?
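
For reference, this is how I would try to compute those layouts in Python. It is only a sketch under assumptions I have not confirmed against the Triton source: that a blocked layout tiles the tensor with a CTA tile of `sizePerThread * threadsPerWarp * warpsPerCTA` elements per dimension, that `order[0]` is the fastest-varying dimension when linearizing both lane and warp ids, and that a slice replicates each element across every thread along the removed dimension. The function names `blocked_owner` and `slice_owners` are hypothetical.

```python
from math import prod

# Layout parameters copied from the #blocked and #blocked1 attributes above.
BLOCKED = dict(size_per_thread=[4, 1], threads_per_warp=[16, 2],
               warps_per_cta=[1, 4], order=[0, 1])
BLOCKED1 = dict(size_per_thread=[1, 4], threads_per_warp=[2, 16],
                warps_per_cta=[4, 1], order=[1, 0])


def blocked_owner(index, size_per_thread, threads_per_warp, warps_per_cta, order):
    """Linear thread id owning the parent element at `index` (assumed convention)."""
    rank = len(index)
    lane_idx, warp_idx = [0] * rank, [0] * rank
    for d in range(rank):
        per_warp = size_per_thread[d] * threads_per_warp[d]
        tile = per_warp * warps_per_cta[d]
        t = index[d] % tile  # position inside the CTA tile, which repeats
        lane_idx[d] = (t // size_per_thread[d]) % threads_per_warp[d]
        warp_idx[d] = t // per_warp
    lane = warp = 0
    lane_stride = warp_stride = 1
    for d in order:  # order[0] is assumed to vary fastest
        lane += lane_idx[d] * lane_stride
        warp += warp_idx[d] * warp_stride
        lane_stride *= threads_per_warp[d]
        warp_stride *= warps_per_cta[d]
    return warp * prod(threads_per_warp) + lane


def slice_owners(parent, dim, i, shape=(64, 64)):
    """Threads holding element i of a 1-D tensor with layout slice<dim, parent>."""
    owners = set()
    for k in range(shape[dim]):  # walk along the removed dimension
        index = (k, i) if dim == 0 else (i, k)
        owners.add(blocked_owner(index, **parent))
    return sorted(owners)


if __name__ == "__main__":
    values = {"%0": (BLOCKED1, 1), "%1": (BLOCKED, 1),
              "%2": (BLOCKED1, 0), "%3": (BLOCKED, 0)}
    for name, (parent, dim) in values.items():
        for i in range(3):
            print(name, f"element {i} ->", slice_owners(parent, dim, i))
```

If these assumptions hold, element 0 of "%0" and "%3" would be replicated across threads 0 to 15, while element 0 of "%1" and "%2" would be held by threads 0, 16, 32, ..., 112. I would still like someone to confirm whether this matches the real behavior.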
