Use stride instead of order to determine block attr #2349

alexbaden · 2024-09-25T20:39:51Z

Per the Triton Slack, order is unused on architectures below Hopper. More importantly, order only duplicates information that stride already carries. In fact, order can be completely different from stride (i.e. wrong) and we still generate correct code. I think it is better to use the stride, assuming the logic I added here makes sense.

Note that this depends on #2348. I'd like to land the debug logging separately so that we have it even if we decide to modify this approach; it was very useful in debugging this problem.

cc #2347

third_party/intel/lib/TritonIntelGPUTransforms/MaterializeBlockPointer.cpp

whitneywhtsang

can we add a lit test?

third_party/intel/lib/TritonIntelGPUTransforms/MaterializeBlockPointer.cpp

Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>

alexbaden · 2024-09-26T01:01:05Z

> can we add a lit test?

The existing tests actually cover this scenario, because they change both the order and the stride.

    // CHECK: tt.load {{.*}} {boundaryCheck = array<i32: 1>, padding = 1 : i32, triton_intel_gpu.block_io = "row_major"}
    // CHECK: tt.load {{.*}} {boundaryCheck = array<i32: 0>, padding = 1 : i32, triton_intel_gpu.block_io = "row_major"}
    %3 = tt.make_tensor_ptr %arg0, [%c0_i64, %c0_i64], [%pitch, %c1_i64], [%c0_i32, %c0_i32] {order = array<i32: 1, 0>} : <tensor<64x32xf16, #dot_a>>
    %4 = tt.make_tensor_ptr %arg0, [%c0_i64, %c0_i64], [%pitch, %c1_i64], [%c0_i32, %c0_i32] {order = array<i32: 1, 0>} : <tensor<32x64xf16, #dot_b>>
    %5 = tt.load %3 {boundaryCheck = array<i32: 1>, cache = 1 : i32, evict = 1 : i32, isVolatile = false, padding = 1 : i32} : !tt.ptr<tensor<64x32xf16, #dot_a>>
    %6 = tt.load %4 {boundaryCheck = array<i32: 0>, cache = 1 : i32, evict = 1 : i32, isVolatile = false, padding = 1 : i32} : !tt.ptr<tensor<32x64xf16, #dot_b>>

    // CHECK: tt.load {{.*}} {boundaryCheck = array<i32: 1>, padding = 1 : i32, triton_intel_gpu.block_io = "column_major"}
    // CHECK: tt.load {{.*}} {boundaryCheck = array<i32: 0>, padding = 1 : i32, triton_intel_gpu.block_io = "column_major"}
    %7 = tt.make_tensor_ptr %arg0, [%c0_i64, %c0_i64], [%c1_i64, %pitch], [%c0_i32, %c0_i32] {order = array<i32: 0, 1>} : <tensor<64x32xf16, #dot_a>>
    %8 = tt.make_tensor_ptr %arg0, [%c0_i64, %c0_i64], [%c1_i64, %pitch], [%c0_i32, %c0_i32] {order = array<i32: 0, 1>} : <tensor<32x64xf16, #dot_b>>
    %9 = tt.load %7 {boundaryCheck = array<i32: 1>, cache = 1 : i32, evict = 1 : i32, isVolatile = false, padding = 1 : i32} : !tt.ptr<tensor<64x32xf16, #dot_a>>
    %10 = tt.load %8 {boundaryCheck = array<i32: 0>, cache = 1 : i32, evict = 1 : i32, isVolatile = false, padding = 1 : i32} : !tt.ptr<tensor<32x64xf16, #dot_b>>

That said, I added a test that covers this scenario plus rewrite tensor pointer here: #2347
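To make the idea concrete, here is a minimal sketch of classifying a 2D block pointer from its strides alone. It is not the code in MaterializeBlockPointer.cpp: the names `Stride`, `BlockIO`, and `blockIOFromStrides` are hypothetical, a stride is modelled as an optional integer (empty for a runtime value such as `%pitch`), and the pitch restriction check mentioned in the commit list is left out. The `order` attribute is deliberately not an input.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical model of one stride operand of tt.make_tensor_ptr:
// a known constant, or std::nullopt for a runtime value such as %pitch.
using Stride = std::optional<std::int64_t>;

// Hypothetical mirror of the triton_intel_gpu.block_io attribute values.
enum class BlockIO { RowMajor, ColumnMajor, None };

// Classify a 2D block pointer by looking only at its strides.
// Assumption: a constant stride of 1 marks the contiguous dimension.
// If both strides are 1, this sketch arbitrarily reports RowMajor.
inline BlockIO blockIOFromStrides(const std::array<Stride, 2> &strides) {
  const auto isOne = [](const Stride &s) { return s.has_value() && *s == 1; };
  if (isOne(strides[1]))
    return BlockIO::RowMajor;    // innermost dimension is contiguous
  if (isOne(strides[0]))
    return BlockIO::ColumnMajor; // outermost dimension is contiguous
  return BlockIO::None;          // no contiguous dimension is provable
}
```

Under these assumptions, strides of `[%pitch, %c1_i64]` map to `BlockIO::RowMajor` and `[%c1_i64, %pitch]` to `BlockIO::ColumnMajor`, which lines up with the `row_major` and `column_major` CHECK lines in the tests quoted above.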

alexbaden requested a review from chengjunlu September 25, 2024 20:39

chengjunlu reviewed Sep 26, 2024


chengjunlu approved these changes Sep 26, 2024


alexbaden added 2 commits September 26, 2024 00:51

Use stride instead of order to determine block attr

8a4cab2

fix pitch restriction check + remove commented code

c09e10e

alexbaden force-pushed the alex/materialize_block_pointer_stride branch from db2a0e9 to c09e10e Compare September 26, 2024 00:52

whitneywhtsang reviewed Sep 26, 2024


chengjunlu approved these changes Sep 26, 2024


alexbaden and others added 2 commits September 25, 2024 20:56

++i

9f7bfdc

Co-authored-by: Whitney Tsang <whitney.tsang@intel.com>

remove unused variable

de99c71

format

5a23de4

whitneywhtsang approved these changes Sep 26, 2024


alexbaden merged commit 979301f into main Sep 26, 2024
4 checks passed

alexbaden deleted the alex/materialize_block_pointer_stride branch September 26, 2024 01:49
