v1.2.7
- Merged PR 2744: [doc] Fixes link in reference/functions/cast.md, revs version on all docs. [Kern Handa]
- Merged PR 2743: [DSL] Document implicit casting rules and the explicit cast function. [Lisa Ong]
  - Document the implicit casting rules implemented by !2693
  - Promote acc.cast to a documented function, giving the user control to override the implicit casting behavior (see the usage sketch below)
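A minimal usage sketch of the explicit cast, assuming the accera Python package's Array/Nest API; the exact signature of acc.cast is described in reference/functions/cast.md:

```python
import accera as acc

# float32 -> int8 is a narrowing conversion, so it is not applied implicitly;
# acc.cast lets the user request the conversion explicitly.
A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(16, 16))
B = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int8, shape=(16, 16))

nest = acc.Nest(shape=(16, 16))
i, j = nest.get_indices()

@nest.iteration_logic
def _():
    B[i, j] = acc.cast(A[i, j], acc.ScalarType.int8)  # explicit, intentional narrowing

# Packaging/building the function is omitted for brevity.
```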
- Merged PR 2739: Updates ROCM tensorization pattern to handle casting. [Kern Handa]
- Merged PR 2643: Some fixes for last major array caching in tensorization. [Mason Remy]
- Merged PR 2693: Updates DSL codegen to implicitly cast if possible. [Kern Handa]
  (see the implicit-cast sketch below)
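A companion sketch of the implicit behavior, under the same assumptions as the cast example above: a widening conversion (here int8 to float32) needs no acc.cast call because the DSL codegen inserts it automatically when no information is lost.

```python
import accera as acc

A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int8, shape=(16, 16))
C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

nest = acc.Nest(shape=(16, 16))
i, j = nest.get_indices()

@nest.iteration_logic
def _():
    # int8 -> float32 is a widening conversion, so no explicit cast is required here
    C[i, j] += A[i, j]
```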
- Merged PR 2735: Pass multiple input files as a comma-separated list to the benchmark tool. [Ritwik Das]
  The previous pipeline did not run the second set of input sizes because the first process did not exit until the pipeline timeout was hit. After this fix, there is always a single job.
- Merged PR 2721: Remove unnecessary logging in benchmarks. [Ritwik Das]
- Merged PR 2674: Support emitting runtime array sizes in the Value DSL. [Lisa Ong]
  - Minimum set of changes to support runtime sizes in the Value DSL without transformations
  - Add a ScalarDimension type (name TBC), which is aliased to Scalar
  - Support variable ends in MemoryLayout, ScheduledLoopOp, and RangeValueAnalysis
  - Use mlir::ShapedType::kDynamicSize and mlir::ShapedType::kDynamicStrideOrOffset as sentinel values, following the pattern in MemRefOps, TensorOps, etc.
  - TODO: E2E verification in the next PR
  - TODO: Python DSL changes in the next PR

  Output of mlir-translate for the runtime_sizes_all case, where %21, %22, and %23 are the runtime sizes for M, N, and K:
```llvm
define void @NestMatMul(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, float* %7, float* %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, float* %14, float* %15, i64 %16, i64 %17, i64 %18, i64 %19, i64 %20, i64 %21, i64 %22, i64 %23) !dbg !3 {
  br label %25, !dbg !7

25:                                               ; preds = %57, %24
  %26 = phi i64 [ %58, %57 ], [ 0, %24 ]
  %27 = icmp slt i64 %26, %21, !dbg !9
  br i1 %27, label %28, label %59, !dbg !10

28:                                               ; preds = %25
  br label %29, !dbg !11

29:                                               ; preds = %55, %28
  %30 = phi i64 [ %56, %55 ], [ 0, %28 ]
  %31 = icmp slt i64 %30, %22, !dbg !12
  br i1 %31, label %32, label %57, !dbg !13

32:                                               ; preds = %29
  br label %33, !dbg !14

33:                                               ; preds = %36, %32
  %34 = phi i64 [ %54, %36 ], [ 0, %32 ]
  %35 = icmp slt i64 %34, %23, !dbg !15
  br i1 %35, label %36, label %55, !dbg !16

36:                                               ; preds = %33
  %37 = mul i64 %26, %5, !dbg !17
  %38 = add i64 %37, %34, !dbg !18
  %39 = getelementptr float, float* %1, i64 %38, !dbg !19
  %40 = load float, float* %39, align 4, !dbg !20
  %41 = mul i64 %34, %12, !dbg !21
  %42 = add i64 %41, %30, !dbg !22
  %43 = getelementptr float, float* %8, i64 %42, !dbg !23
  %44 = load float, float* %43, align 4, !dbg !24
  %45 = fmul float %40, %44, !dbg !25
  %46 = mul i64 %26, %19, !dbg !26
  %47 = add i64 %46, %30, !dbg !27
  %48 = getelementptr float, float* %15, i64 %47, !dbg !28
  %49 = load float, float* %48, align 4, !dbg !29
  %50 = fadd float %49, %45, !dbg !30
  %51 = mul i64 %26, %19, !dbg !31
  %52 = add i64 %51, %30, !dbg !32
  %53 = getelementptr float, float* %15, i64 %52, !dbg !33
  store float %50, float* %53, align 4, !dbg !34
  %54 = add i64 %34, 1, !dbg !35
  br label %33, !dbg !36

55:                                               ; preds = %33
  %56 = add i64 %30, 1, !dbg !37
  br label %29, !dbg !38

57:                                               ; preds = %29
  %58 = add i64 %26, 1, !dbg !39
  br label %25, !dbg !40

59:                                               ; preds = %25
  ret void, !dbg !41
}
```
Related work items: #3716, #3717
- Merged PR 2682: Add nvidia device optimized sizes and some benchmark fixes. [Ritwik Das]
- Merged PR 2676: Add automated weekly rocm baseline benchmark. [Ritwik Das]
- Merged PR 2673: Add automated weekly baseline benchmarks on Nvidia GPU. [Ritwik Das]