
v1.2.7

@kernhanda released this 13 Jul 18:06

  • Merged PR 2744: [doc] Fixes link in reference/functions/cast.md, revs
    version on all docs. [Kern Handa]

  • Merged PR 2743: [DSL] Document implicit casting rules and the explicit
    cast function. [Lisa Ong]

    • Document implicit casting rules implemented by PR 2693
    • Promote acc.cast to a documented function, giving the user control to override the implicit casting behavior (a minimal sketch follows below)
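
    A minimal sketch of the explicit cast in the Python DSL. The accera module name and the Array/Nest calls shown here are assumptions for illustration; only the existence and purpose of acc.cast come from this PR:

    # Illustrative sketch, not a verified API: explicitly cast an int8
    # operand to float32 inside a loop nest, overriding any implicit rule.
    import accera as acc

    A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int8, shape=(16, 16))
    B = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

    nest = acc.Nest(shape=(16, 16))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        # acc.cast makes the int8 -> float32 conversion explicit and takes
        # precedence over the implicit casting behavior described above.
        B[i, j] += acc.cast(A[i, j], acc.ScalarType.float32)
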
  • Merged PR 2739: Updates ROCm tensorization pattern to handle casting.
    [Kern Handa]

  • Merged PR 2643: Some fixes for last major array caching in
    tensorization. [Mason Remy]

  • Merged PR 2693: Updates DSL codegen to implicitly cast if possible.
    [Kern Handa]
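
    To illustrate what this change enables, a hedged companion sketch to the explicit-cast example above; the specific widening rule shown is an assumption based on the PR title, not a documented guarantee:

    # Illustrative sketch: with implicit casting in DSL codegen, a narrower
    # integer element can feed a float32 accumulation with no acc.cast call.
    import accera as acc

    src = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int16, shape=(64, 64))
    dst = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(64, 64))

    nest = acc.Nest(shape=(64, 64))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        # int16 -> float32 widening applied implicitly by codegen (assumed).
        dst[i, j] += src[i, j]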

  • Merged PR 2735: Pass multiple input files as comma-separated list to
    benchmark tool. [Ritwik Das]

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41588&view=logs&j=d78921a4-2f18-50b0-77ad-4c6803f3371b&t=f97c60f6-ada7-5ec9-5ea1-510216c408e9

    The above pipeline did not run the second set of input sizes because the first process did not exit until the pipeline timeout was hit. After this fix there will always be a single job; a sketch of the comma-separated parsing follows.
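
    A sketch of the comma-separated handling this change implies, with a hypothetical --input flag; only the comma-separated convention comes from the PR description:

    # Illustrative sketch: parse a comma-separated file list so one process
    # (a single job) covers every input-size set. The flag name is made up.
    import argparse

    parser = argparse.ArgumentParser(description="benchmark driver sketch")
    parser.add_argument(
        "--input",
        type=lambda s: s.split(","),  # "a.csv,b.csv" -> ["a.csv", "b.csv"]
        help="comma-separated list of input size files",
    )
    args = parser.parse_args(["--input", "sizes_set1.csv,sizes_set2.csv"])

    for path in args.input:
        print(f"would run benchmarks for sizes in {path}")  # placeholder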

  • Merged PR 2721: Remove unnecessary logging in benchmarks. [Ritwik Das]

  • Merged PR 2674: Support emitting runtime array sizes in the Value DSL.
    [Lisa Ong]

    • Minimum set of changes to support runtime sizes in the Value DSL without transformations
    • Add a ScalarDimension type (name TBC) which is aliased to Scalar
    • Support variable ends in MemoryLayout, ScheduledLoopOp, RangeValueAnalysis
    • Use mlir::ShapedType::kDynamicSize and mlir::ShapedType::kDynamicStrideOrOffset as sentinel values, following the pattern in MemRefOps, TensorOps, etc.
    • TODO: E2E verification in the next PR
    • TODO: Python DSL changes in the next PR

    Output of mlir-translate for the runtime_sizes_all case, where %21, %22, and %23 are the runtime sizes for M, N, and K (a hedged calling sketch follows the IR):

    define void @NestMatMul(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, float* %7, float* %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, float* %14, float* %15, i64 %16, i64 %17, i64 %18, i64 %19, i64 %20, i64 %21, i64 %22, i64 %23) !dbg !3 {
      br label %25, !dbg !7
    
    25:                                               ; preds = %57, %24
      %26 = phi i64 [ %58, %57 ], [ 0, %24 ]
      %27 = icmp slt i64 %26, %21, !dbg !9
      br i1 %27, label %28, label %59, !dbg !10
    
    28:                                               ; preds = %25
      br label %29, !dbg !11
    
    29:                                               ; preds = %55, %28
      %30 = phi i64 [ %56, %55 ], [ 0, %28 ]
      %31 = icmp slt i64 %30, %22, !dbg !12
      br i1 %31, label %32, label %57, !dbg !13
    
    32:                                               ; preds = %29
      br label %33, !dbg !14
    
    33:                                               ; preds = %36, %32
      %34 = phi i64 [ %54, %36 ], [ 0, %32 ]
      %35 = icmp slt i64 %34, %23, !dbg !15
      br i1 %35, label %36, label %55, !dbg !16
    
    36:                                               ; preds = %33
      %37 = mul i64 %26, %5, !dbg !17
      %38 = add i64 %37, %34, !dbg !18
      %39 = getelementptr float, float* %1, i64 %38, !dbg !19
      %40 = load float, float* %39, align 4, !dbg !20
      %41 = mul i64 %34, %12, !dbg !21
      %42 = add i64 %41, %30, !dbg !22
      %43 = getelementptr float, float* %8, i64 %42, !dbg !23
      %44 = load float, float* %43, align 4, !dbg !24
      %45 = fmul float %40, %44, !dbg !25
      %46 = mul i64 %26, %19, !dbg !26
      %47 = add i64 %46, %30, !dbg !27
      %48 = getelementptr float, float* %15, i64 %47, !dbg !28
      %49 = load float, float* %48, align 4, !dbg !29
      %50 = fadd float %49, %45, !dbg !30
      %51 = mul i64 %26, %19, !dbg !31
      %52 = add i64 %51, %30, !dbg !32
      %53 = getelementptr float, float* %15, i64 %52, !dbg !33
      store float %50, float* %53, align 4, !dbg !34
      %54 = add i64 %34, 1, !dbg !35
      br label %33, !dbg !36
    
    55:                                               ; preds = %33
      %56 = add i64 %30, 1, !dbg !37
      br label %29, !dbg !38
    
    57:                                               ; preds = %29
      %58 = add i64 %26, 1, !dbg !39
      br label %25, !dbg !40
    
    59:                                               ; preds = %25
      ret void, !dbg !41
    }
    

    Related work items: #3716, #3717
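
    For illustration, a hedged sketch of calling the emitted function from Python via ctypes, based only on the LLVM signature above: each dynamically sized 2-D memref lowers to seven arguments (allocated pointer, aligned pointer, offset, two sizes, two strides), and %21-%23 are the trailing runtime sizes M, N, and K. The shared-library name is hypothetical:

    # Illustrative sketch derived from the signature above; not a shipped API.
    import ctypes
    import numpy as np

    M, N, K = 4, 5, 6
    A = np.random.rand(M, K).astype(np.float32)   # first memref (%0..%6)
    B = np.random.rand(K, N).astype(np.float32)   # second memref (%7..%13)
    C = np.zeros((M, N), dtype=np.float32)        # third memref (%14..%20)

    def memref_args(a):
        # Expand a row-major 2-D array into the memref descriptor fields:
        # allocated ptr, aligned ptr, offset, size0, size1, stride0, stride1.
        ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
        rows, cols = a.shape
        return [ptr, ptr, ctypes.c_int64(0),
                ctypes.c_int64(rows), ctypes.c_int64(cols),
                ctypes.c_int64(cols), ctypes.c_int64(1)]

    lib = ctypes.CDLL("./nest_matmul.so")  # hypothetical artifact name
    lib.NestMatMul(*memref_args(A), *memref_args(B), *memref_args(C),
                   ctypes.c_int64(M), ctypes.c_int64(N), ctypes.c_int64(K))

    np.testing.assert_allclose(C, A @ B, rtol=1e-4)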

  • Merged PR 2682: Add NVIDIA device-optimized sizes and some benchmark
    fixes. [Ritwik Das]

  • Merged PR 2676: Add automated weekly ROCm baseline benchmark. [Ritwik
    Das]

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41316&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8&t=687344d2-d6b6-5d8c-dd9d-6aab558fd96c

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41314&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8

  • Merged PR 2673: Add automated weekly baseline benchmarks on NVIDIA
    GPUs. [Ritwik Das]