
v1.2.7

@kernhanda released this 13 Jul 18:06

  • Merged PR 2744: [doc] Fixes link in reference/functions/cast.md, revs
    version on all docs. [Kern Handa]

  • Merged PR 2743: [DSL] Document implicit casting rules and the explicit
    cast function. [Lisa Ong]

    • Document implicit casting rules implemented by PR 2693
    • Promote acc.cast to a documented function, giving the user control to override the implicit casting behavior (a minimal sketch follows below)
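
    A minimal sketch of the explicit cast in the Python DSL. The accera module name and the Array/Nest calls shown here are assumptions for illustration; only the existence and purpose of acc.cast come from this PR:

    # Illustrative sketch, not a verified API: explicitly cast an int8
    # operand to float32 inside a loop nest, overriding any implicit rule.
    import accera as acc

    A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int8, shape=(16, 16))
    B = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(16, 16))

    nest = acc.Nest(shape=(16, 16))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        # acc.cast makes the int8 -> float32 conversion explicit and takes
        # precedence over the implicit casting behavior described above.
        B[i, j] += acc.cast(A[i, j], acc.ScalarType.float32)
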
  • Merged PR 2739: Updates ROCm tensorization pattern to handle casting.
    [Kern Handa]

  • Merged PR 2643: Some fixes for last major array caching in
    tensorization. [Mason Remy]

  • Merged PR 2693: Updates DSL codegen to implicitly cast if possible.
    [Kern Handa]
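
    To illustrate what this change enables, a hedged companion sketch to the explicit-cast example above; the specific widening rule shown is an assumption based on the PR title, not a documented guarantee:

    # Illustrative sketch: with implicit casting in DSL codegen, a narrower
    # integer element can feed a float32 accumulation with no acc.cast call.
    import accera as acc

    src = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int16, shape=(64, 64))
    dst = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(64, 64))

    nest = acc.Nest(shape=(64, 64))
    i, j = nest.get_indices()

    @nest.iteration_logic
    def _():
        # int16 -> float32 widening applied implicitly by codegen (assumed).
        dst[i, j] += src[i, j]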

  • Merged PR 2735: Pass multiple input files as comma-separated list to
    benchmark tool. [Ritwik Das]

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41588&view=logs&j=d78921a4-2f18-50b0-77ad-4c6803f3371b&t=f97c60f6-ada7-5ec9-5ea1-510216c408e9

    The above pipeline did not run the second set of input sizes because the first process did not exit until the pipeline timeout was hit. After this fix there will always be a single job; a sketch of the comma-separated parsing follows.
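
    A sketch of the comma-separated handling this change implies, with a hypothetical --input flag; only the comma-separated convention comes from the PR description:

    # Illustrative sketch: parse a comma-separated file list so one process
    # (a single job) covers every input-size set. The flag name is made up.
    import argparse

    parser = argparse.ArgumentParser(description="benchmark driver sketch")
    parser.add_argument(
        "--input",
        type=lambda s: s.split(","),  # "a.csv,b.csv" -> ["a.csv", "b.csv"]
        help="comma-separated list of input size files",
    )
    args = parser.parse_args(["--input", "sizes_set1.csv,sizes_set2.csv"])

    for path in args.input:
        print(f"would run benchmarks for sizes in {path}")  # placeholder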

  • Merged PR 2721: Remove unnecessary logging in benchmarks. [Ritwik Das]

  • Merged PR 2674: Support emitting runtime array sizes in the Value DSL.
    [Lisa Ong]

    • Minimum set of changes to support runtime sizes in the Value DSL without transformations
    • Add a ScalarDimension type (name TBC) which is aliased to Scalar
    • Support variable ends in MemoryLayout, ScheduledLoopOp, RangeValueAnalysis
    • Use mlir::ShapedType::kDynamicSize and mlir::ShapedType::kDynamicStrideOrOffset as sentinel values, following the pattern in MemRefOps, TensorOps, etc.
    • TODO: E2E verification in the next PR
    • TODO: Python DSL changes in the next PR

    Output of mlir-translate for the runtime_sizes_all case, where %21, %22, and %23 are the runtime sizes for M, N, and K (a hedged calling sketch follows the IR):

    define void @NestMatMul(float* %0, float* %1, i64 %2, i64 %3, i64 %4, i64 %5, i64 %6, float* %7, float* %8, i64 %9, i64 %10, i64 %11, i64 %12, i64 %13, float* %14, float* %15, i64 %16, i64 %17, i64 %18, i64 %19, i64 %20, i64 %21, i64 %22, i64 %23) !dbg !3 {
      br label %25, !dbg !7
    
    25:                                               ; preds = %57, %24
      %26 = phi i64 [ %58, %57 ], [ 0, %24 ]
      %27 = icmp slt i64 %26, %21, !dbg !9
      br i1 %27, label %28, label %59, !dbg !10
    
    28:                                               ; preds = %25
      br label %29, !dbg !11
    
    29:                                               ; preds = %55, %28
      %30 = phi i64 [ %56, %55 ], [ 0, %28 ]
      %31 = icmp slt i64 %30, %22, !dbg !12
      br i1 %31, label %32, label %57, !dbg !13
    
    32:                                               ; preds = %29
      br label %33, !dbg !14
    
    33:                                               ; preds = %36, %32
      %34 = phi i64 [ %54, %36 ], [ 0, %32 ]
      %35 = icmp slt i64 %34, %23, !dbg !15
      br i1 %35, label %36, label %55, !dbg !16
    
    36:                                               ; preds = %33
      %37 = mul i64 %26, %5, !dbg !17
      %38 = add i64 %37, %34, !dbg !18
      %39 = getelementptr float, float* %1, i64 %38, !dbg !19
      %40 = load float, float* %39, align 4, !dbg !20
      %41 = mul i64 %34, %12, !dbg !21
      %42 = add i64 %41, %30, !dbg !22
      %43 = getelementptr float, float* %8, i64 %42, !dbg !23
      %44 = load float, float* %43, align 4, !dbg !24
      %45 = fmul float %40, %44, !dbg !25
      %46 = mul i64 %26, %19, !dbg !26
      %47 = add i64 %46, %30, !dbg !27
      %48 = getelementptr float, float* %15, i64 %47, !dbg !28
      %49 = load float, float* %48, align 4, !dbg !29
      %50 = fadd float %49, %45, !dbg !30
      %51 = mul i64 %26, %19, !dbg !31
      %52 = add i64 %51, %30, !dbg !32
      %53 = getelementptr float, float* %15, i64 %52, !dbg !33
      store float %50, float* %53, align 4, !dbg !34
      %54 = add i64 %34, 1, !dbg !35
      br label %33, !dbg !36
    
    55:                                               ; preds = %33
      %56 = add i64 %30, 1, !dbg !37
      br label %29, !dbg !38
    
    57:                                               ; preds = %29
      %58 = add i64 %26, 1, !dbg !39
      br label %25, !dbg !40
    
    59:                                               ; preds = %25
      ret void, !dbg !41
    }
    

    Related work items: #3716, #3717
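
    For illustration, a hedged sketch of calling the emitted function from Python via ctypes, based only on the LLVM signature above: each dynamically sized 2-D memref lowers to seven arguments (allocated pointer, aligned pointer, offset, two sizes, two strides), and %21-%23 are the trailing runtime sizes M, N, and K. The shared-library name is hypothetical:

    # Illustrative sketch derived from the signature above; not a shipped API.
    import ctypes
    import numpy as np

    M, N, K = 4, 5, 6
    A = np.random.rand(M, K).astype(np.float32)   # first memref (%0..%6)
    B = np.random.rand(K, N).astype(np.float32)   # second memref (%7..%13)
    C = np.zeros((M, N), dtype=np.float32)        # third memref (%14..%20)

    def memref_args(a):
        # Expand a row-major 2-D array into the memref descriptor fields:
        # allocated ptr, aligned ptr, offset, size0, size1, stride0, stride1.
        ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
        rows, cols = a.shape
        return [ptr, ptr, ctypes.c_int64(0),
                ctypes.c_int64(rows), ctypes.c_int64(cols),
                ctypes.c_int64(cols), ctypes.c_int64(1)]

    lib = ctypes.CDLL("./nest_matmul.so")  # hypothetical artifact name
    lib.NestMatMul(*memref_args(A), *memref_args(B), *memref_args(C),
                   ctypes.c_int64(M), ctypes.c_int64(N), ctypes.c_int64(K))

    np.testing.assert_allclose(C, A @ B, rtol=1e-4)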

  • Merged PR 2682: Add NVIDIA device-optimized sizes and some benchmark
    fixes. [Ritwik Das]

  • Merged PR 2676: Add automated weekly ROCm baseline benchmark. [Ritwik
    Das]

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41316&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8&t=687344d2-d6b6-5d8c-dd9d-6aab558fd96c

    https://intelligentdevices.visualstudio.com/ELL/_build/results?buildId=41314&view=logs&j=4f7f213a-5f0f-58b0-1189-99ef12faf0d8

  • Merged PR 2673: Add automated weekly baseline benchmarks on NVIDIA
    GPUs. [Ritwik Das]