Squashed commit of the following:

commit 8e4041a19d8577a4c14741b45b6ab733e5d53d74
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Aug 10 00:22:43 2022 +0000

    Merged PR 2814: Parameterize batch_size in GPU benchmarks

    Parameterize batch_size in GPU benchmarks

commit eb6197a8f54555ab47be383f7f48b7efc9442041
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 8 05:51:35 2022 +0000

    Merged PR 2810: [release] [nfc] Bump docs version to 1.2.8, bump github actions to llvm 14.0.6

    Preparation for 1.2.8 release

commit 63c7a397210a753836a647f399e2798bba521939
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 8 05:03:36 2022 +0000

    Merged PR 2808: [ci] Add vcpkg caching for buddy builds, disable flaky parallelized tests

    * Enable vcpkg binary caching for CI pipelines that use non-custom agents. This reduces vcpkg install time from 2-3 minutes to ~30 seconds.
    * ctest --parallel on macOS can sometimes fail randomly. The tests will need to be updated to support running in parallel.

    References: https://vcpkg.io/en/docs/users/binarycaching.html
    Note: an organization-wide NuGet feed must be created. Project-wide NuGet feeds will fail with access denied.

commit 37e207a0deb2c8c431a6c0e73787c726221a4f37
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 8 04:13:14 2022 +0000

    Merged PR 2804: [ci] Reduce runtimes of PR Buddy Builds

    * Remove redundant setup.py builds in pipelines with cmake builds
    * Build debug for Linux only (the fastest config)
    * Add pipeline caching for ccache, conan, and pip where applicable
    * Add parallel configs where applicable
    * Filter out some tests on Windows due to slow runtimes. These should have coverage on Linux and macOS.

commit c8940050d9064e5c326644761ef2821b59e8e431
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Fri Aug 5 23:31:24 2022 +0000

    Merged PR 2807: Enable verification for CK baselines

    - Enable verification for CK baselines
    - Increase timeout for CUDA ResNet
    - Add functionality for extracting kernel code from CosmosDB

commit da114623db8518c9b47eae83eee1357f0bcd3565
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Fri Aug 5 22:35:43 2022 +0000

    Merged PR 2802: Fix barrier optimization pass

    This PR fixes a couple of barrier-related issues:
    - The barrier optimization pass wasn't keeping barriers that protected vector load/store ops
    - Multiple barriers were getting generated when hoisting barriers out of conditionals

    Related work items: #3732

commit 03171fec09146bdeacfdc2da68ff73202e30d534
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 4 19:08:27 2022 +0000

    Merged PR 2800: Add max_threads to parallelize and change default behavior

    - Add `max_threads` to `parallelize` (see the sketch after this list)
    - Change the default behavior to count the number of iterations of the given indices
    - Update documentation
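
    A minimal sketch of the updated call (assumptions: the Accera Python DSL imported as `acc`; the shapes and the thread cap are illustrative, not taken from this PR):

    ```python
    import accera as acc

    A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(256, 256))
    B = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(256, 256))
    C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(256, 256))

    nest = acc.Nest(shape=(256, 256, 256))
    i, j, k = nest.get_indices()

    @nest.iteration_logic
    def _():
        C[i, j] += A[i, k] * B[k, j]

    plan = nest.create_plan()
    # Cap the worker pool at 4 threads; without max_threads, the default now
    # derives the thread count from the iteration count of the given indices.
    plan.parallelize(indices=(i, j), max_threads=4)
    ```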

commit 7ff3a90dd09f74c8699657b17909a575c59267fa
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 4 16:42:29 2022 +0000

    Merged PR 2801: Remove verification on cuda-fp32-big benchmark

    Remove verification on cuda-fp32-big benchmark

commit 5e6f6d93f7c62b2f965e12cb69d00b77a6c65a89
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 1 22:36:03 2022 +0000

    Merged PR 2798: LLVM 14.0.6 upgrade

    An incremental upgrade with minimal or no changes to MLIR

commit bf8faeaee154befc7c34e221fe54f9a1fd2799f3
Author: Kern Handa <kerha@microsoft.com>
Date:   Sat Jul 30 04:55:03 2022 +0000

    Merged PR 2796: Makes NestedPassAdaptor's pipeline consistent

    Makes NestedPassAdaptor's pipeline consistent

    This change makes NestedPassAdaptor create a new pass manager every
    time a new pass is added. Prior to this change, if dumpPasses was
    false, the same nested pass manager would be reused; if dumpPasses was
    true, a new nested pass manager would be created per call to addPass.
    This difference in behavior also meant that the lowering pipeline
    itself differed depending on the value of dumpPasses.

    For example, in the following code in AcceraPasses.cpp, all the passes
    added to `funcOpPM` ran BEFORE `createConvertSCFToOpenMPPass` when
    `dumpPasses` was false.

    ```cpp
        auto funcOpPM = pmAdaptor.nestPassManager([&]() -> OpPassManager& { return pm.nest<v::ValueModuleOp>().nest<FuncOp>(); });
        funcOpPM.addPass(createConvertLinalgToAffineLoopsPass());
        funcOpPM.addPass(createSimplifyAffineStructuresPass());
        funcOpPM.addPass(createCanonicalizerPass());
        funcOpPM.addPass(createLoopInvariantCodeMotionPass());
        funcOpPM.addPass(createCSEPass());

        pmAdaptor.addPass(createConvertSCFToOpenMPPass());
        pmAdaptor.addPass(value::createValueToStdPass(options.enableProfile));
        funcOpPM.addPass(value::createBarrierOptPass(options.writeBarrierGraph.getValue(), options.barrierGraphFilename.getValue()));
        pmAdaptor.addPass(value::createRangeValueOptimizePass());
        pmAdaptor.addPass(createCanonicalizerPass());
        pmAdaptor.addPass(createCSEPass());
    ```

    Additionally, this change exposed the fact that the BarrierOpt pass was
    incorrectly erasing barriers; it has therefore been made a no-op until
    this correctness issue is fixed.

commit d97a5fd55712ce783f65dd948a9e9c152b1ff2d1
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Thu Jul 28 18:33:54 2022 +0000

    Merged PR 2795: [docs] Cleanup viz scripts, clarify reorder illustrations

    * Clarify in the labels while working on the animated version

    * Cleanup and rename .js files for (slightly) easier lookup

commit 4afe6b763c2097a9a09eebac4ed5f0f5f59f587d
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Thu Jul 28 08:06:22 2022 +0000

    Merged PR 2475: LLVM 14.0.0 upgrade

    Tag: llvmorg-14.0.0

    Notable changes:
    * std dialect ops have been moved to the arith and math dialects
    * StrEnumAttribute is now replaced by simple enums. This affects things like gpu.dimension.x
    * [Issue] linalg.copy is removed, replaced by memref.copy, which introduces a runtime dependency on a `memrefCopy` C function for non-identity layout copies. This affects Array.sub_array in debug mode.
    * [Regression] OMP to LLVM lowering will crash in mlir-translate findAlloc due to an empty set of blocks being emitted. This only affects dynamic scheduling with collapsed loops.
    * Lots of renames
    * Upgraded macOS to macOS-12

    Related work items: #3646

commit de3bd0ffde5ebb4bf69cb0db0c46bc76fef37c4b
Author: Denny Sun <dennys@microsoft.com>
Date:   Thu Jul 28 01:02:23 2022 +0000

    Merged PR 2753: accera.Dimension and runtime-sized Arrays in the Python DSL

    With this change, Accera is able to generate the initial MLIR for runtime-sized Arrays. The IR lowering is not yet fully working due to a bug, which will be fixed in later changes.

    ```python
    # (assumes: from accera import Array, Dim, Nest, Package, ScalarType;
    #  the Package() setup and the add/build arguments were elided in the
    #  original snippet and are reconstructed here from the MLIR below)
    M = Dim()
    N = Dim()
    K = Dim()

    A = Array(shape=(M, K), element_type=ScalarType.float32, role=Array.Role.INPUT)
    B = Array(shape=(K, N), element_type=ScalarType.float32, role=Array.Role.INPUT)
    C = Array(shape=(M, N), element_type=ScalarType.float32, role=Array.Role.INPUT_OUTPUT)

    nest = Nest((M, N, K))
    i, j, k = nest.get_indices()

    @nest.iteration_logic
    def _():
        C[i, j] += A[i, k] * B[k, j]

    package = Package()
    package.add(nest, args=(M, N, K, A, B, C), base_name="runtimesizes")
    package.build("test_runtimesizes")
    ```

    ```
    module @test_runtimesizes attributes {llvm.data_layout = "... ..."}  {
      accv.module "test_runtimesizes"  {
        accv.func nested @runtimesizes_..._impl_...(%arg0: index loc(unknown), %arg1: index loc(unknown), %arg2: index loc(unknown), %arg3: memref<?x?xf32, #map> loc(unknown), %arg4: memref<?x?xf32, #map> loc(unknown), %arg5: memref<?x?xf32, #map> loc(unknown)) attributes {accv.output_verifiers = ["", "", "", "", "", "_debug_check_allclose_<accera.lang.Dim.Dim object at ...>_<accera.lang.Dim.Dim object at ...>_..."], exec_target = 0 : i64} {
          %0 = "accv.get_element"(<<UNKNOWN SSA VALUE>>) : (memref<index>) -> index loc(#loc)
          %1 = "accv.get_element"(<<UNKNOWN SSA VALUE>>) : (memref<index>) -> index loc(#loc)
          %2 = "accv.get_element"(<<UNKNOWN SSA VALUE>>) : (memref<index>) -> index loc(#loc)
          "accln.nest"(%0, %1, %2) ( {
            %3 = accln.sym_index {name = "i"} #accln<"index{i,3}"> loc(#loc)
            %4 = accln.sym_index {name = "j"} #accln<"index{j,4}"> loc(#loc)
            %5 = accln.sym_index {name = "k"} #accln<"index{k,5}"> loc(#loc)
            "accln.kernel"() ( {
              %7 = "accv.slice"(%arg5, %3, %4) {sliceDimensions = [0, 1]} : (memref<?x?xf32, #map>, index, index) -> memref<f32> loc(#loc)
              ... ...
              accln.terminator loc(#loc)
            }) {sym_name = "_"} : () -> () loc(#loc)
            ... ...
            accln.terminator loc(#loc)
          }) {domain = #domain0, exec_target = 0 : i64, kernels = []} : (index, index, index) -> () loc(#loc)
          accv.return loc(#loc)
        } loc(#loc)
        accv.func @runtimesizes_...(%arg0: index loc(unknown), %arg1: index loc(unknown), %arg2: index lo...
    ```

commit 75553672d92f6e60638b4cb6169bda9712401eef
Author: JUBI TANEJA <jubitaneja@microsoft.com>
Date:   Wed Jul 27 23:34:03 2022 +0000

    Merged PR 2793: support sign extend op in canVectorize() function to improve generated MLIR

    While trying to optimize an `int16` `MatMul` with the vectorize transformation in the DSL, we noticed an unrolled loop of load, binop, sexti, and store instructions. No vector instructions were emitted, which hinted that the sign-extend instruction was not supported in the `canVectorize` function. With this op now supported, we can emit vector instructions in the MLIR.
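
    A hedged sketch of the scenario described above (assumptions: the Accera Python DSL imported as `acc`; shapes, the split factor, and index names are illustrative, not taken from this PR):

    ```python
    import accera as acc

    # int16 inputs accumulated into an int32 output force a sign-extend (sexti)
    # of each loaded operand -- the op that canVectorize() previously rejected.
    A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int16, shape=(64, 64))
    B = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.int16, shape=(64, 64))
    C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.int32, shape=(64, 64))

    nest = acc.Nest(shape=(64, 64, 64))
    i, j, k = nest.get_indices()

    @nest.iteration_logic
    def _():
        C[i, j] += A[i, k] * B[k, j]

    schedule = nest.create_schedule()
    jj = schedule.split(j, 8)  # inner index intended to map onto vector lanes
    plan = schedule.create_plan()
    plan.vectorize(jj)         # with sexti supported, this can emit vector ops
    ```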

commit 4fa740166b1c17359d40b540b7b7eb623caa167a
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Jul 27 07:02:01 2022 +0000

    Merged PR 2790: Filter invalid kernels from GPU benchmarks

    - Filter invalid kernels from GPU benchmarks
    - Disable verification on cuda f16 benchmarks
    - Remove frequent cleanups

commit 6cff78412b04f15c3257bc2286a3805974a56012
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Tue Jul 26 03:27:14 2022 +0000

    Merged PR 2787: Remove MLIR flag from package format in benchmarks

    Remove MLIR flag from package format in benchmarks

commit 0e7b7ef930f177e6a70b88aad7982e9a36dd116f
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Jul 25 23:10:43 2022 +0000

    Merged PR 2784: Merge Github changes to ADO

    Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
    Date:   Mon Jul 25 19:13:00 2022 +0800

        Update Building_on_Ubuntu.md

    commit 474d7e4c6fd7dcd2f723193e69446da4c63f97ee
    Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
    Date:   Mon Jul 25 19:03:30 2022 +0800

        Github codespaces configuration (#48)

    commit 0e8ffcd806bfc1671c89e599c2562592c4d06f21
    Author: Anthony Shaw <anthony.p.shaw@gmail.com>
    Date:   Mon Jul 25 15:34:18 2022 +1000

        Set license field in metadata of package (#46)

        * Set license field in meta

        * Update all setup.cfg files

    commit 9a8ea90b22b02379072a98eacc8d5f49c1a28e69
    Author: Lisa Ong <11318241+lisaong@users.noreply.github.com>
    Date:   Mon Jul 25 10:24:26 2022 +0800

        Enable CIs from pull requests from forks

commit 8275363815c0e128ff477a8c7692ad44353db5aa
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Mon Jul 25 20:41:39 2022 +0000

    Merged PR 2776: Make fusing more efficient

    This PR refactors the code generation for schedules and makes it more efficient. This makes a big difference for complex schedules with constraints on the kernels (like the ones generated when fusing schedules).

    Here are some timings on a few tests (modified versions of Mason's example script) I ran:

    | test | main branch | PR branch |
    |----|----|----|
    | 3 fused schedules, tile first only | 18.8s | 5.8s |
    | 3 fused schedules, tile 1 & 2 | 190s | 6.2s |
    | 3 fused schedules, tile all 3 | ???? | 7.2s |

    Related work items: #3731

commit 2306afbb9425dcecced6a38590da2ae31937d23e
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Mon Jul 25 06:51:24 2022 +0000

    Merged PR 2781: Fix benchmark with MLIR format and add repro test

commit 6e72de99c9f1952111081dc320cc055eb09aabf6
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Sat Jul 23 04:26:14 2022 +0000

    Merged PR 2780: Type support for tensor ops in CUDA

    - Add support for FP32 input (TF32 compute)
    - Add support for bfloat16 input/FP32 output
    - Add support for integer types

    Related work items: #3709, #3710

commit 3bedc2c51c7b93865462df68b154db9c4bdda5ec
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Fri Jul 22 04:11:41 2022 +0000

    Merged PR 2779: Some assorted benchmark fixes

    - Build Accera in release mode
    - Shuffle gemm sizes to run small sizes first
    - Increase tolerance to account for floating point drift for large k-split

commit cb010de71df47c75c48ae3a8a749e03fc606e24f
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Jul 21 19:43:11 2022 +0000

    Merged PR 2774: Add input caching tests for CUDA, enable tests in PR pipelines

    Add input caching tests in CUDA

    Related work items: #3725

commit 85c09542106b4e685ba4141d9cb34b823d0d02b7
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Jul 20 23:55:04 2022 +0000

    Merged PR 2677: Unify rocm/cuda tensor ops lowering under accv dialect

    - Remove gpu dialect lowering (CUDA)
    - Add accv dialect lowering for CUDA
    - ROCm and CUDA lowering now use the same semantics

    Related work items: #3728

commit 282d66743b4d5d828998198d94003ea93228165c
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Jul 19 02:14:58 2022 +0000

    Merged PR 2764: [doc] Rename acc.Dim to acc.Dimension and add create_dimensions()

    * Rename `acc.Dim` to `acc.Dimension`, `acc.Dim.Role` to `acc.Dimension.Role`
    * Add the simplified `acc.create_dimensions()` construction pattern (see the sketch after this list)
    * Kept the `acc.Dimension` constructor for advanced use cases involving generator patterns
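
    A minimal sketch of the simplified pattern (assuming the package is imported as `acc`; element types are illustrative):

    ```python
    import accera as acc

    M, N, K = acc.create_dimensions()  # placeholders for runtime sizes

    A = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
    B = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(K, N))
    C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, element_type=acc.ScalarType.float32, shape=(M, N))
    ```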

    Related work items: #3720

commit e8a0a7475acc2b117b41b97238b2ecb11060cbd6
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Jul 14 03:27:17 2022 +0000

    Merged PR 2752: Add nargs to input args in benchmark tool

    add nargs to input args in benchmark tool

commit 2c5083a721b4cbef11d1782b43d34d67b76caa0e
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Jul 13 05:07:40 2022 +0000

    Merged PR 2680: [doc] Manual and Reference doc updates for Runtime Array DSL

    Proposed DSL changes for supporting runtime array sizes:
    * Adds a new dimension type that serves as a placeholder for runtime dimension sizes for `Array` and `Nest`. Supports both input and output dimensions (see the sketch after this list)
    * Adds output-only Arrays
    * Adds the Scalar type
    * Example kernels demonstrating different aspects:
      * Gather: basic features
      * Range: scalar function arguments
      * ReduceMean: fusion
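
    A speculative sketch of the proposed surface (the `Dimension.Role.OUTPUT` and `Array.Role.OUTPUT` names are assumptions inferred from this doc change, not verified API):

    ```python
    import accera as acc

    M, K = acc.create_dimensions()  # input dimensions, supplied by the caller
    N = acc.create_dimensions(role=acc.Dimension.Role.OUTPUT)  # size produced at runtime

    source = acc.Array(role=acc.Array.Role.INPUT, element_type=acc.ScalarType.float32, shape=(M, K))
    result = acc.Array(role=acc.Array.Role.OUTPUT, element_type=acc.ScalarType.float32, shape=(N,))
    ```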

    Related work items: #3720

commit dbdbbb94c98f787782e8e4a6171a3af067f91e58
Author: Denny Sun <dennys@microsoft.com>
Date:   Wed Jul 13 01:45:07 2022 +0000

    Merged PR 2683: Support conditionals in Logic Function

    Before this change, there was no way to emit conditionals in a logic function.

    With this change, the user can write the following logic function:

    ```python
    def if_func():
        T[i, j] = A[i, j] + B[i, j]
        C[i, j] += T[i, j] ** 2.0

    def elseif_func():
        T[i, j] = A[i, j] - B[i, j]
        C[i, j] += T[i, j] ** 2.0

    def else_func():
        C[i, j] = A[i, j] + B[i, j]

    @nest.iteration_logic
    def _():
        _If(j < 100, if_func).ElseIf(i > 100, elseif_func).Else(else_func)
    ```

    Related work items: #3706

Ritwik Das committed Aug 10, 2022
1 parent 8be492c commit a9ab6bd
Showing 282 changed files with 6,926 additions and 5,446 deletions.
8 changes: 4 additions & 4 deletions .azure/cuda/cuda-benchmark-baseline.yml
@@ -54,8 +54,8 @@ jobs:
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --cublas $(Build.SourcesDirectory)/build/temp.linux-x86_64-3.8/tools/benchmarkers/cublas/cublas_gemm --input gemm_rectangle_A6000.csv,gemm_square.csv,gemm_bert_assorted.csv
-python gpu_benchmark_tool.py --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --cublas $(Build.SourcesDirectory)/build/temp.linux-x86_64-3.8/tools/benchmarkers/cublas/cublas_gemm --input gemm_rectangle_A6000.csv,gemm_square.csv,gemm_bert_assorted.csv,gemm_resnet_inception.csv
+python gpu_benchmark_tool.py --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --cublas $(Build.SourcesDirectory)/build/temp.linux-x86_64-3.8/tools/benchmarkers/cublas/cublas_gemm --input gemm_rectangle_A6000.csv gemm_square.csv gemm_bert_assorted.csv
+python gpu_benchmark_tool.py --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --cublas $(Build.SourcesDirectory)/build/temp.linux-x86_64-3.8/tools/benchmarkers/cublas/cublas_gemm --input gemm_rectangle_A6000.csv gemm_square.csv gemm_bert_assorted.csv gemm_resnet_inception.csv
displayName: Run CUBLAS benchmarks
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
@@ -71,8 +71,8 @@ jobs:
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --cutlass $(System.DefaultWorkingDirectory)/cutlass/build/tools/profiler/cutlass_profiler --input gemm_rectangle_A6000.csv,gemm_square.csv,gemm_bert_assorted.csv
-python gpu_benchmark_tool.py --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --cutlass $(System.DefaultWorkingDirectory)/cutlass/build/tools/profiler/cutlass_profiler --input gemm_rectangle_A6000.csv,gemm_square.csv,gemm_bert_assorted.csv,gemm_resnet_inception.csv
+python gpu_benchmark_tool.py --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --cutlass $(System.DefaultWorkingDirectory)/cutlass/build/tools/profiler/cutlass_profiler --input gemm_rectangle_A6000.csv gemm_square.csv gemm_bert_assorted.csv
+python gpu_benchmark_tool.py --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --cutlass $(System.DefaultWorkingDirectory)/cutlass/build/tools/profiler/cutlass_profiler --input gemm_rectangle_A6000.csv gemm_square.csv gemm_bert_assorted.csv gemm_resnet_inception.csv
displayName: Run CUTLASS benchmarks
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp16-bert.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_bert_assorted.csv --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_bert_assorted.csv --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE
displayName: Run fp16 benchmarks BERT
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp16-big.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_big_A6000.csv,gemm_big.csv --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_big_A6000.csv gemm_big.csv --type h --batch_size 1 --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE
displayName: Run fp16 benchmarks BIG A6000
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp16.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_small_A6000.csv,gemm_small.csv --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_small_A6000.csv gemm_small.csv --type h --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE
displayName: Run fp16 benchmarks A6000
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp32-bert.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_bert_assorted.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_bert_assorted.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --check
displayName: Run fp32 benchmarks BERT
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp32-big.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_big_A6000.csv,gemm_big.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_big_A6000.csv gemm_big.csv --type s --batch_size 1 --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE
displayName: Run fp32 benchmarks BIG A6000
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
6 changes: 3 additions & 3 deletions .azure/cuda/cuda-benchmark-fp32-resnet.yml
@@ -9,7 +9,7 @@ trigger: none

jobs:
- job: "CUDA_Benchmarking_FP32_RESNET"
-timeoutInMinutes: 1080
+timeoutInMinutes: 2160

pool:
name: LinuxNVGPUPool
@@ -42,13 +42,13 @@
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_resnet_inception.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_resnet_inception.csv --type s --batch_size 1 --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --check
displayName: Run fp32 benchmarks RESNET
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-benchmark-fp32.yml
@@ -42,13 +42,13 @@ jobs:
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
-python ./setup.py build -g -b build -t build bdist_wheel -d build/dist
+python ./setup.py build -b build -t build bdist_wheel -d build/dist
displayName: Python build
workingDirectory: "$(Build.SourcesDirectory)"
- bash: |
export PYTHONPATH=$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8
-python gpu_benchmark_tool.py --input gemm_small_A6000.csv,gemm_small.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --janitor True --verbose True --check True
+python gpu_benchmark_tool.py --input gemm_small_A6000.csv gemm_small.csv --type s --target 'NVidia RTX A6000' --branch $(Build.SourceBranch) --output $(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8/accera_benchmarks/results --upload official_build_container_DO_NOT_UPLOAD_HERE --verbose --check
displayName: Run fp32 benchmarks A6000
workingDirectory: "$(Build.SourcesDirectory)/tools/benchmarkers"
env:
4 changes: 2 additions & 2 deletions .azure/cuda/cuda-pr.yml
@@ -62,14 +62,14 @@ steps:
echo "CUDA_VISIBLE_DEVICES" ${CUDA_VISIBLE_DEVICES}
export LLVM_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-12
python -m pip install bfloat16
-python -m pytest -v --junitxml=test/test-mfma_tests.xml accera/test/mfma_tests.py
+python -m pytest -s -v --junitxml=test/test-mfma_tests.xml accera/test/mfma_tests.py
displayName: Run MFMA tests
workingDirectory: "$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8"
- bash: |
export CUDA_VISIBLE_DEVICES=$(CUDA_VISIBLE_DEVICES)
export LLVM_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-12
-python -m pytest -v --junitxml=test/test-smoke_tests.xml accera/test/smoke_tests.py -k "cuda"
+python -m pytest -s -v --junitxml=test/test-smoke_tests.xml accera/test/smoke_tests.py
displayName: Run CUDA smoke tests
workingDirectory: "$(Build.SourcesDirectory)/build/lib.linux-x86_64-3.8"
35 changes: 25 additions & 10 deletions .azure/linux-accera.yml
@@ -15,25 +15,40 @@ strategy:
Python310:
Python.Version: "3.10"

+variables:
+- name: PARALLEL
+  value: 4 # 2 cores (https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml#hardware)
+- name: PIP_CACHE_DIR
+  value: $(Pipeline.Workspace)/.pip
+- name: VCPKG_BINARY_SOURCES
+  value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"

steps:
+- task: NuGetAuthenticate@0

- task: UsePythonVersion@0
inputs:
versionSpec: $(Python.Version)
addToPath: true
architecture: "x64"

+- task: Cache@2
+  inputs:
+    key: 'pip | "$(Agent.OS)" | $(Build.SourcesDirectory)/requirements.txt'
+    restoreKeys: |
+      pip | "$(Agent.OS)"
+    path: $(PIP_CACHE_DIR)
+  displayName: Cache pip

- bash: |
sudo apt-get install libunwind-dev ninja-build ccache python3-pip libvulkan-dev libomp-11-dev pkg-config -y
sudo sysctl -w kernel.core_pattern="$(Build.SourcesDirectory)/build/core-%e-%s-%u-%g-%p-%t.dump"
ulimit -c unlimited
python -m pip install -U pip
python -m pip install -r $(Build.SourcesDirectory)/requirements.txt
echo "mkdir $HOME/.ccache"
mkdir $HOME/.ccache
echo "ln -s $HOME/.ccache $(System.DefaultWorkingDirectory)/ccache"
ln -s $HOME/.ccache $(System.DefaultWorkingDirectory)/ccache
conan remote add accera $(CONAN_REMOTE)
conan user -p $(CONAN_PWD) -r accera $(CONAN_USERNAME)
echo "##vso[task.prependpath]/usr/lib/ccache"
displayName: Install prereqs for Linux
env:
CONAN_PWD: $(CONAN_PWD)
Expand All @@ -53,35 +68,35 @@ steps:
# Note: Code signing is not available for Linux distributions (outside of packages.microsoft.com)
- task: PythonScript@0
-displayName: python ./setup.py build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
+displayName: python ./setup.py build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
inputs:
scriptSource: "filePath"
scriptPath: "$(Build.SourcesDirectory)/setup.py"
arguments: "build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
arguments: "build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
workingDirectory: "$(Build.SourcesDirectory)/"

- task: PythonScript@0
-displayName: compilers python ./setup.py build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
+displayName: compilers python ./setup.py build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
inputs:
scriptSource: "filePath"
scriptPath: "$(Build.SourcesDirectory)/accera/python/compilers/setup.py"
arguments: "build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
arguments: "build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
workingDirectory: "$(Build.SourcesDirectory)/accera/python/compilers"

- task: PythonScript@0
displayName: gpu python ./setup.py build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
inputs:
scriptSource: "filePath"
scriptPath: "$(Build.SourcesDirectory)/accera/python/gpu/setup.py"
arguments: "build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
arguments: "build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
workingDirectory: "$(Build.SourcesDirectory)/accera/python/gpu"

- task: PythonScript@0
displayName: llvm python ./setup.py build bdist_wheel -d $(Build.SourcesDirectory)/build/dist
inputs:
scriptSource: "filePath"
scriptPath: "$(Build.SourcesDirectory)/accera/python/llvm/setup.py"
arguments: "build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
arguments: "build_ext -j $(PARALLEL) build bdist_wheel -d $(Build.SourcesDirectory)/build/dist"
workingDirectory: "$(Build.SourcesDirectory)/accera/python/llvm"

- bash: |