Squashed commit of the following:

commit dc15c16dec73ef9316bde789f7cfa90776ab1340
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Sep 16 21:18:45 2022 +0000

    Merged PR 2862: write runtime size of index type to Hat

    write runtime size of index type to Hat

commit 91dc8f6458ed605f7a8a940a4e9d58b89d741a46
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Fri Sep 16 08:26:44 2022 +0000

    Merged PR 2861: Fix cache_C benchmark variable which is not getting set properly for CUDA

    Fix cache_C benchmark variable which is not getting set properly for CUDA

commit 4bab0884b0ce53ea03a7cba927c7169ce3c22e1e
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Sep 16 06:15:39 2022 +0000

    Merged PR 2864: [build]: fix breaks due to agent image updates

    The latest version of the Azure Pipelines images now sets VCPKG_ROOT, which overrides the submodule used by Accera. See: actions/runner-images@ef638dd

    * Only pipelines that rely on Azure build agents are affected.
    * We still need to keep the submodule around to enable external builds from the GitHub repo.
    * Remove defunct pipeline
    * Update vcpkg submodule while we're here

commit fe40e67394232de4524a80b15d8a31731398e9f6
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Sat Sep 10 04:45:25 2022 +0000

    Merged PR 2839: Enable CUDA output caching

    - Add Tensor memory space type to denote memory fragments for caching (e.g. C in gemm). This might go away in the future and just be replaced with Private once the caching code is unified with ROCM behavior.
    - Change caching code to generate MMALoad/StoreOps for caching of the output.

    Related work items: #3725

commit 0fd36d5c7897eb2b3814a396626a56ec822cec0c
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Wed Sep 7 20:03:30 2022 +0000

    Merged PR 2813: Add pass to recognize patterns that look like int16 matrix multiply

    This PR adds a pass to rewrite GEMM-like loops that multiply-accumulate int16 matrices into an int32 result. If this pattern gets invoked, the output should contain the much-sought `vpmaddwd` instruction.

    It also fixes some old low-level tests of integer arithmetic.

commit fea6475e4b54b4036bd6f160552723b1e9f16662
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Sep 6 23:56:56 2022 +0000

    Merged PR 2847: [release] Bump docs version to 1.2.9 and update github action container

    * Rev docs to 1.2.9
    * Update GitHub workflow to reference updated tag for 14.0.6-1

commit 369b6bd532d5fca545991750f1be708db0d992df
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Sep 1 20:00:32 2022 +0000

    Merged PR 2845: Filter GPU benchmarks by de-parameterizing cache layouts

    Filter GPU benchmarks by de-parameterizing cache layouts

commit 0d96e70ec9433c9fac7b1b2490cdf05b18fd5a86
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Aug 31 02:00:28 2022 +0000

    Merged PR 2843: Fix bug in GPU benchmark to calculate valid variant

    - Fix bug in GPU benchmark to calculate valid variant
    - Add cosmosdb util to clean up old entries

commit 2b0af04f30082ac4fdd2b5922024c6eeb1faeafa
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Aug 31 01:14:28 2022 +0000

    Merged PR 2835: Merge in MLIR fixes for LocationSnapshot and MemRefCastOp

    From 1abc4a981067ef1fd9bf717d7fabc4f6d75520d1 Mon Sep 17 00:00:00 2001
    From: Chuck Jacobs <cjacobs@microsoft.com>
    Date: Wed, 24 Aug 2022 04:14:51 +0000
    Subject: [PATCH] Merged PR 2822: Fix lowering of `MemrefCastOp` to the LLVM dialect

    From 39f0a4c97f5c89d7fa815118a3230091172bc795 Mon Sep 17 00:00:00 2001
    From: Charles Jacobs <cjacobs@microsoft.com>
    Date: Mon, 15 Aug 2022 16:00:43 -0700
    Subject: [PATCH] Fix issue where passed-in op-printing flags were ignored

commit f517044a330e8c45df9f9b63410cf1a57dacf4f8
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Tue Aug 30 04:16:33 2022 +0000

    Merged PR 2842: Paramterize cache strategy in GPU benchmarks and fix kernel filters

    Parameterize cache strategy in GPU benchmarks and fix kernel filters

commit b0dc38572b66792a24e7ed8136c270bc4979fafb
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 29 18:19:41 2022 +0000

    Merged PR 2836: Value DSL support for runtime sized output arrays

    * This adds memref-in-memref support for output arrays that are allocated in the function
    * A new "Pointer" Value wrapper class with a Store() function which creates an accv.StoreOp, similar to Array, Scalar
    * Update accv.StoreOp to support memrefs-in-memrefs

    Value pointer levels are defined as follows:

    |Layout|Example|Pointer level|C-type|
    |--|--|--|--|
    |scalar|int16, float32, index, ...|0|int16_t, float32_t, int64_t, ...|
    |single-level memref|memref<1xindex>, memref<3x2xf32>, memref<10x16x11x?xf32>|1|int64_t*, float32_t*, float32_t*|
    |memref in memref|memref<memref<?x?x?f32>>|at least 2 (= the number of levels of memrefs)|float32_t**|

    Future work:
    * End-to-end lowering through Python DSL
    * Bare pointer convention for output arrays
    * Custom allocator functions. Currently we use the built-in std alloc.

    Related work items: #3730

commit 2748bd94e98333d13401786b8f9153537ad8a89b
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 29 16:23:09 2022 +0000

    Merged PR 2840: [nfc] Remove redundant ACR info from docker scripts

    The container registry allows pull-only access

commit b7ce8ed4932fe613a4770df9ba00b2b7a8ad3d09
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Aug 26 21:07:34 2022 +0000

    Merged PR 2838: Runtime sized Array lowering to LLVM, accv.alloc to LLVM malloc

    1. make deep copy of range end of value type when cloning ops
    2. plumbing runtime size to LLVM
    3. transform memref.alloc to LLVM malloc
    4. conversion between block argument and symbol name

    The generated IRs:

    **Initial.mlir**
    `%2 = "accv.alloc"(%arg0, %arg1) {sym_name = "diff"} : (index, index) -> memref<?x?xf32> loc(#loc)`

    **LoopNestToValueFunc.mlir**
    ```
    %2 = "accv.alloc"(%arg0, %arg1) {sym_name = "diff"} : (index, index) -> memref<?x?xf32> loc(#loc)
    affine.for %arg4 = 0 to %arg0 {
      affine.for %arg5 = 0 to %arg1 {
      }
    }
    ```

    **ConvertValueToStd.mlir**
    `%0 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>`

    **ConvertValueToLLVM.mlir**
    ```
    %8 = llvm.mul %arg1, %arg0 : i64
    %9 = llvm.mlir.null : !llvm.ptr<f32>
    %10 = llvm.getelementptr %9[%8] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %11 = llvm.ptrtoint %10 : !llvm.ptr<f32> to i64
    %12 = llvm.call @malloc(%11) : (i64) -> !llvm.ptr<i8>
    ```

    Related work items: #3733

commit dbbce7261e0f8c83a3cb56b5607662ebe2faa651
Author: Mason Remy <masonr@microsoft.com>
Date:   Wed Aug 24 22:00:02 2022 +0000

    Merged PR 2831: Record unique IDs so that different processes acting on a value module

    Record unique IDs so that different processes acting on a value module don't produce conflicting IDs

commit 0d168c99f3049f4f83a11883d573087785bec23b
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Aug 24 17:47:50 2022 +0000

    Merged PR 2837: Fix WPT calculation to prevent 0 work and filter benchmarks

    Fix WPT calculation to prevent 0 work and filter benchmarks

commit 830bd6cd2d8d621ffcb2b13282de3ccc7929224d
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Tue Aug 23 20:39:49 2022 +0000

    Merged PR 2832: Caching strategy flag and thread ID optimization (GPU)

    - Add a flag to plan.cache() to expose the different thread <--> data arrangements
    - Optimize thread ID calculation to check blockdim first

commit cc2c38e9818f4b6ffa6ea4218a2ab483c4a8a574
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Mon Aug 22 15:58:00 2022 +0000

    Merged PR 2829: Add handwritten caching implementation for GPU

    Add GPUBlockCacheOp, which lowers to a handwritten caching implementation on the GPU that supports access patterns for minimizing bank conflicts in shared memory and maximizing coalescing of global memory accesses.

commit 432621157bd2ebeb0344fa36cb3821becf06fc3c
Author: Kern Handa <kerha@microsoft.com>
Date:   Fri Aug 19 00:22:00 2022 +0000

    Merged PR 2821: Fixes constraint logic for fusion of more than two schedules

    Fixes constraint logic for fusion of more than two schedules

commit 16f31231bb4159835d68a90fec43f49eda2aa983
Author: Kern Handa <kerha@microsoft.com>
Date:   Thu Aug 18 23:53:44 2022 +0000

    Merged PR 2830: Fixes macOS CI build

    Fixes macOS CI build

commit 786744fd2ae7c0f2d3e58f49edb561cadeb72d80
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Aug 12 23:50:04 2022 +0000

    Merged PR 2806: Enable specifying cache element type

    Enable specifying cache element type
    - Supports accumulating and/or computing in a different element type and batching up the casts for those types
    - Also adds support for binop/castop expansion and castop folding

commit a777fb5c59d6215ff707e8ee536f25a4b9f78641
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 11 22:03:15 2022 +0000

    Merged PR 2818: Upgrade hatlib dependency to v0.0.23

    Upgrade hatlib dependency to v0.0.23

commit ee15139a7f362807af10c42d09273cd0719def4a
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Aug 11 06:05:50 2022 +0000

    Merged PR 2792: Refactor cast to a value cast op

    Refactor cast to a value cast op

commit 138d2abfac4c10641d6ff6a486cdc5c4d4b0fd38
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Thu Aug 11 01:59:47 2022 +0000

    Merged PR 2788: Re-enabled fusing test that was taking too long

    This PR just re-enables a skipped test that was taking too long

commit fc70a56aa4982912cda7a2dff7888f21f23aaeaa
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 11 00:58:24 2022 +0000

    Merged PR 2816: Upgrade hatlib requirement to 0.0.22

    Upgrade hatlib requirement to 0.0.22

commit 12fea07a0d1bc8b3ba1c5c5543de5ea9b3ae5a20
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Aug 10 04:30:48 2022 +0000

    Merged PR 2811: [nfc] Upgrade CUDA to 11.7 on NVidia benchmark machines

    According to https://hub.docker.com/r/nvidia/cuda/tags, 11.7.0 is still the latest.

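Note on the PR 2838 entry in the message above: the ConvertValueToLLVM.mlir excerpt derives the `malloc` argument by multiplying the two runtime dimensions, indexing a null `f32` pointer by that element count, and converting the resulting pointer to an integer, which yields the element count scaled by the element size. The arithmetic can be sketched in Python as follows. This is only an illustration of the size calculation, not Accera code; the function name `malloc_size_bytes` and the default 4-byte element size (for `f32`) are assumptions introduced here.

```python
def malloc_size_bytes(dims, elem_size_bytes=4):
    """Byte count for a densely packed buffer with the given runtime dims.

    Hypothetical helper, not part of Accera. It mirrors the IR sequence
    llvm.mul -> llvm.getelementptr on a null pointer -> llvm.ptrtoint:
    the element count is the product of the dimensions, and indexing a
    null typed pointer scales that count by the element size.
    elem_size_bytes defaults to 4, assuming f32 elements.
    """
    num_elements = 1
    for dim in dims:
        if dim < 0:
            raise ValueError("dimension sizes must be non-negative")
        num_elements *= dim  # corresponds to llvm.mul over the runtime sizes
    # corresponds to getelementptr(null, num_elements) followed by ptrtoint
    return num_elements * elem_size_bytes


if __name__ == "__main__":
    # A 3x5 f32 buffer, i.e. memref<?x?xf32> with %arg0 = 3 and %arg1 = 5
    print(malloc_size_bytes([3, 5]))
```

The null-pointer indexing idiom lets the lowering obtain the element size from the target data layout instead of hard-coding it, which is why the sketch keeps the element size as a parameter.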
microsoft · Sep 17, 2022 · adda009 · adda009
1 parent a9ab6bd
commit adda009
Showing 201 changed files with 5,838 additions and 2,152 deletions.
diff --git a/.azure/cuda/Dockerfile b/.azure/cuda/Dockerfile
@@ -5,7 +5,7 @@
 #  docker build -f .azure/cuda/Dockerfile . -t registry_name/cuda-linuxagent:latest
 ####################################################################################################
 
-ARG CUDAVER=11.6.2-devel-ubuntu20.04
+ARG CUDAVER=11.7.0-devel-ubuntu20.04
 
 # cf: nvidia/cuda:${CUDAVER}
 FROM acceracontainers.azurecr.io/nvidia/cuda:${CUDAVER}

diff --git a/.azure/cuda/README.md b/.azure/cuda/README.md
@@ -27,18 +27,16 @@ After building, you can manually push the container to a Docker repository if ne
 On a Linux machine with a CUDA GPU:
 
 ```shell
-export AZP_URL=<ADO org-level server url>
-export AZP_TOKEN=<ADO server PAT>
-export ACR_USER=<ACR client id>
-export ACR_SECRET=<ACR client secret>
+export AZP_URL=<ADO_URL>
+export AZP_TOKEN=<ADO_PAT>
+export ACR_REPO=<ACR_REPO>
 bash run_agent.sh
 ```
 
 Where:
-- <PAT> - Personal access token with "Agent Pools (read, manage)" scope.
+- <ADO_PAT> - Personal access token with "Agent Pools (read, manage)" scope.
 - <ADO_URL> - Server URL for the Azure DevOps instance. Note that this is the organization-level URL, *not* the project-level URL. This is likely because ADO agents and pools can be organization-scoped.
-- <ACR_USER> - Client id for the service principal allowing pull access to the Azure container registry
-- <ACR_SECRET> - Client secret for the service principal allowing pull access to the Azure container registry
+- <ACR_REPO> - Azure Container Registry repository
 
 ## Debugging
 

diff --git a/.azure/cuda/build_agent.sh b/.azure/cuda/build_agent.sh
@@ -5,7 +5,7 @@
 ####################################################################################################
 set -x -e
 
-CUDAVER=11.6.2-devel-ubuntu20.04
+CUDAVER=11.7.0-devel-ubuntu20.04
 
 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../

diff --git a/.azure/cuda/cuda-benchmark-fp32.yml b/.azure/cuda/cuda-benchmark-fp32.yml
@@ -9,7 +9,7 @@ trigger: none
 
 jobs:
   - job: "CUDA_Benchmarking_FP32"
-    timeoutInMinutes: 480
+    timeoutInMinutes: 540
 
     pool:
       name: LinuxNVGPUPool

diff --git a/.azure/cuda/run_agent.sh b/.azure/cuda/run_agent.sh
@@ -7,23 +7,21 @@
 ####################################################################################################
 set -x -e
 
-VARS=(AZP_URL AZP_TOKEN ACR_REPO ACR_USER ACR_SECRET)
+VARS=(AZP_URL AZP_TOKEN ACR_REPO)
 for var in "${VARS[@]}"; do
     if [[ (-z "${!var}") ]]; then
         echo "${var} is not set"
         exit
     fi
 done
 
-CUDAVER=11.6.2-devel-ubuntu20.04
+CUDAVER=11.7.0-devel-ubuntu20.04
 IMAGE=${ACR_REPO}/cuda-linuxagent:${CUDAVER}
 POOL=LinuxNVGPUPool
 
 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../
 
-sudo docker login -u ${ACR_USER} -p ${ACR_SECRET} ${ACR_REPO}
-
 #
 # Debugging Example:
 #

diff --git a/.azure/linux-accera.yml b/.azure/linux-accera.yml
@@ -22,6 +22,8 @@ variables:
    value: $(Pipeline.Workspace)/.pip
  - name: VCPKG_BINARY_SOURCES
    value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
   - task: NuGetAuthenticate@0

diff --git a/.azure/linux-pr.yml b/.azure/linux-pr.yml
@@ -10,6 +10,8 @@ variables:
    value: $(Pipeline.Workspace)/.pip
  - name: VCPKG_BINARY_SOURCES
    value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
   - task: NuGetAuthenticate@0

diff --git a/.azure/llvm-canary.yml b/.azure/llvm-canary.yml
diff --git a/.azure/macos-accera.yml b/.azure/macos-accera.yml
@@ -25,7 +25,10 @@ strategy:
       Python.Version: "3.10"
 
 variables:
-  VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+ - name: VULKAN_CACHE_DIR
+   value: $(Pipeline.Workspace)/.vulkansdk
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
   - task: UsePythonVersion@0

diff --git a/.azure/macos-pr.yml b/.azure/macos-pr.yml
@@ -8,6 +8,8 @@ variables:
    value: $(Pipeline.Workspace)/.pip
  - name: VCPKG_BINARY_SOURCES
    value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
   - task: NuGetAuthenticate@0
@@ -74,7 +76,6 @@ steps:
         ctest -C Release -T test -VV -LE benchmark --progress
     displayName: Run all ctest targets
     continueOnError: false
-    workingDirectory: "$(Build.SourcesDirectory)/build"
 
   - task: CopyFiles@2
     condition: always()

diff --git a/.azure/rocm/README.md b/.azure/rocm/README.md
@@ -15,18 +15,16 @@ After building, you can manually push the container to a Docker repository if ne
 On a Linux machine with an AMD GPU:
 
 ```shell
-export AZP_URL=<ADO org-level server url>
-export AZP_TOKEN=<ADO server PAT>
-export ACR_USER=<ACR client id>
-export ACR_SECRET=<ACR client secret>
+export AZP_URL=<ADO_URL>
+export AZP_TOKEN=<ADO_PAT>
+export ACR_REPO=<ACR_REPO>
 bash run_agent.sh
 ```
 
 Where:
-- <PAT> - Personal access token with "Agent Pools (read, manage)" scope.
+- <ADO_PAT> - Personal access token with "Agent Pools (read, manage)" scope.
 - <ADO_URL> - Server URL for the Azure DevOps instance. Note that this is the organization-level URL, *not* the project-level URL. This is likely because ADO agents and pools can be organization-scoped.
-- <ACR_USER> - Client id for the service principal allowing pull access to the Azure container registry
-- <ACR_SECRET> - Client secret for the service principal allowing pull access to the Azure container registry
+- <ACR_REPO> - Azure Container Registry repository
 
 ## Debugging
 

diff --git a/.azure/rocm/run_agent.sh b/.azure/rocm/run_agent.sh
@@ -5,7 +5,7 @@
 ####################################################################################################
 set -x -e
 
-VARS=(AZP_URL AZP_TOKEN ACR_REPO ACR_USER ACR_SECRET)
+VARS=(AZP_URL AZP_TOKEN ACR_REPO)
 for var in "${VARS[@]}"; do
     if [[ (-z "${!var}") ]]; then
         echo "${var} is not set"
@@ -20,8 +20,6 @@ POOL=LinuxAMDGPUPool
 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../
 
-sudo docker login -u ${ACR_USER} -p ${ACR_SECRET} ${ACR_REPO}
-
 #
 # Debugging Example:
 #

diff --git a/.azure/sdl-set1.yml b/.azure/sdl-set1.yml
@@ -11,7 +11,10 @@ pool:
   vmImage: windows-latest
 
 variables:
-  VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+ - name: VULKAN_CACHE_DIR
+   value: $(Pipeline.Workspace)/.vulkansdk
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
 

diff --git a/.azure/sdl-set2.yml b/.azure/sdl-set2.yml
@@ -11,7 +11,10 @@ pool:
   vmImage: windows-latest
 
 variables:
-  VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+ - name: VULKAN_CACHE_DIR
+   value: $(Pipeline.Workspace)/.vulkansdk
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
 

diff --git a/.azure/sdl-set3.yml b/.azure/sdl-set3.yml
@@ -15,8 +15,12 @@ jobs:
       vmImage: windows-latest
 
     variables:
-      VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
-      LGTM.UploadSnapshot: true
+    - name: VULKAN_CACHE_DIR
+      value: $(Pipeline.Workspace)/.vulkansdk
+    - name: LGTM.UploadSnapshot
+      value: true
+    - name: VCPKG_ROOT
+      value: "$(Build.SourcesDirectory)/external/vcpkg"
 
     steps:
     - task: UsePythonVersion@0

diff --git a/.azure/win-accera.yml b/.azure/win-accera.yml
@@ -25,7 +25,10 @@ strategy:
       Python.Version: "3.10"
 
 variables:
-  VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+ - name: VULKAN_CACHE_DIR
+   value: $(Pipeline.Workspace)/.vulkansdk
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
 

diff --git a/.azure/win-pr.yml b/.azure/win-pr.yml
@@ -8,6 +8,8 @@ variables:
    value: $(Pipeline.Workspace)/.pip
  - name: VCPKG_BINARY_SOURCES
    value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+ - name: VCPKG_ROOT
+   value: "$(Build.SourcesDirectory)/external/vcpkg"
 
 steps:
   - task: NuGetAuthenticate@0

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:
 
     runs-on: ubuntu-latest
     container:
-        image: acceracontainers.azurecr.io/accera-llvm-ubuntu:main-llvmorg-14.0.6
+        image: acceracontainers.azurecr.io/accera-llvm-ubuntu:llvmorg-14.0.6-1
     steps:
     - uses: actions/checkout@v2
       with: