Squashed commit of the following:
commit dc15c16dec73ef9316bde789f7cfa90776ab1340
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Sep 16 21:18:45 2022 +0000

    Merged PR 2862: write runtime size of index type to Hat

    write runtime size of index type to Hat

commit 91dc8f6458ed605f7a8a940a4e9d58b89d741a46
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Fri Sep 16 08:26:44 2022 +0000

    Merged PR 2861: Fix cache_C benchmark variable which is not getting set properly for CUDA

    Fix cache_C benchmark variable which is not getting set properly for CUDA

commit 4bab0884b0ce53ea03a7cba927c7169ce3c22e1e
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Sep 16 06:15:39 2022 +0000

    Merged PR 2864: [build]: fix breaks due to agent image updates

    The latest Azure Pipelines agent images now set VCPKG_ROOT, which overrides the vcpkg submodule used by Accera.

    See: actions/runner-images@ef638dd

    * Only pipelines that rely on azure build agents are affected.
    * We still need to keep the submodule around to enable external builds from the GitHub repo.
    * Remove defunct pipeline
    * Update vcpkg submodule while we're here

commit fe40e67394232de4524a80b15d8a31731398e9f6
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Sat Sep 10 04:45:25 2022 +0000

    Merged PR 2839: Enable CUDA output caching

    - Add a Tensor memory space type to denote memory fragments used for caching (e.g. C in GEMM). This might go away in the future and be replaced with Private once the caching code is unified with the ROCM behavior.
    - Change caching code to generate MMALoad/StoreOps for caching of the output.

    Related work items: #3725

commit 0fd36d5c7897eb2b3814a396626a56ec822cec0c
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Wed Sep 7 20:03:30 2022 +0000

    Merged PR 2813: Add pass to recognize patterns that look like int16 matrix multiply

    This PR adds a pass to rewrite GEMM-like loops that multiply-accumulate int16 matrices into an int32 result. If this pattern gets invoked, the output should contain the much-sought `vpmaddwd` instruction.

    It also fixes some old low-level tests of integer arithmetic.
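
For illustration, the kind of loop nest this pass targets, and the pairwise multiply-add that `vpmaddwd` performs, can be sketched in plain Python. This is a behavioral sketch only, not the pass itself; the names `wrap_i32`, `madd_pairs_i16`, and `gemm_i16_i32` are hypothetical and exist only for this example.

```python
def wrap_i32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def madd_pairs_i16(a, b):
    """Emulate one vpmaddwd step: multiply int16 lanes, then add adjacent pairs into int32 lanes."""
    assert len(a) == len(b) and len(a) % 2 == 0
    assert all(-32768 <= v <= 32767 for v in a + b)
    return [wrap_i32(a[i] * b[i] + a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]


def gemm_i16_i32(A, B):
    """GEMM-like loop nest: int16 inputs, int32 accumulation (C[i][j] += A[i][k] * B[k][j])."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0
            for k in range(K):
                acc = wrap_i32(acc + A[i][k] * B[k][j])
            C[i][j] = acc
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(gemm_i16_i32(A, B))
# Row 0 of A against column 0 of B, and row 1 of A against column 1 of B, as packed pairs.
print(madd_pairs_i16([1, 2, 3, 4], [5, 7, 6, 8]))
```

With K = 2, each packed pair in `madd_pairs_i16` produces exactly one entry of the GEMM result, which is why the inner reduction loop maps naturally onto this instruction.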

commit fea6475e4b54b4036bd6f160552723b1e9f16662
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Sep 6 23:56:56 2022 +0000

    Merged PR 2847: [release] Bump docs version to 1.2.9 and update github action container

    * Rev docs to 1.2.9

    * Update github workflow to reference updated tag for 14.0.6-1

commit 369b6bd532d5fca545991750f1be708db0d992df
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Sep 1 20:00:32 2022 +0000

    Merged PR 2845: Filter GPU benchmarks by de-parameterizing cache layouts

    Filter GPU benchmarks by de-parameterizing cache layouts

commit 0d96e70ec9433c9fac7b1b2490cdf05b18fd5a86
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Aug 31 02:00:28 2022 +0000

    Merged PR 2843: Fix bug in GPU benchmark to calculate valid variant

    - Fix bug in GPU benchmark to calculate valid variant
    - Add cosmosdb util to cleanup old entries

commit 2b0af04f30082ac4fdd2b5922024c6eeb1faeafa
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Aug 31 01:14:28 2022 +0000

    Merged PR 2835: Merge in MLIR fixes for LocationSnapshot and MemRefCastOp

    From 1abc4a981067ef1fd9bf717d7fabc4f6d75520d1 Mon Sep 17 00:00:00 2001
    From: Chuck Jacobs <cjacobs@microsoft.com>
    Date: Wed, 24 Aug 2022 04:14:51 +0000
    Subject: [PATCH] Merged PR 2822: Fix lowering of `MemrefCastOp` to the LLVM dialect

    From 39f0a4c97f5c89d7fa815118a3230091172bc795 Mon Sep 17 00:00:00 2001
    From: Charles Jacobs <cjacobs@microsoft.com>
    Date: Mon, 15 Aug 2022 16:00:43 -0700
    Subject: [PATCH] Fix issue where passed-in op-printing flags were ignored

commit f517044a330e8c45df9f9b63410cf1a57dacf4f8
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Tue Aug 30 04:16:33 2022 +0000

    Merged PR 2842: Parameterize cache strategy in GPU benchmarks and fix kernel filters

    Parameterize cache strategy in GPU benchmarks and fix kernel filters

commit b0dc38572b66792a24e7ed8136c270bc4979fafb
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 29 18:19:41 2022 +0000

    Merged PR 2836: Value DSL support for runtime sized output arrays

    * This adds memref-in-memref support for output arrays that are allocated in the function
    * A new "Pointer" Value wrapper class with a Store() function which creates an accv.StoreOp, similar to Array, Scalar
    * Update accv.StoreOp to support memrefs-in-memrefs

    Value pointer levels are defined as follows:

    |Layout|Example|Pointer level|C-type|
    |--|--|--|--|
    |scalar|int16, float32, index, ...|0|int16_t, float32_t, int64_t, ...|
    |single-level memref|memref<1xindex>, memref<3x2xf32>, memref<10x16x11x?xf32>|1|int64_t*, float32_t*, float32_t*|
    |memref in memref|memref<memref<?x?x?xf32>>|at least 2 (= the number of levels of memrefs)|float32_t**|
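
As a rough illustration of the table above, the pointer level can be read off a memref type string by counting its nesting depth. The helper below is a hypothetical sketch written for this note, not part of the Value DSL, and it only handles the simple textual forms shown in the table.

```python
def pointer_level(type_str: str) -> int:
    """Return the number of nested memref levels in a type string (0 for scalars)."""
    return type_str.replace(" ", "").count("memref<")


for t in ["float32", "memref<3x2xf32>", "memref<memref<?x?x?xf32>>"]:
    print(t, "->", pointer_level(t))
```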

    Future work:
    * End-to-end lowering through Python DSL
    * Bare pointer convention for output arrays
    * Custom allocator functions. Currently we use the built-in std alloc.

    Related work items: #3730

commit 2748bd94e98333d13401786b8f9153537ad8a89b
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Mon Aug 29 16:23:09 2022 +0000

    Merged PR 2840: [nfc] Remove redundant ACR info from docker scripts

    The container registry allows pull-only access

commit b7ce8ed4932fe613a4770df9ba00b2b7a8ad3d09
Author: Denny Sun <dennys@microsoft.com>
Date:   Fri Aug 26 21:07:34 2022 +0000

    Merged PR 2838: Runtime sized Array lowering to LLVM, accv.alloc to LLVM malloc

    1. make deep copy of range end of value type when cloning ops
    2. plumbing runtime size to LLVM
    3. transform memref.alloc to LLVM malloc
    4. conversion between block argument and symbol name

    the generated IRs:

    **Initial.mlir**

    `%2 = "accv.alloc"(%arg0, %arg1) {sym_name = "diff"} : (index, index) -> memref<?x?xf32> loc(#loc)`

    **LoopNestToValueFunc.mlir**

    ```
    %2 = "accv.alloc"(%arg0, %arg1) {sym_name = "diff"} : (index, index) -> memref<?x?xf32> loc(#loc)
    affine.for %arg4 = 0 to %arg0 {
        affine.for %arg5 = 0 to %arg1 {
        }
    }
    ```

    **ConvertValueToStd.mlir**

    `%0 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>`

    **ConvertValueToLLVM.mlir**

    ```
    %8 = llvm.mul %arg1, %arg0  : i64
    %9 = llvm.mlir.null : !llvm.ptr<f32>
    %10 = llvm.getelementptr %9[%8] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %11 = llvm.ptrtoint %10 : !llvm.ptr<f32> to i64
    %12 = llvm.call @malloc(%11) : (i64) -> !llvm.ptr<i8>
    ```
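
The `getelementptr` on a null pointer followed by `ptrtoint` is a common LLVM idiom for computing `count * sizeof(element)` without hard-coding the element size. The arithmetic it performs can be sketched as follows; `alloc_size_bytes` is a hypothetical name, and the sketch assumes a 4-byte `f32` and ignores any alignment padding an allocator might add.

```python
import struct

F32_SIZE = struct.calcsize("<f")  # 4 bytes for an IEEE-754 single-precision float


def alloc_size_bytes(dim0: int, dim1: int, elem_size: int = F32_SIZE) -> int:
    """Byte count passed to malloc for a memref<?x?xf32> with runtime dimensions."""
    num_elements = dim0 * dim1        # corresponds to: %8 = llvm.mul %arg1, %arg0
    return num_elements * elem_size   # corresponds to: ptrtoint(getelementptr null[%8])


print(alloc_size_bytes(3, 5))
```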

    Related work items: #3733

commit dbbce7261e0f8c83a3cb56b5607662ebe2faa651
Author: Mason Remy <masonr@microsoft.com>
Date:   Wed Aug 24 22:00:02 2022 +0000

    Merged PR 2831: Record unique IDs so that different processes acting on a value module

    Record unique IDs so that different processes acting on a value module
    don't produce conflicting IDs

commit 0d168c99f3049f4f83a11883d573087785bec23b
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Wed Aug 24 17:47:50 2022 +0000

    Merged PR 2837: Fix WPT calculation to prevent 0 work and filter benchmarks

    Fix WPT calculation to prevent 0 work and filter benchmarks

commit 830bd6cd2d8d621ffcb2b13282de3ccc7929224d
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Tue Aug 23 20:39:49 2022 +0000

    Merged PR 2832: Caching strategy flag and thread ID optimization (GPU)

    - Add a flag to plan.cache() to expose the different thread <--> data arrangements
    - Optimize thread ID calculation to check blockdim first

commit cc2c38e9818f4b6ffa6ea4218a2ab483c4a8a574
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Mon Aug 22 15:58:00 2022 +0000

    Merged PR 2829: Add handwritten caching implementation for GPU

    Add GPUBlockCacheOp, which lowers to a handwritten caching implementation on the GPU. It supports access patterns that minimize bank conflicts in shared memory and maximize coalesced global memory access.

commit 432621157bd2ebeb0344fa36cb3821becf06fc3c
Author: Kern Handa <kerha@microsoft.com>
Date:   Fri Aug 19 00:22:00 2022 +0000

    Merged PR 2821: Fixes constraint logic for fusion of more than two schedules

    Fixes constraint logic for fusion of more than two schedules

commit 16f31231bb4159835d68a90fec43f49eda2aa983
Author: Kern Handa <kerha@microsoft.com>
Date:   Thu Aug 18 23:53:44 2022 +0000

    Merged PR 2830: Fixes macOS CI build

    Fixes macOS CI build

commit 786744fd2ae7c0f2d3e58f49edb561cadeb72d80
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Aug 12 23:50:04 2022 +0000

    Merged PR 2806: Enable specifying cache element type

    Enable specifying cache element type

    - Supports accumulating and/or computing in a different element type and
      batching up the casts for those types
    - Also adds support for binop/castop expansion and castop folding

commit a777fb5c59d6215ff707e8ee536f25a4b9f78641
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 11 22:03:15 2022 +0000

    Merged PR 2818: Upgrade hatlib dependency to v0.0.23

    Upgrade hatlib dependency to v0.0.23

commit ee15139a7f362807af10c42d09273cd0719def4a
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Aug 11 06:05:50 2022 +0000

    Merged PR 2792: Refactor cast to a value cast op

    Refactor cast to a value cast op

commit 138d2abfac4c10641d6ff6a486cdc5c4d4b0fd38
Author: Chuck Jacobs <cjacobs@microsoft.com>
Date:   Thu Aug 11 01:59:47 2022 +0000

    Merged PR 2788: Re-enabled fusing test that was taking too long

    This PR just re-enables a skipped test that was taking too long

commit fc70a56aa4982912cda7a2dff7888f21f23aaeaa
Author: Ritwik Das <ritdas@microsoft.com>
Date:   Thu Aug 11 00:58:24 2022 +0000

    Merged PR 2816: Upgrade hatlib requirement to 0.0.22

    Upgrade hatlib requirement to 0.0.22

commit 12fea07a0d1bc8b3ba1c5c5543de5ea9b3ae5a20
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Aug 10 04:30:48 2022 +0000

    Merged PR 2811: [nfc] Upgrade CUDA to 11.7 on NVidia benchmark machines

    According to https://hub.docker.com/r/nvidia/cuda/tags, 11.7.0 is still the latest.
masonremy committed Sep 17, 2022
1 parent a9ab6bd commit adda009
Showing 201 changed files with 5,838 additions and 2,152 deletions.
2 changes: 1 addition & 1 deletion .azure/cuda/Dockerfile
@@ -5,7 +5,7 @@
 # docker build -f .azure/cuda/Dockerfile . -t registry_name/cuda-linuxagent:latest
 ####################################################################################################

-ARG CUDAVER=11.6.2-devel-ubuntu20.04
+ARG CUDAVER=11.7.0-devel-ubuntu20.04

 # cf: nvidia/cuda:${CUDAVER}
 FROM acceracontainers.azurecr.io/nvidia/cuda:${CUDAVER}
12 changes: 5 additions & 7 deletions .azure/cuda/README.md
@@ -27,18 +27,16 @@ After building, you can manually push the container to a Docker repository if ne
 On a Linux machine with a CUDA GPU:

 ```shell
-export AZP_URL=<ADO org-level server url>
-export AZP_TOKEN=<ADO server PAT>
-export ACR_USER=<ACR client id>
-export ACR_SECRET=<ACR client secret>
+export AZP_URL=<ADO_URL>
+export AZP_TOKEN=<ADO_PAT>
+export ACR_REPO=<ACR_REPO>
 bash run_agent.sh
 ```

 Where:
-- <PAT> - Personal access token with "Agent Pools (read, manage)" scope.
+- <ADO_PAT> - Personal access token with "Agent Pools (read, manage)" scope.
 - <ADO_URL> - Server URL for the Azure DevOps instance. Note that this is the organization-level URL, *not* the project-level URL. This is likely because ADO agents and pools can be organization-scoped.
-- <ACR_USER> - Client id for the service principal allowing pull access to the Azure container registry
-- <ACR_SECRET> - Client secret for the service principal allowing pull access to the Azure container registry
+- <ACR_REPO> - Azure Container Registry repository

 ## Debugging
2 changes: 1 addition & 1 deletion .azure/cuda/build_agent.sh
@@ -5,7 +5,7 @@
 ####################################################################################################
 set -x -e

-CUDAVER=11.6.2-devel-ubuntu20.04
+CUDAVER=11.7.0-devel-ubuntu20.04

 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../
2 changes: 1 addition & 1 deletion .azure/cuda/cuda-benchmark-fp32.yml
@@ -9,7 +9,7 @@ trigger: none

 jobs:
 - job: "CUDA_Benchmarking_FP32"
-timeoutInMinutes: 480
+timeoutInMinutes: 540

 pool:
 name: LinuxNVGPUPool
6 changes: 2 additions & 4 deletions .azure/cuda/run_agent.sh
@@ -7,23 +7,21 @@
 ####################################################################################################
 set -x -e

-VARS=(AZP_URL AZP_TOKEN ACR_REPO ACR_USER ACR_SECRET)
+VARS=(AZP_URL AZP_TOKEN ACR_REPO)
 for var in "${VARS[@]}"; do
 if [[ (-z "${!var}") ]]; then
 echo "${var} is not set"
 exit
 fi
 done

-CUDAVER=11.6.2-devel-ubuntu20.04
+CUDAVER=11.7.0-devel-ubuntu20.04
 IMAGE=${ACR_REPO}/cuda-linuxagent:${CUDAVER}
 POOL=LinuxNVGPUPool

 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../

-sudo docker login -u ${ACR_USER} -p ${ACR_SECRET} ${ACR_REPO}
-
 #
 # Debugging Example:
 #
2 changes: 2 additions & 0 deletions .azure/linux-accera.yml
@@ -22,6 +22,8 @@ variables:
 value: $(Pipeline.Workspace)/.pip
 - name: VCPKG_BINARY_SOURCES
 value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: NuGetAuthenticate@0
2 changes: 2 additions & 0 deletions .azure/linux-pr.yml
@@ -10,6 +10,8 @@ variables:
 value: $(Pipeline.Workspace)/.pip
 - name: VCPKG_BINARY_SOURCES
 value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: NuGetAuthenticate@0
67 changes: 0 additions & 67 deletions .azure/llvm-canary.yml

This file was deleted.

5 changes: 4 additions & 1 deletion .azure/macos-accera.yml
@@ -25,7 +25,10 @@ strategy:
 Python.Version: "3.10"

 variables:
-VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+- name: VULKAN_CACHE_DIR
+value: $(Pipeline.Workspace)/.vulkansdk
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: UsePythonVersion@0
3 changes: 2 additions & 1 deletion .azure/macos-pr.yml
@@ -8,6 +8,8 @@ variables:
 value: $(Pipeline.Workspace)/.pip
 - name: VCPKG_BINARY_SOURCES
 value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: NuGetAuthenticate@0
@@ -74,7 +76,6 @@ steps:
 ctest -C Release -T test -VV -LE benchmark --progress
 displayName: Run all ctest targets
 continueOnError: false
 workingDirectory: "$(Build.SourcesDirectory)/build"
 - task: CopyFiles@2
 condition: always()
12 changes: 5 additions & 7 deletions .azure/rocm/README.md
@@ -15,18 +15,16 @@ After building, you can manually push the container to a Docker repository if ne
 On a Linux machine with an AMD GPU:

 ```shell
-export AZP_URL=<ADO org-level server url>
-export AZP_TOKEN=<ADO server PAT>
-export ACR_USER=<ACR client id>
-export ACR_SECRET=<ACR client secret>
+export AZP_URL=<ADO_URL>
+export AZP_TOKEN=<ADO_PAT>
+export ACR_REPO=<ACR_REPO>
 bash run_agent.sh
 ```

 Where:
-- <PAT> - Personal access token with "Agent Pools (read, manage)" scope.
+- <ADO_PAT> - Personal access token with "Agent Pools (read, manage)" scope.
 - <ADO_URL> - Server URL for the Azure DevOps instance. Note that this is the organization-level URL, *not* the project-level URL. This is likely because ADO agents and pools can be organization-scoped.
-- <ACR_USER> - Client id for the service principal allowing pull access to the Azure container registry
-- <ACR_SECRET> - Client secret for the service principal allowing pull access to the Azure container registry
+- <ACR_REPO> - Azure Container Registry repository

 ## Debugging
4 changes: 1 addition & 3 deletions .azure/rocm/run_agent.sh
@@ -5,7 +5,7 @@
 ####################################################################################################
 set -x -e

-VARS=(AZP_URL AZP_TOKEN ACR_REPO ACR_USER ACR_SECRET)
+VARS=(AZP_URL AZP_TOKEN ACR_REPO)
 for var in "${VARS[@]}"; do
 if [[ (-z "${!var}") ]]; then
 echo "${var} is not set"
@@ -20,8 +20,6 @@ POOL=LinuxAMDGPUPool
 SCRIPT_DIR=$(dirname $(readlink -f "$0"))
 ACCERA_ROOT=${SCRIPT_DIR}/../../

-sudo docker login -u ${ACR_USER} -p ${ACR_SECRET} ${ACR_REPO}
-
 #
 # Debugging Example:
 #
5 changes: 4 additions & 1 deletion .azure/sdl-set1.yml
@@ -11,7 +11,10 @@ pool:
 vmImage: windows-latest

 variables:
-VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+- name: VULKAN_CACHE_DIR
+value: $(Pipeline.Workspace)/.vulkansdk
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:

5 changes: 4 additions & 1 deletion .azure/sdl-set2.yml
@@ -11,7 +11,10 @@ pool:
 vmImage: windows-latest

 variables:
-VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+- name: VULKAN_CACHE_DIR
+value: $(Pipeline.Workspace)/.vulkansdk
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:

8 changes: 6 additions & 2 deletions .azure/sdl-set3.yml
@@ -15,8 +15,12 @@ jobs:
 vmImage: windows-latest

 variables:
-VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
-LGTM.UploadSnapshot: true
+- name: VULKAN_CACHE_DIR
+value: $(Pipeline.Workspace)/.vulkansdk
+- name: LGTM.UploadSnapshot
+value: true
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: UsePythonVersion@0
5 changes: 4 additions & 1 deletion .azure/win-accera.yml
@@ -25,7 +25,10 @@ strategy:
 Python.Version: "3.10"

 variables:
-VULKAN_CACHE_DIR: $(Pipeline.Workspace)/.vulkansdk
+- name: VULKAN_CACHE_DIR
+value: $(Pipeline.Workspace)/.vulkansdk
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:

2 changes: 2 additions & 0 deletions .azure/win-pr.yml
@@ -8,6 +8,8 @@ variables:
 value: $(Pipeline.Workspace)/.pip
 - name: VCPKG_BINARY_SOURCES
 value: "clear;nuget,$(VCPKG_NUGET_FEED),readwrite"
+- name: VCPKG_ROOT
+value: "$(Build.SourcesDirectory)/external/vcpkg"

 steps:
 - task: NuGetAuthenticate@0
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:

 runs-on: ubuntu-latest
 container:
-image: acceracontainers.azurecr.io/accera-llvm-ubuntu:main-llvmorg-14.0.6
+image: acceracontainers.azurecr.io/accera-llvm-ubuntu:llvmorg-14.0.6-1
 steps:
 - uses: actions/checkout@v2
 with: