Release v1.2.1 · microsoft/Accera

What's Changed

Merged PR 2391: Update quickstart example, updated docs structure per
feedback. [Lisa Ong]
- Teasers for transformations in the Quickstart sample (to differentiate Accera from others), with benchmarking
- Removed the Miscellaneous section, redistributed various docs to various related locations
- Renamed the cross compilation tutorial so that it is ordered last
Merged PR 2392: Populate Target.Models based on known devices. [Kern
Handa]

Populate Target.Models based on known devices
Merged PR 2390: Merge multiple HAT files during project building.
[Kern Handa]

Merge multiple HAT files during project building

Related work items: #3559
Merged PR 2386: Add support for various targets. [Kern Handa]

Add support for various targets

Related work items: #3631
Merged PR 2389: [nfc] Doc typos and consistency fixes. [Lisa Ong]
Merged PR 2388: Update quickstart example, add binder quickstart.
[Lisa Ong]
- Update quickstart example to perform a matmul + ReLU (unoptimized)
- Add Launch in Binder button to run everything in the browser
Merged PR 2387: Placeholder GPU GridUnit definitions, add library
creation from multiple object files. [Lisa Ong]

Dependent HAT PR: microsoft/hat#21
- GridUnit definitions are static until we have real GPU targets. These are updated just to be consistent with the Manual
- When not cross compiling, combine multiple .obj/.o into .lib/.a
Related work items: #3576
Merged PR 2384: Update target docs, split Intel generation 8 and 9 for
consistency. [Lisa Ong]
- Update target docs to list the name of the target in the table
- Define separate models for Intel generation 8 and 9 for consistency
Related work items: #3631
Merged PR 2383: Support dynamic libs from Package.build [Lisa Ong]
- Add static and dynamic variants to the HAT and MLIR formats
- MLIR format is also split because we'd want to support MLIR inspection of the cross-compilation scenario without forcing users to switch between dynamic and static
- Updated README sample
Left for future work:
- Combining multiple object files into a static lib or dynamic lib. We'd need to think about how HAT packages can be merged together (for example, how to reconcile the metadata in the HAT file, such as description, author - do we merge all metadata or just pick the first HAT file encountered as the "master", etc)
Related PR: microsoft/hat#18

Related work items: #3576
Merged PR 2382: [nfc] Move Case Studies out of the Accera repo. [Lisa
Ong]

Case Studies will live in other repositories, and be cross linked from the Accera repo's Case Studies README.md (to be added in the future).

Related work items: #3632
Merged PR 2379: Specify dynamic lib dependencies from the HAT Package.
[Lisa Ong]

This is the final missing piece before we transition to building static / dynamic libs using hatlib.
- Plan infers additional dynamic dependencies when the target is GPU or when parallelization is requested.
- Package.add collects the dependency info from the various Plan instances.
- In Package.build, the platform parameter is used to resolve each dependency to the appropriate library (either a path or a -l directive).
  - For library paths that cannot be fully determined in advance, we default to the current working directory, so that the user can place the library in the same path as the binaries. (This needs to be fleshed out more.)
- Removed dead code
Dependent hatlib PR: https://github.com/microsoft/hat/pull/16/files
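
The platform-based resolution described above can be pictured roughly as follows. This is only a sketch under stated assumptions: the table contents, the file names, and the `resolve_dependency` helper are hypothetical illustrations, not the Accera or hatlib API.

```python
import os

# Hypothetical table: logical dependency -> per-platform library spec.
# Specs starting with "-l" are linker directives; anything else is treated
# as a library file name whose directory is not known in advance.
# All names and file names below are illustrative placeholders.
DYNAMIC_DEPENDENCIES = {
    "openmp": {"linux": "-lomp", "macos": "-lomp", "windows": "libomp.dll"},
}


def resolve_dependency(name, platform, search_dir=None):
    """Resolve a logical dependency to a -l directive or a library path."""
    spec = DYNAMIC_DEPENDENCIES[name][platform]
    if spec.startswith("-l"):
        # Linker directive: pass through unchanged.
        return spec
    # Path that cannot be fully determined in advance: fall back to the
    # current working directory, as the PR description says.
    return os.path.join(search_dir or os.getcwd(), spec)
```

For instance, `resolve_dependency("openmp", "linux")` yields the directive unchanged, while the Windows entry resolves to a path under the working directory unless `search_dir` is given.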

Related work items: #3576
Merged PR 2380: Add Raspberry Pi 4 (B) support. [Kern Handa]

Related work items: #3631

Merged PR 2368: Update and optimize acc-translate. [Abdul Dakkak]

- Propagate constants while generating C++ code
- Inline MLIR within the C++ code to ease debugging
- Increase support for vector ops
- Silence a lot of warnings that were being emitted in the acc-translate codebase

For example, the following MLIR:

// CONFIG: {"K":2048,"M":2048,"N":2048,"block":{"x":16,"y":16,"z":1},"grid":{"x":128,"y":128,"z":1}}
module @gemm_naive_14479263422999410716_module attributes {gpu.binary = "HSACO"} {
  func @gemm_naive_14479263422999410716(%arg0: memref<2048x2048xf32> loc(unknown), %arg1: memref<2048x2048xf32> loc(unknown), %arg2: memref<2048x2048xf32> loc(unknown)) {
    %c16 = constant 16 : index loc(unknown)
    %c0 = constant 0 : index loc(unknown)
    %c2048 = constant 2048 : index loc(unknown)
    %c1 = constant 1 : index loc(unknown)
    %cst = constant 0.000000e+00 : f32 loc(unknown)
    %0 = "gpu.thread_id"() {dimension = "x"} : () -> index loc(unknown)
    %1 = "gpu.thread_id"() {dimension = "y"} : () -> index loc(unknown)
    %2 = "gpu.block_id"() {dimension = "x"} : () -> index loc(unknown)
    %3 = "gpu.block_id"() {dimension = "y"} : () -> index loc(unknown)
    %4 = scf.for %arg3 = %c0 to %c2048 step %c1 iter_args(%arg4 = %cst) -> (f32) {
      %11 = muli %3, %c16 : index loc(unknown)
      %12 = addi %1, %11 : index loc(unknown)
      %13 = memref.load %arg0[%12, %arg3] : memref<2048x2048xf32> loc(unknown)
      %14 = muli %2, %c16 : index loc(unknown)
      %15 = addi %0, %14 : index loc(unknown)
      %16 = memref.load %arg1[%arg3, %15] : memref<2048x2048xf32> loc(unknown)
      %17 = mulf %13, %16 {RelaxedPrecision} : f32 loc(unknown)
      %18 = addf %arg4, %17 {RelaxedPrecision} : f32 loc(unknown)
      scf.yield %18 : f32 loc(unknown)
    } loc(unknown)
    %5 = muli %3, %c16 : index loc(unknown)
    %6 = addi %1, %5 : index loc(unknown)
    %7 = muli %2, %c16 : index loc(unknown)
    %8 = addi %0, %7 : index loc(unknown)
    %9 = memref.load %arg2[%6, %8] : memref<2048x2048xf32> loc(unknown)
    %10 = addf %9, %4 {RelaxedPrecision} : f32 loc(unknown)
    memref.store %10, %arg2[%6, %8] : memref<2048x2048xf32> loc(unknown)
    return loc(unknown)
  } loc(unknown)
} loc(unknown)

generates the following C++ file:

#if defined(__HIP_PLATFORM_AMD__)
#include <hip/hip_runtime.h>
using vfloatx2_t = float __attribute__((ext_vector_type(2)));
using vfloatx4_t = float __attribute__((ext_vector_type(4)));
using vfloatx16_t = float __attribute__((ext_vector_type(16)));
#else
#include "cuda_fp16.h"
#endif // !defined(__HIP_PLATFORM_AMD__)

#include <math.h>
#include <stdint.h>

__global__ void gemm_naive_14479263422999410716(float (*arg0)[2048], float (*arg1)[2048], float (*arg2)[2048])
{
    /*%0 = "gpu.thread_id"() {dimension = "x"} : () -> index*/
    const uint threadIdx_x_0 = threadIdx.x;
    /*%1 = "gpu.thread_id"() {dimension = "y"} : () -> index*/
    const uint threadIdx_y_1 = threadIdx.y;
    /*%2 = "gpu.block_id"() {dimension = "x"} : () -> index*/
    const uint blockIdx_x_2 = blockIdx.x;
    /*%3 = ...

Merged PR 2376: [build] Install acc-lsp-server as an internal tool.
[Lisa Ong]

Removes acc-lsp-server from accera-compilers

Minor CMake macro renames to (hopefully) improve usability
Merged PR 2378: [doc] Update doc links after DSL changes, fix missing
file warnings. [Lisa Ong]

Verified by:
```
cd <accera_root>
pip install mkdocs-material mkdocs-git-revision-date-plugin
mkdocs serve
```
Merged PR 2377: Retire Benchmark.py, use hatlib for benchmarking and
shared library creation. [Lisa Ong]

This cleanup work precedes the actual work to produce static or dynamic libraries by migrating existing HAT Python scripts to consume hatlib. Next PRs will consume hatlib to produce those libraries.

hatlib defines a HAT package as .hat files and a library.
- Remove accera.tuning.AutoBenchmark and replace usages with hat.run_benchmark in case studies
- Removed accera.tuning.CorrectnessCheck. Baked correctness checking into accera.test.verifiers
- Disabled some tests in preparation for coming work (next PRs)
  - parallelization tests: need to specify lomp as a link target dependency in the HAT file, and update hatlib to honor this flag
  - emit_unpacked_buffer_tests: to resolve multi-MLIR-module scenario where we have a globals module in addition to the package module
Depends on this PR: microsoft/hat#15

Related work items: #3556
Merged PR 2374: Retain and honor the order of functions added to the
package. [Kern Handa]

Retain and honor the order of functions added to the package

Related work items: #3629
Merged PR 2371: add lsp server for accera. [Abdul Dakkak]

This adds an LSP server to be used with the MLIR VS Code extension (https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-mlir). You will have to specify the LSP server in your settings.json. On my system, this means adding the following setting:
```
  "mlir.server_path": "${workspaceFolder}/build/accera/acc-lsp-server/acc-lsp-server",
```
It's not super robust, though.
Merged PR 2372: Reduce install size. For example, on Linux the install
size goes from 873M to 742M. [Abdul Dakkak]

Reduce install size. For example, on Linux the install dir goes from 873M to 742M. More can be done along those lines.
Merged PR 2369: run clang-format on acc_translate. [Abdul Dakkak]

run clang-format on acc_translate. There are no modifications to the code
Merged PR 2367: Selectively emit GPU utilities. [Kern Handa]

Selectively emit GPU utilities

Related work items: #3559
Merged PR 2366: [build] Fix manylinux package build. [Lisa Ong]

Apply updated requirements.txt without rebuilding docker image
Merged PR 2365: Unify Package.add_function and Package.add_functions
into Package.add. [Kern Handa]

Related work items: #3549
Merged PR 2363: Initial quickstart example in the main README. [Lisa
Ong]
- The quickstart example demonstrates how to do everything (including calling the function) from Python
- hatlib is now a runtime dependency as a result.
  - We should consider updating at least the HelloMatMul Tutorials to also cover how to call functions from Python for quick testing. Calling from C++ is still the mainline scenario for performance
Dependent PR: microsoft/hat#11

Related work items: #3630
Merged PR 2364: Rename action plan references to plan. [Kern Handa]

Related work items: #3563
Merged PR 2362: [hygiene] Move manylinux pipeline triggers from
classic to YAML. [Lisa Ong]

For maintainability, so that the triggers for that pipeline are in one place
Merged PR 2350: LLVM update to 13.0.0. [Lisa Ong]

Updated LLVM to the "llvmorg-13.0.0" tag

Related work items: #3618
Merged PR 2361: Build release versions of binaries for packaging
purposes, workaround auditwheel compression bug. [Lisa Ong]
- We currently build RelWithDebInfo instead of Release. This can result in packages that are too big to be uploaded to PyPI. A quick fix is to enable Release builds when invoked by the CI pipelines.
- Add triggering by tags for all pipelines that produce packages intended for PyPI (Windows, ManyLinux, macOS)
- Add pipeline to automate creating an LLVM build environment for the ManyLinux pipeline
- Revert to a last known good version of auditwheel (5.0.0) due to a compression bug (pypa/auditwheel#366)
[doc] tweaking public links. [Lisa Ong]
Merged PR 2352: Reference github URLs for links in README.md. [Lisa
Ong]

README.md is referenced in PyPI, so these need to be fully-qualified URLs.

(The links will not work until the repo is published)

Related work items: #3619
Merged PR 2360: Fix divide-by-0 crash when the active block exceeds
the vectorizable. [Mason Remy]

Fix divide-by-0 crash when the active block exceeds the vectorizable
size in the innermost dimension
Add smoke test for this case. [Mason Remy]
Fix divide-by-0 crash when the active block exceeds the vectorizable
size in the innermost dimension. [Mason Remy]
Squashed commit of the following: [Lisa Ong]

commit add8396adc6e0f4e3cf0ae89796d08ac416c00a4
Tweaked dark mode for better contrast, added favicon, improved
navigation. [Lisa Ong]
Merged PR 2359: [docs] Fix rendering issues with code blocks and
bullet points. [Lisa Ong]

Also added sticky nav and tabs
Minor typos in docs (#4) [Lisa Ong]
- Update 00 Introduction.md
- Update mkdocs.yml
- Update Installing_accera_on_MacOS.md
- Update Installing_accera_on_Ubuntu.md
- Update Installing_on_MacOS.md
- Update Installing_on_Ubuntu.md
- Update Optimized_MatMul.md
- Update Hello_MatMul.md
- Update Cross_Compilation_PI3.md
Add copyright. [Lisa Ong]
Use mkdocs for documentation (#3) [Lisa Ong]
- mkdocs integration
- add publishing workflow
- doc the doc
Backport doc fixes from gh-pages to main (https://github.com/microsoft/Accera/commit/ff491e3401691124b2aa6c3ee1d317bf264bdc11) [Lisa Ong]
Merged PR 2345: Infer number of threads from the parallelization
indices. [Lisa Ong]

The number of threads was previously set to Target.num_threads.

This change treats Target.num_threads as a capacity setting, and infers the number of threads from an aggregate of:
- the number of unsplit indices
- the number of split blocks for each outermost index
This gives the user control over how many threads to request.

Examples:
- indices = i, j, k: 3 threads, one per index. The reason is that it doesn't make sense to use just 1 thread. In the future, we may want to add an explicit parameter to control the number of threads for this case.
- indices = i, where ii = i.split(N//4): N//4 threads. We could have used ceiling(N/4), but due to loop unswitching, we don't directly apply the extra thread to the boundary loop (possible future work).
- indices = i, j, where ii = i.split(N//4): N//4 + 1 threads.
Implementation detail: if workshare loop collapsing happens because the indices are contiguous, the number of threads assigned is unaffected.
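
The aggregation in the examples above can be sketched as a small function. This is an illustration only: `infer_num_threads` and its parameters are hypothetical names, and clamping the result to the Target.num_threads capacity is an assumption about what "capacity setting" means, not something this PR description states.

```python
def infer_num_threads(num_unsplit_indices, split_block_counts, capacity):
    """Sketch of the thread-count inference described in this PR.

    num_unsplit_indices: parallelized indices that were not split.
    split_block_counts:  number of split blocks for each split outermost index.
    capacity:            stands in for Target.num_threads (assumed upper bound).
    """
    requested = num_unsplit_indices + sum(split_block_counts)
    # Assumption: the capacity acts as a cap on the inferred request.
    return min(requested, capacity)


N = 64
print(infer_num_threads(3, [], capacity=64))        # indices = i, j, k
print(infer_num_threads(0, [N // 4], capacity=64))  # indices = i (split)
print(infer_num_threads(1, [N // 4], capacity=64))  # indices = i (split), j
```

The three calls mirror the three examples: one thread per unsplit index, plus the block count contributed by each split index.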

Related work items: #3554

v1.2.0

Merged PR 2349: Add missing steps to CMake build instructions. [Lisa
Ong]
Merged PR 2347: Add pip install for linux. [Lisa Ong]

Linux packages can now be pip installed directly

Some cosmetic edits to install instructions
Merged PR 2326: Update install docs for Visual Studio 2022. [JUBI
TANEJA]

Related work items: #3605
Nit. [Jubi Taneja]
Merge branch 'main' of vs-ssh.visualstudio.com:v3/intelligentDevices/ELL/Accera
into dev/jubitaneja/VS2022-install-docs. [Jubi Taneja]
Edits. [Jubi Taneja]
Update install docs for Visual Studio 2022. [Jubi Taneja]
Merged PR 2346: Canary workflows for building with latest LLVM
release. [Lisa Ong]

This pipeline is part of a two-stage workflow.

Stage 1:
- Weekly docker image build that pulls the latest tagged official release of LLVM and rebuilds the image.
- Currently lives in: https://github.com/lisaong/accera-llvm-canary but can be moved to a more permanent location once this pipeline is stable.
- GitHub Actions are used here for convenience (longer timeouts, better integration). In the future we can move to Azure DevOps if similar functionality is available.
Stage 2: (this PR)
- Weekly canary build that consumes the latest docker image produced in stage 1. This is on a weekly schedule because triggering on container pushes is not yet supported by ADO.
When a new release of LLVM is published:
- Stage 1's weekly build will fail because the port SHA will change. This is ok because we want manual intervention to update the LLVM vcpkg portfile to update the patches, etc.
- Stage 2's weekly build may also fail. This is where we would stage changes in an Accera branch to support the new LLVM release.
As of this PR, LLVM 13.0.1 is being pre-released. TODO: test out the workflow with the upcoming pre-release.

Related work items: #3616
Merged PR 2340: Support Max element / budget caching for manual
caches. [Mason Remy]

Support Max element / budget caching for manual caches

Max element / budget caching previously only worked for automatic
caches, however the hierarchical caching change made automatic caches
harder to request from the DSL. This change enables max element caches
for manual caches by iteratively searching for the level at which a
cache should be placed due to the budget.

Notes:
- Currently if the budget is 0, that is treated as though the budget is
  1, however maybe we want this to be an error case
- Different boundary condition sections of the loopnest may have
  differently sized caches realized due to how the budget computation
  works. i.e. if caching around a main loop would exceed a budget but
  caching around the boundary would not, then the same cache would exist
  inside the main loop and outside the boundary loop
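
A minimal sketch of the budget search described above, assuming a simplified model in which the active block cached at a loop level holds the product of the extents of that loop and every loop inside it. `find_cache_level` and its parameters are hypothetical names; the actual pass operates on MLIR loopnests, not on a list of extents.

```python
from math import prod


def find_cache_level(extents, budget):
    """Return the outermost loop level whose active block fits the budget.

    extents: loop extents, outermost first (simplified stand-in for a loopnest).
    budget:  maximum number of elements the cache may hold.
    Returns len(extents) when only a single element fits.
    """
    if budget < 1:
        # Treated as an error here; see the "Make budget = 0 an error" entry.
        raise ValueError("budget must be at least 1")
    # Walk inward from the outermost loop until the active block fits.
    for level in range(len(extents)):
        if prod(extents[level:]) <= budget:
            return level
    return len(extents)
```

With extents `[4, 8, 16]`, a budget of 128 places the cache at level 1 (an 8x16 block), while a budget of 512 allows caching at the outermost level.
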
Related work items: #3615
Merge branch 'main' into review/masonr/max_element_caching. [Mason
Remy]
Merged PR 2342: Add logic check for target compat; Debug mode makes
use of func target. [Kern Handa]

Add logic check for target compat; Debug mode makes use of func target

This change adds the concept of Target compatibility so that functions
that are for the same target but have different settings can be added
freely. This is particularly helpful when adding Debug mode checks for a
function, as the Debug mode function naturally is going to be a subset
of the original function's target.

This change also introduces the concept of the maximum for a number of
Target properties, which is used to test whether one target is
compatible with another.
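
As a rough illustration of the compatibility idea, the check can be thought of as "same device, and every requested property within the other target's maximum". The class, field names, and rule below are assumptions made for this sketch; they are not the actual Target definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSketch:
    """Hypothetical stand-in for a target with a few bounded properties."""
    name: str
    num_threads: int
    vector_bytes: int

    def is_compatible_with(self, other):
        # Assumed rule: the device must match, and each property of `self`
        # must not exceed the corresponding maximum of `other`.
        return (
            self.name == other.name
            and self.num_threads <= other.num_threads
            and self.vector_bytes <= other.vector_bytes
        )


full = TargetSketch("host", num_threads=16, vector_bytes=32)
debug = TargetSketch("host", num_threads=1, vector_bytes=32)
print(debug.is_compatible_with(full))  # the Debug target is a subset of the original
```

Under this model the relation is one-directional: a Debug mode target that requests fewer resources fits within the original target, but not the other way around.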

Another related change is that Debug mode now makes use of the target of
the function being checked. This might need to be further addressed to
figure out the correct way to debug GPU or remote targets
Add logic check for target compat; Debug mode makes use of func
target. [Kern Handa]
Merged PR 2343: Switch to PNGs for logo assets. [Lisa Ong]

This allows the images to render more reliably in preview mode
Merged PR 2344: Add libvulkan to Manylinux builds. [Lisa Ong]

Set LD_LIBRARY_PATH so that auditwheel can find the dependency.

Assumes that target Linux system will have the lib preinstalled per install instructions.

Update docker image used by pipeline.

Related work items: #3529
Merged PR 2339: Add logo and badges to README.md, licenses to whls.
[Lisa Ong]

Further adjustments deferred until repo is made public
Fix c++ dsl test vectorize invocations. [Mason Remy]
Make budget = 0 an error. [Mason Remy]
Taking PR feedback. [Mason Remy]
Fix C++ DSL test failures by making C++ DSL always create manual
active block caches. [Mason Remy]
Support Max element / budget caching for manual caches. [Mason Remy]
Merged PR 2329: Define CPU targets in Accera. [JUBI TANEJA]
- Intel Core processors and Intel Xeon
- documentation and definitions in python bindings
Related work items: #3571
Fix dsl_tests and other references of intel core processor. [Jubi
Taneja]
Targets definition. [Jubi Taneja]
More details of targets. [Jubi Taneja]
Edits. [Jubi Taneja]
Edits. [Jubi Taneja]
More details on extensions. [Jubi Taneja]
Add target details. [Jubi Taneja]
Merged PR 2338: Check for presence of libomp in macOS and Linux. [Lisa
Ong]

Only apply the linkage to libomp if present in the target system.

This fixes the manylinux pipeline smoke test failure, which fails because the manylinux system does not have a compatible libomp installed.
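
The conditional linkage can be sketched as below. This is not the code from the PR: `openmp_link_flags` is a hypothetical helper, the `-lomp` flag is an assumed spelling of the linker directive, and `ctypes.util.find_library` only searches the machine the script runs on, which stands in for the target system here.

```python
from ctypes.util import find_library


def openmp_link_flags(finder=find_library):
    """Return linker flags for libomp, or an empty list if it is not found.

    `finder` is injectable so the lookup can be replaced in tests.
    """
    # find_library returns None when no matching library is located.
    return ["-lomp"] if finder("omp") else []


print(openmp_link_flags())
```

On a system without a compatible libomp (such as the manylinux build container mentioned above), the function returns an empty list and no OpenMP linkage is requested.
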
Merged PR 2337: Build manylinux wheels for PyPI uploads. [Lisa Ong]
- Add an Azure Pipeline that builds and uploads packages based on manylinux2014. This uses a container that contains accera-llvm and other build dependencies pre-installed
- Link to the system libomp at target accc time (when libomp is not present in the manylinux2014 build system, but may be present in an Ubuntu target system, for example)
- manylinux2014 is what onnxruntime uses as well. manylinux_2_24 is available but not as widely used afaict (punt for future work).
Misc fixes:
- Missing copyright blurbs
- Updated accera/python/README.md
- Drop clean --all from build.sh/build.bat so that full rebuilds are not the default
Merged PR 2336: value::Abs now supports non-fp types, fixes non-fp
Debug mode. [Kern Handa (KERN)]

value::Abs now supports non-fp types, fixes non-fp Debug mode
Merged PR 2333: Initialize vcpkg in the SDL pipelines. [Lisa Ong]

Missed these changes from the previous PR, now that vcpkg and packages need to be installed.
Merged PR 2335: Ignore PyPI when installing local wheels in the CI
pipelines. [Lisa Ong]

Update the CI pipelines to ignore PyPI when installing accera wheels.
Merged PR 2334: Add support for hierarchical caching. [Mason Remy]

Add support for hierarchical caching

This adds support for creating an active block cache of an existing
active block cache (note that hierarchical caching for automatic caches
was already supported, however the cache itself was not used as an
argument to the cache call in that scenario).

This change includes:
- Moving cache access maps and arrays of loopnest index attributes onto
  the MakeCacheOps
- Adding helpers to MakeCacheOp to construct access maps for the caches
  given a position in the loopnest
- Remove redundant access map computation in active block cache copy and
  reduce
- Support for hierarchical caches that are parameterized
- Implicitly hides automatic caches by assuming a layout on a cache call
  which doesn't have a layout provided. Since any cache call with a
  layout becomes an active block cache, this turns all cache calls into
  active block caches. In a later PR we could add an undocumented flag
  to enable users to request an automatic cache if we want to, however
  long-term automatic caches should be removed completely.
- Fix cache merging bug where output caches with a boundary condition on
  the cache level weren't constructing a union of the different loop
  branches when computing the active block
- Fix multi-cache merging bug where a boundary condition on a loop
  between the trigger level and the cache level and that loop IV is used
  to index into the cache was resulting in the caches being merged.
  Instead these caches should not be merged since they are accessing
  different regions. An unfortunate side-effect of this fix is that some
  multi-caches which have a boundary condition between the trigger level
  and the cache level where the boundary loop IV is not used to index
  into the cache won't be successfully merged. This isn't technically
  wrong as we are still copying data the number of times requested based
  on the multicache definition, however it is a potential missed
  optimization opportunity in this edge case.
- Disables max_element caching as this was only supported for automatic
  caches. A later PR will support this for active block caches
Related work items: #3453
Add support for hierarchical caching. [Mason Remy]
Merged PR 2325: Support building LLVM via vcpkg (no remote caching)
[Lisa Ong]

This change adds vcpkg support for external developers to build their own copy of LLVM based on public github sources.

Due to complexities of hosting large Nuget packages, the vcpkg built LLVM is local-only. We're still using Conan for LLVM for internal use.
- Added vcpkg as a submodule
- Migrated tomlplusplus and catch2 to vcpkg. pybind11 is untouched because it uses CMake FetchContent (the simplest and most direct method)
- Added top level build scripts for generating the Python packages
- Added support for installing LLVM via vcpkg. This is opted-in by setting the environment variable LLVM_SETUP_VARIANT or by passing in -DLLVM_SETUP_VARIANT during configuration:
  - LLVM_SETUP_VARIANT=Conan will use Conan to acquire pre-built LLVM bits (internal use only)
  - If unset, default behavior is to use vcpkg to build and install LLVM bits
- Whenever we update LLVM, we need to
  - Build and upload the internal packages [as before]
  - Update the vcpkg port by revising the Git hash and applying any patches. [new]
Related work items: #3611
Merged PR 2331: Add Kernel::GetIndices and wire it up. [Kern Handa
(KERN)]

Add Kernel::GetIndices and wire it up
Merged PR 2332: LoopNestBuilder minor code fixes. [Kern Handa (KERN)]

LoopNestBuilder minor code fixes

Related work items: #3602
Merged PR 2330: Rename main pass to acc-to-llvm. [Kern Handa (KERN)]

Rename main pass to acc-to-llvm
Merged PR 2328: Make git ignore .vscode symlinks as well. [Kern Handa
(KERN)]

Make git ignore .vscode symlinks as well
Merged PR 2327: Updated pip install instructions to official PyPI
repositories (windows, macOS) [Lisa Ong]

Linux instructions will be updated once the manylinux distribution packages are ready and uploaded.
Merged PR 2324: bugfix in benchmark HAT package while generating
main.cpp. [JUBI TANEJA]

bugfix in benchmark HAT package while generating main.cpp to include correct .hat files

Related work items: #3613
Merged PR 2322: Port dev/byronc/address_sdl_timeouts to Accera repo.
[Lisa Ong]

Add BinSkim tool to SDL pipeline runs. Split the original SDL pipeline into 3 stages to avoid timeouts in ADO.

Add build flags recommended by BinSkim.

Original PR: !2303

Related work items: #3599
Merged PR 2323: Re-enabling code signing for Windows distributions.
[Lisa Ong]

Disabled for Linux and macOS pending future support
Merged PR 2320: Split accera python wheels to within 100MB. [Lisa Ong]
- Enable splitting into component packages when building for the Packaging CI pipelines
- Support development mode for top level setup.py to place everything in build/lib.*
- Import and shared library paths are unchanged. Only moved executables into the accera/bin folder
- Currently only the accera-llvm and accera-compilers packages are required dependencies for the accera package spec

Azure artifacts feed: https://intelligentdevices.visualstudio.com/ELL/_packaging?_a=feed&feed=Accera

Manually seeded Windows and macOS packages on PyPI (linux is pending #3529):
- https://pypi.org/project/accera/
- https://pypi.org/project/accera-compilers/
- https://pypi.org/project/accera-gpu/
- https://pypi.org/project/accera-llvm/
Related work items: #3577
Merged PR 2319: [forward port] [build] Fix py3.7 test regression,
applied workaround for Azure Pipelines caching infra issue. [Lisa Ong]

b005dad65a4a36a52dd627af5e38b803bb8102e1
Merged PR 2318: [forward port] Fix a few typos and nits in
installation guide for Windows. [Lisa Ong]

be37216a165ed72cb51636cf88bb9f40dbb8f9cc
Merged PR 2317: [nfc] Add licensing information to all source files.
[Lisa Ong]
- Added MIT license blob
- One-liner blurbs are mostly empty, except for a handful that are already commented at the top of the file.
- Authors are grandfathered (existing ones maintained, no new ones added)
Related work items: #3579
Merged PR 2316: Migrating from old repo to new repo. [Lisa Ong]

Old repo commit id: d61bd7d31e2febc45321da72ed18278e27dbe4cb

Related work items: #3608
SUPPORT.md committed. [Microsoft Open Source]
SECURITY.md committed. [Microsoft Open Source]
LICENSE committed. [Microsoft Open Source]
README.md committed. [Microsoft Open Source]
CODE_OF_CONDUCT.md committed. [Microsoft Open Source]
Initial commit. [microsoft-github-operations[bot]]
