v1.2.1

@lisaong lisaong released this 26 Jan 11:08
· 99 commits to main since this release

What's Changed

  • Merged PR 2391: Update quickstart example, updated docs structure per
    feedback. [Lisa Ong]

    • Teasers for transformations in the Quickstart sample (to differentiate Accera from others), with benchmarking
    • Removed the Miscellaneous section, redistributed various docs to various related locations
    • Renamed the cross compilation tutorial so that it is ordered last
  • Merged PR 2392: Populate Target.Models based on known devices. [Kern
    Handa]

    Populate Target.Models based on known devices

  • Merged PR 2390: Merge multiple HAT files during project building.
    [Kern Handa]

    Merge multiple HAT files during project building

    Related work items: #3559

  • Merged PR 2386: Add support for various targets. [Kern Handa]

    Add support for various targets

    Related work items: #3631

  • Merged PR 2389: [nfc] Doc typos and consistency fixes. [Lisa Ong]

  • Merged PR 2388: Update quickstart example, add binder quickstart.
    [Lisa Ong]

    • Update quickstart example to perform a matmul + ReLU (unoptimized)
    • Add Launch in Binder button to run everything in the browser
  • Merged PR 2387: Placeholder GPU GridUnit definitions, add library
    creation from multiple object files. [Lisa Ong]

    Dependent HAT PR: microsoft/hat#21

    • GridUnit definitions are static until we have real GPU targets. These are updated just to be consistent with the Manual
    • When not cross compiling, combine multiple .obj/.o into .lib/.a

    Related work items: #3576

  • Merged PR 2384: Update target docs, split Intel generation 8 and 9 for
    consistency. [Lisa Ong]

    • Update target docs to list the name of the target in the table
    • Define separate models for Intel generation 8 and 9 for consistency

    Related work items: #3631

  • Merged PR 2383: Support dynamic libs from Package.build [Lisa Ong]

    • Add static and dynamic variants to the HAT and MLIR formats
    • MLIR format is also split because we'd want to support MLIR inspection of the cross-compilation scenario without forcing users to switch between dynamic and static
    • Updated README sample

    Left for future work:

    • Combining multiple object files into a static lib or dynamic lib. We'd need to think about how HAT packages can be merged together (for example, how to reconcile the metadata in the HAT file, such as description, author - do we merge all metadata or just pick the first HAT file encountered as the "master", etc)

    Related PR: microsoft/hat#18

    Related work items: #3576

  • Merged PR 2382: [nfc] Move Case Studies out of the Accera repo. [Lisa
    Ong]

    Case Studies will live in other repositories, and be cross linked from the Accera repo's Case Studies README.md (to be added in the future).

    Related work items: #3632

  • Merged PR 2379: Specify dynamic lib dependencies from the HAT Package.
    [Lisa Ong]

    This is the final missing piece before we transition to building static / dynamic libs using hatlib.

    • Plan infers additional dynamic dependencies when the target is GPU or when parallelization is requested.
    • Package.add collects the dependency info from the various Plan instances.
    • In Package.build, the platform parameter is used to resolve each dependency to the appropriate library (either a path or a -l directive).
      • For library paths that cannot be fully determined in advance, we default to the current working directory, so that the user can place the lib in the same path as the binaries. (This needs to be fleshed out more.)
    • Removed dead code
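
    The platform-based resolution described above can be pictured roughly as follows. This is a minimal sketch, not the Accera or hatlib implementation: the function name, the platform keys, and the library names in the table are all hypothetical and only illustrate the "path or -l directive" distinction and the fallback to the current working directory.

```python
import os

# Hypothetical table: logical dependency -> per-platform spec.
# A spec is either a "-l" linker directive or a bare file name whose
# directory cannot be determined in advance.
DEPENDENCIES = {
    "openmp": {"linux": "-lomp", "macos": "-lomp", "windows": "libomp.lib"},
}


def resolve_dependency(name, platform, cwd=None):
    """Return the link argument for `name` on `platform` (illustrative only)."""
    spec = DEPENDENCIES[name][platform]
    if spec.startswith("-l"):
        # Linker directive: passed through unchanged.
        return spec
    # Bare file name: fall back to the current working directory.
    return os.path.join(cwd or os.getcwd(), spec)
```

    With this table, resolving "openmp" for "linux" yields the directive unchanged, while resolving it for "windows" yields a path under the working directory.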

    Dependent hatlib PR: https://github.com/microsoft/hat/pull/16/files

    Related work items: #3576

  • Merged PR 2380: Add Raspberry Pi 4 (B) support. [Kern Handa]

    Related work items: #3631

  • Merged PR 2368: Update and optimize acc-translate. [Abdul Dakkak]

    • propagate constants while generating C++ code
    • inline mlir within the C++ code to ease debugging
    • increase support for vector ops
    • silence a lot of warnings that were being emitted in the acc-translate codebase

    For example, the following MLIR

    // CONFIG: {"K":2048,"M":2048,"N":2048,"block":{"x":16,"y":16,"z":1},"grid":{"x":128,"y":128,"z":1}}
    module @gemm_naive_14479263422999410716_module attributes {gpu.binary = "HSACO"} {
      func @gemm_naive_14479263422999410716(%arg0: memref<2048x2048xf32> loc(unknown), %arg1: memref<2048x2048xf32> loc(unknown), %arg2: memref<2048x2048xf32> loc(unknown)) {
        %c16 = constant 16 : index loc(unknown)
        %c0 = constant 0 : index loc(unknown)
        %c2048 = constant 2048 : index loc(unknown)
        %c1 = constant 1 : index loc(unknown)
        %cst = constant 0.000000e+00 : f32 loc(unknown)
        %0 = "gpu.thread_id"() {dimension = "x"} : () -> index loc(unknown)
        %1 = "gpu.thread_id"() {dimension = "y"} : () -> index loc(unknown)
        %2 = "gpu.block_id"() {dimension = "x"} : () -> index loc(unknown)
        %3 = "gpu.block_id"() {dimension = "y"} : () -> index loc(unknown)
        %4 = scf.for %arg3 = %c0 to %c2048 step %c1 iter_args(%arg4 = %cst) -> (f32) {
          %11 = muli %3, %c16 : index loc(unknown)
          %12 = addi %1, %11 : index loc(unknown)
          %13 = memref.load %arg0[%12, %arg3] : memref<2048x2048xf32> loc(unknown)
          %14 = muli %2, %c16 : index loc(unknown)
          %15 = addi %0, %14 : index loc(unknown)
          %16 = memref.load %arg1[%arg3, %15] : memref<2048x2048xf32> loc(unknown)
          %17 = mulf %13, %16 {RelaxedPrecision} : f32 loc(unknown)
          %18 = addf %arg4, %17 {RelaxedPrecision} : f32 loc(unknown)
          scf.yield %18 : f32 loc(unknown)
        } loc(unknown)
        %5 = muli %3, %c16 : index loc(unknown)
        %6 = addi %1, %5 : index loc(unknown)
        %7 = muli %2, %c16 : index loc(unknown)
        %8 = addi %0, %7 : index loc(unknown)
        %9 = memref.load %arg2[%6, %8] : memref<2048x2048xf32> loc(unknown)
        %10 = addf %9, %4 {RelaxedPrecision} : f32 loc(unknown)
        memref.store %10, %arg2[%6, %8] : memref<2048x2048xf32> loc(unknown)
        return loc(unknown)
      } loc(unknown)
    } loc(unknown)
    

    generates the following C++ file (truncated here):

    #if defined(__HIP_PLATFORM_AMD__)
    #include <hip/hip_runtime.h>
    using vfloatx2_t = float __attribute__((ext_vector_type(2)));
    using vfloatx4_t = float __attribute__((ext_vector_type(4)));
    using vfloatx16_t = float __attribute__((ext_vector_type(16)));
    #else
    #include "cuda_fp16.h"
    #endif // !defined(__HIP_PLATFORM_AMD__)
    
    #include <math.h>
    #include <stdint.h>
    
    __global__ void gemm_naive_14479263422999410716(float (*arg0)[2048], float (*arg1)[2048], float (*arg2)[2048])
    {
        /*%0 = "gpu.thread_id"() {dimension = "x"} : () -> index*/
        const uint threadIdx_x_0 = threadIdx.x;
        /*%1 = "gpu.thread_id"() {dimension = "y"} : () -> index*/
        const uint threadIdx_y_1 = threadIdx.y;
        /*%2 = "gpu.block_id"() {dimension = "x"} : () -> index*/
        const uint blockIdx_x_2 = blockIdx.x;
        /*%3 = ...
    
    
  • Merged PR 2376: [build] Install acc-lsp-server as an internal tool.
    [Lisa Ong]

    Removes acc-lsp-server from accera-compilers

    Minor CMake macro renames to (hopefully) improve usability

  • Merged PR 2378: [doc] Update doc links after DSL changes, fix missing
    file warnings. [Lisa Ong]

    Verified by:

    cd <accera_root>
    pip install mkdocs-material mkdocs-git-revision-date-plugin
    mkdocs serve
    
  • Merged PR 2377: Retire Benchmark.py, use hatlib for benchmarking and
    shared library creation. [Lisa Ong]

    This cleanup work precedes the actual work to produce static or dynamic libraries by migrating existing HAT Python scripts to consume hatlib. Next PRs will consume hatlib to produce those libraries.

    hatlib defines a HAT package as .hat files and a library.

    • Remove accera.tuning.AutoBenchmark and replace usages with hat.run_benchmark in case studies
    • Removed accera.tuning.CorrectnessCheck. Baked correctness checking into accera.test.verifiers
    • Disabled some tests in preparation for coming work (next PRs)
      • parallelization tests: need to specify lomp as a link target dependency in the HAT file, and update hatlib to honor this flag
      • emit_unpacked_buffer_tests: to resolve multi-MLIR-module scenario where we have a globals module in addition to the package module

    Depends on this PR: microsoft/hat#15

    Related work items: #3556

  • Merged PR 2374: Retain and honor the order of functions added to the
    package. [Kern Handa]

    Retain and honor the order of functions added to the package

    Related work items: #3629

  • Merged PR 2371: add lsp server for accera. [Abdul Dakkak]

    This adds an LSP server to be used with the MLIR VS Code extension (https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-mlir). You will have to specify the LSP server path in your settings.json. On my system, this means adding the following setting:

      "mlir.server_path": "${workspaceFolder}/build/accera/acc-lsp-server/acc-lsp-server",
    

    It's not super robust though

  • Merged PR 2372: reduce install size. For example, on Linux the install
    size goes from 873M to 742M. [Abdul Dakkak]

    More can be done along those lines.

  • Merged PR 2369: run clang-format on acc_translate. [Abdul Dakkak]

    run clang-format on acc_translate. There are no modifications to the code

  • Merged PR 2367: Selectively emit GPU utilities. [Kern Handa]

    Selectively emit GPU utilities

    Related work items: #3559

  • Merged PR 2366: [build] Fix manylinux package build. [Lisa Ong]

    Apply updated requirements.txt without rebuilding docker image

  • Merged PR 2365: Unify Package.add_function and Package.add_functions
    into Package.add. [Kern Handa]

    Related work items: #3549

  • Merged PR 2363: Initial quickstart example in the main README. [Lisa
    Ong]

    • The quickstart example demonstrates how to do everything (including calling the function) from Python
    • hatlib is now a runtime dependency as a result.
      • We should consider updating at least the HelloMatMul Tutorials to also cover how to call functions from Python for quick testing. Calling from C++ is still the mainline scenario for performance

    Dependent PR: microsoft/hat#11

    Related work items: #3630

  • Merged PR 2364: Rename action plan references to plan. [Kern Handa]

    Related work items: #3563

  • Merged PR 2362: [hygiene] Move manylinux pipeline triggers from
    classic to YAML. [Lisa Ong]

    For maintainability, so that the triggers for that pipeline are in one place

  • Merged PR 2350: LLVM update to 13.0.0. [Lisa Ong]

    Updated LLVM to the "llvmorg-13.0.0" tag

    Related work items: #3618

  • Merged PR 2361: Build release versions of binaries for packaging
    purposes, workaround auditwheel compression bug. [Lisa Ong]

    • We currently build RelWithDebInfo instead of Release. This can result in packages that are too big to be uploaded to PyPI. A quick fix is to enable Release builds when invoked by the CI pipelines.
    • Add triggering by tags for all pipelines that produce packages intended for PyPI (Windows, ManyLinux, macOS)
    • Add pipeline to automate creating an LLVM build environment for the ManyLinux pipeline
    • Revert to a last known good version of auditwheel (5.0.0) due to a compression bug (pypa/auditwheel#366)
  • [doc] tweaking public links. [Lisa Ong]

  • Merged PR 2352: Reference github URLs for links in README.md. [Lisa
    Ong]

    README.md is referenced in PyPI, so these need to be fully-qualified URLs.

    (The links will not work until the repo is published)

    Related work items: #3619

  • Merged PR 2360: Fix divide-by-0 crash when the active block exceeds
    the vectorizable. [Mason Remy]

    Fix divide-by-0 crash when the active block exceeds the vectorizable
    size in the innermost dimension

  • Add smoke test for this case. [Mason Remy]

  • Squashed commit of the following: [Lisa Ong]

    commit add8396adc6e0f4e3cf0ae89796d08ac416c00a4

  • Tweaked dark mode for better contrast, added favicon, improved
    navigation. [Lisa Ong]

  • Merged PR 2359: [docs] Fix rendering issues with code blocks and
    bullet points. [Lisa Ong]

    Also added sticky nav and tabs

  • Minor typos in docs (#4) [Lisa Ong]

    • Update 00 Introduction.md

    • Update mkdocs.yml

    • Update Installing_accera_on_MacOS.md

    • Update Installing_accera_on_Ubuntu.md

    • Update Installing_on_MacOS.md

    • Update Installing_on_Ubuntu.md

    • Update Optimized_MatMul.md

    • Update Hello_MatMul.md

    • Update Cross_Compilation_PI3.md

  • Add copyright. [Lisa Ong]

  • Use mkdocs for documentation (#3) [Lisa Ong]

    • mkdocs integration

    • add publishing workflow

    • doc the doc

  • Backport doc fixes from gh-pages to main
    (https://github.com/microsoft/Accera/commit/ff491e3401691124b2aa6c3ee1d317bf264bdc11)
    [Lisa Ong]

  • Merged PR 2345: Infer number of threads from the parallelization
    indices. [Lisa Ong]

    The number of threads was previously set to Target.num_threads.

    This change treats Target.num_threads as a capacity setting, and infers the number of threads from an aggregate of:

    • the number of unsplit indices
    • the number of split blocks for each outermost index

    This gives the user control over how many threads to request.

    Examples:

    • indices = i, j, k : 3 threads, one per index. Reason is that it doesn't make sense to just use 1 thread. For the future, we may want to add an explicit parameter to control the number of threads for this case
    • indices = i, where ii = i.split(N//4): N//4 threads. We could have used ceiling(N/4), but due to loop unswitching, we don't directly apply the extra thread to the boundary loop. (future work?)
    • indices = i, j, where ii = i.split(N//4): N//4 + 1 threads.

    Implementation detail: if workshare loop collapsing happens because the indices are contiguous, the number of threads assigned is unaffected.
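
    The aggregation rule and the three examples above can be mimicked in a few lines. This is only a sketch under stated assumptions: the function name and its inputs are hypothetical, block counts are passed in directly rather than derived from a split, and clamping the result to the Target.num_threads capacity is an assumed reading of "capacity setting".

```python
def infer_num_threads(index_blocks, capacity):
    """Aggregate a thread count from the parallelized outermost indices.

    index_blocks maps an index name to its number of split blocks, or to
    None when the index is unsplit (hypothetical representation).
    """
    requested = 0
    for blocks in index_blocks.values():
        # An unsplit index contributes one thread; a split index
        # contributes one thread per block.
        requested += 1 if blocks is None else blocks
    # Assumption: never request more threads than the target's capacity.
    return min(requested, capacity)
```

    For N = 32, passing {"i": None, "j": None, "k": None} mirrors the first example, {"i": N // 4} the second, and {"i": N // 4, "j": None} the third.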

    Related work items: #3554

v1.2.0

  • Merged PR 2349: Add missing steps to CMake build instructions. [Lisa
    Ong]

  • Merged PR 2347: Add pip install for linux. [Lisa Ong]

    Linux packages can now be pip installed directly

    Some cosmetic edits to install instructions

  • Merged PR 2326: Update install docs for Visual Studio 2022. [JUBI
    TANEJA]

    Related work items: #3605

  • Nit. [Jubi Taneja]

  • Merge branch 'main' of
    vs-ssh.visualstudio.com:v3/intelligentDevices/ELL/Accera into
    dev/jubitaneja/VS2022-install-docs. [Jubi Taneja]

  • Edits. [Jubi Taneja]

  • Update install docs for Visual Studio 2022. [Jubi Taneja]

  • Merged PR 2346: Canary workflows for building with latest LLVM
    release. [Lisa Ong]

    This pipeline is part of a two-stage workflow.

    Stage 1:

    • Weekly docker image build that pulls the latest tagged official release of LLVM and rebuilds the image.
    • Currently lives in: https://github.com/lisaong/accera-llvm-canary but can be moved to a more permanent location once this pipeline is stable.
    • Github actions are used here for convenience (longer timeouts, better integration). In the future we can move to Azure DevOps if similar functionality is available.

    Stage 2: (this PR)

    • Weekly canary build that consumes the latest docker image produced in stage 1. This is on a weekly schedule because triggering on container pushes is not yet supported by ADO.

    When a new release of LLVM is published:

    • Stage 1's weekly build will fail because the port SHA will change. This is ok because we want manual intervention to update the LLVM vcpkg portfile to update the patches, etc.
    • Stage 2's weekly build may also fail. This is where we would stage changes in an Accera branch to support the new LLVM release.

    As of this PR, LLVM 13.0.1 is being pre-released. TODO: test out the workflow with the upcoming pre-release.

    Related work items: #3616

  • Merged PR 2340: Support Max element / budget caching for manual
    caches. [Mason Remy]

    Support Max element / budget caching for manual caches

    Max element / budget caching previously only worked for automatic
    caches, however the hierarchical caching change made automatic caches
    harder to request from the DSL. This change enables max element caches
    for manual caches by iteratively searching for the level at which a
    cache should be placed due to the budget.

    Notes:

    • Currently if the budget is 0, that is treated as though the budget is
      1, however maybe we want this to be an error case
    • Different boundary condition sections of the loopnest may have
      differently sized caches realized due to how the budget computation
      works. i.e. if caching around a main loop would exceed a budget but
      caching around the boundary would not, then the same cache would exist
      inside the main loop and outside the boundary loop
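
    The iterative search for a cache level can be illustrated with a simplified model. This is a sketch, not the compiler's algorithm: it assumes (hypothetically) that the active block at a given loop level is simply the product of the loop extents from that level inward, and the function name is made up for illustration. It rejects a non-positive budget, which matches the error case suggested in the notes above rather than the "treated as 1" behavior.

```python
from math import prod


def find_cache_level(extents, budget):
    """Return the outermost loop level whose active block fits the budget.

    extents lists loop extents from outermost to innermost. Returns None
    when even the innermost loop alone exceeds the budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Walk from the outermost level inward until the block fits.
    for level in range(len(extents)):
        if prod(extents[level:]) <= budget:
            return level
    return None
```

    With extents [4, 8, 16] and a budget of 128 elements, level 0 would need 512 elements, so the search settles on level 1 (8 * 16 = 128).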

    Related work items: #3615

  • Merge branch 'main' into review/masonr/max_element_caching. [Mason
    Remy]

  • Merged PR 2342: Add logic check for target compat; Debug mode makes
    use of func target. [Kern Handa]

    Add logic check for target compat; Debug mode makes use of func target

    This change adds the concept of Target compatibility so that functions
    that are for the same target but have different settings can be added
    freely. This is particularly helpful when adding Debug mode checks for a
    function, as the Debug mode function naturally is going to be a subset
    of the original function's target.

    This change also introduces the concept of the maximum for a number of
    Target properties, which is used to test whether one target is
    compatible with another.

    Another related change is that Debug mode now makes use of the target of
    the function being checked. This might need to be further addressed to
    figure out the correct way to debug GPU or remote targets
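
    One way to picture the compatibility check is the sketch below. It is not Accera's Target class: the class name, the chosen properties, and the method name are hypothetical, and it assumes that "compatible" means the same architecture with every numeric property at or below the other target's maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSketch:
    """Hypothetical stand-in for a target with a few bounded properties."""
    architecture: str
    num_cores: int
    vector_bytes: int

    def is_compatible_with(self, other):
        # Same architecture, and each property within the other's maximum.
        return (
            self.architecture == other.architecture
            and self.num_cores <= other.num_cores
            and self.vector_bytes <= other.vector_bytes
        )
```

    Under this model, a Debug mode target that uses a subset of the original function's target (for example, fewer cores) is compatible with it, while the reverse does not hold.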

  • Merged PR 2343: Switch to PNGs for logo assets. [Lisa Ong]

    This allows the images to render more reliably in preview mode

  • Merged PR 2344: Add libvulkan to Manylinux builds. [Lisa Ong]

    Set LD_LIBRARY_PATH so that auditwheel can find the dependency.

    Assumes that target Linux system will have the lib preinstalled per install instructions.

    Update docker image used by pipeline.

    Related work items: #3529

  • Merged PR 2339: Add logo and badges to README.md, licenses to whls.
    [Lisa Ong]

    Further adjustments deferred until repo is made public

  • Fix c++ dsl test vectorize invocations. [Mason Remy]

  • Make budget = 0 an error. [Mason Remy]

  • Taking PR feedback. [Mason Remy]

  • Fix C++ DSL test failures by making C++ DSL always create manual
    active block caches. [Mason Remy]

  • Merged PR 2329: Define CPU targets in Accera. [JUBI TANEJA]

    • Intel Core processors and Intel Xeon
    • documentation and definitions in python bindings

    Related work items: #3571

  • Fix dsl_tests and other references of intel core processor. [Jubi
    Taneja]

  • Targets definition. [Jubi Taneja]

  • More details of targets. [Jubi Taneja]

  • Edits. [Jubi Taneja]

  • Edits. [Jubi Taneja]

  • More details on extensions. [Jubi Taneja]

  • Add target details. [Jubi Taneja]

  • Merged PR 2338: Check for presence of libomp in macOS and Linux. [Lisa
    Ong]

    Only apply the linkage to libomp if present in the target system.

    This fixes the manylinux pipeline smoke test failure, which fails because the manylinux system does not have a compatible libomp installed.
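
    The "link only if present" check can be sketched as follows. This is an illustration rather than the code from the PR: the function name and the "-lomp" flag are assumptions, and the lookup relies on the standard library's ctypes.util.find_library, which returns None when no matching library is found.

```python
import ctypes.util


def openmp_link_flags(find_library=ctypes.util.find_library):
    """Return linker flags for libomp, or an empty list if it is absent."""
    # find_library takes the name without the "lib" prefix or suffix.
    if find_library("omp"):
        return ["-lomp"]
    return []
```

    The lookup function is injectable so that the behavior can be exercised on systems with or without libomp installed.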

  • Merged PR 2337: Build manylinux wheels for PyPI uploads. [Lisa Ong]

    • Add an Azure Pipeline that builds and uploads packages based on manylinux2014. This uses a container that contains accera-llvm and other build dependencies pre-installed
    • Link to the system libomp at target accc time (when libomp is not present in the manylinux2014 build system, but may be present in an Ubuntu target system, for example)
    • manylinux2014 is what onnxruntime uses as well. manylinux_2_24 is available but not as widely used afaict (punt for future work).

    Misc fixes:

    • Missing copyright blurbs
    • Updated accera/python/README.md
    • Drop clean --all from build.sh/build.bat so that full rebuilds are not the default
  • Merged PR 2336: value::Abs now supports non-fp types, fixes non-fp
    Debug mode. [Kern Handa (KERN)]

    value::Abs now supports non-fp types, fixes non-fp Debug mode

  • Merged PR 2333: Initialize vcpkg in the SDL pipelines. [Lisa Ong]

    Missed these changes from the previous PR, now that vcpkg and packages need to be installed.

  • Merged PR 2335: Ignore PyPI when installing local wheels in the CI
    pipelines. [Lisa Ong]

    Update the CI pipelines to ignore PyPI when installing accera wheels.

  • Merged PR 2334: Add support for hierarchical caching. [Mason Remy]

    Add support for hierarchical caching

    This adds support for creating an active block cache of an existing
    active block cache (note that hierarchical caching for automatic caches
    was already supported, however the cache itself was not used as an
    argument to the cache call in that scenario).

    This change includes:

    • Moving cache access maps and arrays of loopnest index attributes onto
      the MakeCacheOps
    • Adding helpers to MakeCacheOp to construct access maps for the caches
      given a position in the loopnest
    • Remove redundant access map computation in active block cache copy and
      reduce
    • Support for hierarchical caches that are parameterized
    • Implicitly hides automatic caches by assuming a layout on a cache call
      which doesn't have a layout provided. Since any cache call with a
      layout becomes an active block cache, this turns all cache calls into
      active block caches. In a later PR we could add an undocumented flag
      to enable users to request an automatic cache if we want to, however
      long-term automatic caches should be removed completely.
    • Fix cache merging bug where output caches with a boundary condition on
      the cache level weren't constructing a union of the different loop
      branches when computing the active block
    • Fix multi-cache merging bug where caches were being merged when a loop
      between the trigger level and the cache level had a boundary condition
      and that loop's IV was used to index into the cache. These caches
      should not be merged since they are accessing different regions. An
      unfortunate side-effect of this fix is that some multi-caches which
      have a boundary condition between the trigger level and the cache
      level, where the boundary loop IV is not used to index into the cache,
      won't be successfully merged. This isn't technically wrong, as we are
      still copying data the number of times requested based on the
      multicache definition, but it is a potential missed optimization
      opportunity in this edge case.
    • Disables max_element caching as this was only supported for automatic
      caches. A later PR will support this for active block caches

    Related work items: #3453

  • Merged PR 2325: Support building LLVM via vcpkg (no remote caching)
    [Lisa Ong]

    This change adds vcpkg support for external developers to build their own copy of LLVM based on public github sources.

    Due to complexities of hosting large Nuget packages, the vcpkg built LLVM is local-only. We're still using Conan for LLVM for internal use.

    • Added vcpkg as a submodule

    • Migrated tomlplusplus and catch2 to vcpkg. pybind11 is untouched because it uses CMake FetchContent (the simplest and most direct method)

    • Added top level build scripts for generating the Python packages

    • Added support for installing LLVM via vcpkg. This is opted-in by setting the environment variable LLVM_SETUP_VARIANT or by passing in -DLLVM_SETUP_VARIANT during configuration:

      • LLVM_SETUP_VARIANT=Conan will use Conan to acquire pre-built LLVM bits (internal use only)
      • If unset, default behavior is to use vcpkg to build and install LLVM bits
    • Whenever we update LLVM, we need to

      • Build and upload the internal packages [as before]
      • Update the vcpkg port by revising the Git hash and applying any patches. [new]

    Related work items: #3611

  • Merged PR 2331: Add Kernel::GetIndices and wire it up. [Kern Handa
    (KERN)]

    Add Kernel::GetIndices and wire it up

  • Merged PR 2332: LoopNestBuilder minor code fixes. [Kern Handa (KERN)]

    LoopNestBuilder minor code fixes

    Related work items: #3602

  • Merged PR 2330: Rename main pass to acc-to-llvm. [Kern Handa (KERN)]

    Rename main pass to acc-to-llvm

  • Merged PR 2328: Make git ignore .vscode symlinks as well. [Kern Handa
    (KERN)]

    Make git ignore .vscode symlinks as well

  • Merged PR 2327: Updated pip install instructions to official PyPI
    repositories (windows, macOS) [Lisa Ong]

    Linux instructions will be updated once the manylinux distribution packages are ready and uploaded.

  • Merged PR 2324: bugfix in benchmark HAT package while generating
    main.cpp. [JUBI TANEJA]

    bugfix in benchmark HAT package while generating main.cpp to include correct .hat files

    Related work items: #3613

  • Merged PR 2322: Port dev/byronc/address_sdl_timeouts to Accera repo.
    [Lisa Ong]

    Add BinSkim tool to SDL pipeline runs. Split the original SDL pipeline into 3 stages to avoid timeouts in ADO.

    Add build flags recommended by BinSkim.

    Original PR: !2303

    Related work items: #3599

  • Merged PR 2323: Re-enabling code signing for Windows distributions.
    [Lisa Ong]

    Disabled for Linux and macOS pending future support

  • Merged PR 2320: Split accera python wheels to within 100MB. [Lisa Ong]

    • Enable splitting into component packages when building for the Packaging CI pipelines
    • Support development mode for top level setup.py to place everything in build/lib.*
    • Import and shared library paths are unchanged. Only moved executables into the accera/bin folder
    • Currently only the accera-llvm and accera-compilers packages are required dependencies for the accera package

    spec

    Azure artifacts feed: https://intelligentdevices.visualstudio.com/ELL/_packaging?_a=feed&feed=Accera

    Manually seeded Windows and macOS packages on PyPI (linux is pending #3529):

    Related work items: #3577

  • Merged PR 2319: [forward port] [build] Fix py3.7 test regression,
    applied workaround for Azure Pipelines caching infra issue. [Lisa Ong]

    b005dad65a4a36a52dd627af5e38b803bb8102e1

  • Merged PR 2318: [forward port] Fix a few typos and nits in
    installation guide for Windows. [Lisa Ong]

    be37216a165ed72cb51636cf88bb9f40dbb8f9cc

  • Merged PR 2317: [nfc] Add licensing information to all source files.
    [Lisa Ong]

    • Added MIT license blob
    • One-liner blurbs are mostly empty, except for a handful that are already commented at the top of the file.
    • Authors are grandfathered (existing ones maintained, no new ones added)

    Related work items: #3579

  • Merged PR 2316: Migrating from old repo to new repo. [Lisa Ong]

    Old repo commit id: d61bd7d31e2febc45321da72ed18278e27dbe4cb

    Related work items: #3608

  • SUPPORT.md committed. [Microsoft Open Source]

  • SECURITY.md committed. [Microsoft Open Source]

  • LICENSE committed. [Microsoft Open Source]

  • README.md committed. [Microsoft Open Source]

  • CODE_OF_CONDUCT.md committed. [Microsoft Open Source]

  • Initial commit. [microsoft-github-operations[bot]]