Squashed commit of the following:
commit 34e3732
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Wed Jan 26 09:21:51 2022 +0000

    Merged PR 2391: Update quickstart example, updated docs structure per feedback

    * Teasers for transformations in the Quickstart sample (to differentiate Accera from others), with benchmarking
    * Removed the Miscellaneous section and redistributed its docs to related locations
    * Renamed the cross compilation tutorial so that it is ordered last

    Note: currently we are using dynamic navigation for Material with mkdocs, which avoids maintaining a separate nav section each time a markdown file is added/removed (this becomes unwieldy as the number of files increases). This means that filenames will need to be named in the order in which they will show up in tabs or sections.

commit 972b7fc
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Jan 26 08:34:09 2022 +0000

    Merged PR 2392: Populate Target.Models based on known devices

    Populate Target.Models based on known devices
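
    A minimal sketch of the resulting Python usage, based only on the Target constructor forms that appear in the onnx-emitter changes further down in this commit (the `acc` alias follows the README quickstart; variable names are illustrative):

```python
import accera as acc

# Known devices can be referenced either by model enum or by device name;
# both constructor forms appear in the onnx-emitter diff below.
pi4 = acc.Target(acc.Target.Model.RASPBERRY_PI_4B, category=acc.Target.Category.CPU)
pi3 = acc.Target("Raspberry Pi 3B", category=acc.Target.Category.CPU)
```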

commit 8d99afe
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Jan 26 03:00:06 2022 +0000

    Merged PR 2390: Merge multiple HAT files during project building

    Merge multiple HAT files during project building

    Related work items: #3559
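
    The user-visible pattern this relates to is the one in the updated README quickstart below: several functions added to one package and consumed through a single `hello_accera.hat`. A condensed, hedged restatement of that quickstart (function names are illustrative; this is not a description of the build internals):

```python
import accera as acc

# placeholder inputs/output, as in the README quickstart
A = acc.Array(role=acc.Array.Role.INPUT, shape=(512, 512))
B = acc.Array(role=acc.Array.Role.INPUT, shape=(512, 512))
C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, shape=(512, 512))

matmul = acc.Nest(shape=(512, 512, 512))
i, j, k = matmul.get_indices()

@matmul.iteration_logic
def _():
    C[i, j] += A[i, k] * B[k, j]

# two functions in one package: the build emits a single merged HAT interface
package = acc.Package()
package.add(matmul.create_schedule(), args=(A, B, C), base_name="matmul_naive")
tiled = matmul.create_schedule()
ii, jj = tiled.tile((i, j), (16, 16))
package.add(tiled, args=(A, B, C), base_name="matmul_tiled")
package.build(name="hello_accera", format=acc.Package.Format.HAT_DYNAMIC)
```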

commit 295d396
Author: Kern Handa <kerha@microsoft.com>
Date:   Tue Jan 25 20:36:04 2022 +0000

    Merged PR 2386: Add support for various targets

    Add support for various targets

    Related work items: #3631

commit d0eef65
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Jan 25 11:36:00 2022 +0000

    Merged PR 2389: [nfc] Doc typos and consistency fixes
Lisa Ong committed Jan 26, 2022
1 parent 0bfb779 commit 933e71f
Showing 52 changed files with 1,778 additions and 700 deletions.
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -1,7 +1,7 @@
---
name: Bug report
about: Create a report to help us improve
title: 'BUG: <enter bug title>'
title: '[BUG] <enter bug title>'
labels: ''
assignees: ''

@@ -11,7 +11,7 @@ assignees: ''
<!-- Please read our Rules of Conduct: https://opensource.microsoft.com/codeofconduct/ -->
<!-- Please search for existing issues to avoid creating duplicates. -->
<!-- Incomplete reports will lead to closing the issue. -->
<!-- Also, please test using the latest master make sure your issue has not already been fixed -->
<!-- Also, please test using the latest main and make sure your issue has not already been fixed -->

**Describe the bug**
A clear and concise description of what the bug is.
@@ -24,7 +24,7 @@ A clear and concise description of what the bug is.

**To Reproduce**
<!-- Include a detailed step by step process for recreating your issue. -->
<!-- If your issue includes code, create a [gist](https://gist.github.com/) and past the link here. -->
<!-- If your issue includes code, create a [gist](https://gist.github.com/) and paste the link here. -->
Steps to reproduce the behavior:
1.
2.
@@ -34,4 +34,4 @@ Include the full error message in text form so that we can help troubleshoot quickly
**Expected behavior**
A clear and concise description of what you expected to happen.

**What's better than filing an issue? Filing a pull request :).**
**What's better than filing an issue? Opening a pull request :).**
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.md
@@ -1,7 +1,7 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
title: '[Feature] <enter feature request title>'
labels: ''
assignees: ''

4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/question.md
@@ -1,7 +1,7 @@
---
name: Question
about: Support Questions
title: "[Question]: <enter question title>"
title: "[Q] <enter question title>"
labels: ''
assignees: ''

@@ -21,6 +21,6 @@ assignees: ''

#### Context details
<!-- Add OS, Accera version, Python version, if applicable -->
<!-- If it's too large, you can create a [gist](https://gist.github.com/) and past the link here. -->
<!-- If it's too large, you can create a [gist](https://gist.github.com/) and paste the link here. -->

### Include details of what you already did to find answers
8 changes: 7 additions & 1 deletion .gitignore
@@ -365,4 +365,10 @@ _version.py
.vscode*

# llvm setup
LLVMSetupConan.cmake
LLVMSetupConan.cmake

# docs build
docs/README.md

# iPython
.ipynb_checkpoints/
173 changes: 110 additions & 63 deletions README.md
@@ -3,7 +3,9 @@

<a href="https://pypi.org/project/accera/"><img src="https://badge.fury.io/py/accera.svg" alt="PyPI package version"/></a> <a href="https://pypi.org/project/accera/"><img src="https://img.shields.io/pypi/pyversions/accera" alt="Python versions"/></a> ![MIT License](https://img.shields.io/pypi/l/accera)

Accera is a programming model, a domain-specific programming language embedded in Python (eDSL), and an optimizing cross-compiler for compute-intensive code. Accera currently supports CPU and GPU targets and focuses on optimization of nested for-loops.
# Welcome to Accera

Accera is a compiler that enables you to experiment with loop optimizations without hand-writing Assembly code. Accera is available as a Python library and supports cross-compiling to a wide range of [processor targets](https://github.com/microsoft/Accera/blob/main/accera/python/accera/Targets.py).

Writing highly optimized compute-intensive code in a traditional programming language is a difficult and time-consuming process. It requires special engineering skills, such as fluency in Assembly language and a deep understanding of computer architecture. Manually optimizing the simplest numerical algorithms already requires a significant engineering effort. Moreover, highly optimized numerical code is prone to bugs, is often hard to read and maintain, and needs to be reimplemented every time a new target architecture is introduced. Accera aims to solve these problems.

@@ -27,98 +29,143 @@ See the [Install Instructions](https://microsoft.github.io/Accera/Install/) for

### Quickstart

#### Try Accera in your browser
In this example, we will:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/Accera/HEAD?labpath=docs%2Fdemos%2Fbinder%2Fquickstart.ipynb)
* Implement matrix multiplication with a ReLU activation (matmul + ReLU), commonly used in machine learning algorithms
* Generate two implementations: a naive algorithm and one with loop transformations
* Compare the timings of both implementations

No installation required.
#### Run in your browser

#### Run Accera on your local machine
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/Accera/main?labpath=docs%2Fdemos%2Fquickstart.ipynb)

In this quickstart example, you will:
No installation is required. This will launch a Jupyter notebook with the quickstart example running in the cloud.

* Implement a simple `hello_accera` function that performs basic matrix multiplication with a ReLU activation
* Build a [HAT](https://github.com/microsoft/hat) package with a dynamic (shared) library that exports this function
* Call the `hello_accera` function in the dynamic library with some NumPy arrays, and checks against a NumPy implementation
#### Run on your machine

1. Create a Python 3 script called `quickstart.py`
1. Create a Python 3 script called `quickstart.py`:

```python
import accera as acc
import hatlib as hat
import numpy as np
```python
import accera as acc

A = acc.Array(role=acc.Array.Role.INPUT, shape=(16, 16))
B = acc.Array(role=acc.Array.Role.INPUT, shape=(16, 16))
C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, shape=(16, 16))
# define placeholder inputs/output
A = acc.Array(role=acc.Array.Role.INPUT, shape=(512, 512))
B = acc.Array(role=acc.Array.Role.INPUT, shape=(512, 512))
C = acc.Array(role=acc.Array.Role.INPUT_OUTPUT, shape=(512, 512))

matmul = acc.Nest(shape=(16, 16, 16))
i1, j1, k1 = matmul.get_indices()
# implement the logic for matmul and relu
matmul = acc.Nest(shape=(512, 512, 512))
i1, j1, k1 = matmul.get_indices()
@matmul.iteration_logic
def _():
C[i1, j1] += A[i1, k1] * B[k1, j1]

@matmul.iteration_logic
def _():
C[i1, j1] += A[i1, k1] * B[k1, j1]
relu = acc.Nest(shape=(512, 512))
i2, j2 = relu.get_indices()
@relu.iteration_logic
def _():
C[i2, j2] = acc.max(C[i2, j2], 0.0)

relu = acc.Nest(shape=(16, 16))
i2, j2 = relu.get_indices()
package = acc.Package()

@relu.iteration_logic
def _():
C[i2, j2] = acc.max(C[i2, j2], 0.0)
# fuse the i and j indices of matmul and relu, add to the package
schedule = acc.fuse(matmul.create_schedule(), relu.create_schedule(), partial=2)
package.add(schedule, args=(A, B, C), base_name="matmul_relu_fusion_naive")

matmul_schedule = matmul.create_schedule()
relu_schedule = relu.create_schedule()
# transform the schedule, add to the package
f, i, j, k = schedule.get_indices()
ii, jj = schedule.tile((i, j), (16, 16)) # loop tiling
schedule.reorder(j, i, f, k, jj, ii) # loop reordering
plan = schedule.create_plan()
plan.unroll(ii) # loop unrolling
package.add(plan, args=(A, B, C), base_name="matmul_relu_fusion_transformed")

# fuse the first 2 indices of matmul and relu
schedule = acc.fuse(matmul_schedule, relu_schedule, partial=2)
# build a dynamically-linked package (a .dll or .so) that exports both functions
print(package.build(name="hello_accera", format=acc.Package.Format.HAT_DYNAMIC))
```

package = acc.Package()
package.add(schedule, args=(A, B, C), base_name="hello_accera")
2. Ensure that you have a compiler in your PATH:

# build a dynamically-linked HAT package
package.build(name="mypackage", format=acc.Package.Format.HAT_DYNAMIC)
* Windows: Install Microsoft Visual Studio and run `vcvars64.bat` to setup the command prompt
* Linux/macOS: Install gcc

# load the package and call the function with random test input
hat_package = hat.load("mypackage.hat")
hello_accera = hat_package["hello_accera"]
Don't have a compiler handy? We recommend trying Accera in your browser instead [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/microsoft/Accera/main?labpath=docs%2Fdemos%2Fquickstart.ipynb).

A_test = np.random.rand(16, 16).astype(np.float32)
B_test = np.random.rand(16, 16).astype(np.float32)
C_test = np.zeros((16, 16)).astype(np.float32)

# compute using NumPy as a comparison
C_np = np.maximum(C_test + A_test @ B_test, 0)
3. Install Accera:

hello_accera(A_test, B_test, C_test)
```shell
pip install accera
```

# compare the result with NumPy
np.testing.assert_allclose(C_test, C_np)
print(C_test)
print(C_np)
```
4. Generate the library that implements two versions of matmul + ReLU:

2. Ensure that you have a compiler in your PATH:
```shell
python quickstart.py
```

* Windows: Install Microsoft Visual Studio and run `vcvars64.bat` to setup the command prompt
* Linux/macOS: Install gcc
5. To consume and compare the library functions, create a file called `benchmark.py` in the same location:

3. Install Accera:
```python
import hatlib as hat
import numpy as np

```shell
pip install accera
```
# load the package
hat_package = hat.load("hello_accera.hat")

4. Run the Python script:
# call one of the functions with test inputs
A_test = np.random.rand(512, 512).astype(np.float32)
B_test = np.random.rand(512, 512).astype(np.float32)
C_test = np.zeros((512, 512)).astype(np.float32)
C_numpy = np.maximum(C_test + A_test @ B_test, 0.0)

```python
python quickstart.py
```
matmul_relu = hat_package["matmul_relu_fusion_transformed"]
matmul_relu(A_test, B_test, C_test)

# check correctness
np.testing.assert_allclose(C_test, C_numpy, atol=1e-3)

# benchmark all functions
hat.run_benchmark("hello_accera.hat", batch_size=5, min_time_in_sec=5)
```

6. Run the benchmark to get the timing results:

```shell
python benchmark.py
```

#### Next Steps

The function can be optimized using [schedule transformations](https://microsoft.github.io/Accera/Manual/03%20Schedules/#schedule-transformations). The [Manual](https://microsoft.github.io/Accera/Manual/00%20Introduction/) is a good place to start for an introduction to the Accera programming model.
The [Manual](https://microsoft.github.io/Accera/Manual/00%20Introduction/) is a good place to start for an introduction to the Accera Python programming model.

In particular, the [schedule transformations](https://microsoft.github.io/Accera/Manual/03%20Schedules/#schedule-transformations) describe how you can experiment with different loop transformations with just a few lines of Python.

Finally, the `.hat` format is just a C header file containing metadata. Learn more about the [HAT format](https://github.com/microsoft/hat) and [benchmarking](https://github.com/microsoft/hat/tree/main/tools).


## How it works

In a nutshell, Accera takes the Python code that defines the loop schedule and algorithm and converts it into [MLIR](https://mlir.llvm.org/) intermediate representation (IR). Accera's compiler then takes this IR through a series of MLIR pipelines to perform transformations. The result is a binary library with a C header file. The library implements the algorithms that are defined in Python, and is compatible with the target.

To peek into the stages of IR transformation that Accera does, try replacing `format=acc.Package.Format.HAT_DYNAMIC` with `format=acc.Package.Format.MLIR_DYNAMIC` in `quickstart.py`, re-run the script, and search the `_tmp` subfolder for the intermediate `*.mlir` files. We plan to document these IR constructs in the future.
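
For example, the last line of `quickstart.py` would become (a sketch of the substitution described above; nothing else in the script changes):

```python
# emit MLIR intermediates so the lowering stages can be inspected under _tmp
print(package.build(name="hello_accera", format=acc.Package.Format.MLIR_DYNAMIC))
```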

## Documentation
Get to know Accera by reading the [Documentation](https://microsoft.github.io/Accera/).

You can find more step-by-step examples in the [Tutorials](https://microsoft.github.io/Accera/Tutorials).
Get to know Accera's concepts and Python constructs on the [Documentation](https://microsoft.github.io/Accera/) page.

## Tutorials

More step-by-step examples are available on the [Tutorials](https://microsoft.github.io/Accera/Tutorials) page, and more examples and tutorials are on the way.

## Contributions

Accera is a research platform-in-progress. We would love your contributions, feedback, questions, and feature requests! Please file a [Github issue](https://github.com/microsoft/Accera/issues/new) or send us a pull request. Please review the [Microsoft Code of Conduct](https://opensource.microsoft.com/codeofconduct/) to learn more.

## Credits

Accera is built using several open source libraries, including: [LLVM](https://llvm.org/), [toml++](https://marzer.github.io/tomlplusplus/), [tomlkit](https://github.com/sdispater/tomlkit), [vcpkg](https://vcpkg.io/en/index.html), [pyyaml](https://pyyaml.org/), and [HAT](https://github.com/microsoft/hat). For testing, we also use [numpy](https://github.com/numpy/numpy) and [catch2](https://github.com/catchorg/Catch2).

## License

This project is released under the [MIT License](https://github.com/microsoft/Accera/blob/main/LICENSE).
2 changes: 1 addition & 1 deletion accera/acc-gpu-runner/README.md
@@ -1,7 +1,7 @@
# acc-gpu-runner

The `acc-gpu-runner` tool is functionally a wrapper around `acc-opt` and `mlir-vulkan-runner`.
It takes in a Accera-emitted MLIR file produced by a Accera generator and does the following:
It takes in an Accera-emitted MLIR file produced by an Accera generator and does the following:
- Runs the Accera lowering passes like `acc-opt` does
- Runs the GPU and Vulkan passes that `mlir-vulkan-runner` does
- Runs the lowered MLIR code in a GPU JIT engine
4 changes: 2 additions & 2 deletions accera/accc/README.md
@@ -111,14 +111,14 @@ Generating for the sample in `samples/GEMM/MLAS_value/Accera_Sample.cpp`:
The above invocation will:
1. Create a directory `mlas_value_sample`
1. Create a subdirectory `mlas_value_sample/generator` and make a Accera generator CMake project there with the given Accera DSL file.
1. Create a subdirectory `mlas_value_sample/generator` and make an Accera generator CMake project there with the given Accera DSL file.
1. Build the generator
1. Run the generator with the given domain csv and custom argument values from the given config file.
1. Run `acc-opt.exe`, `mlir-translate.exe`, `llc.exe`, and `opt.exe` lowering the emitted code to a header and object file.
1. Create a subdirectory `mlas_value_sample/mlas_value_sample_lib_intermediate` and put intermediate IR files there that are the result of running the generator, `acc-opt.exe`, `mlir-translate.exe`, `llc.exe`, and `opt.exe`, which include the final header for the Accera sample.
1. Create a subdirectory `mlas_value_sample/lib` containing the project for the static library for the Accera sample.
1. Create a subdirectory `mlas_value_sample/logs` and put the `stdout` and `stderr` logs for each phase there.
1. (Because the `--main` argument was provided) Create a subdirectory `mlas_value_sample/main` and make a Accera main CMake project there with the given Accera main file and build the project.
1. (Because the `--main` argument was provided) Create a subdirectory `mlas_value_sample/main` and make an Accera main CMake project there with the given Accera main file and build the project.
1. (Because the `--run` argument was provided) Run the build main project.
Note: the intermediate files and the generator and runner projects will be named based on the `--library_name` parameter
4 changes: 2 additions & 2 deletions accera/hat/include/HATEmitter.h
@@ -22,15 +22,15 @@ template <typename StreamType>
void EnableTOML(StreamType& os)
{
os << "\n";
os << "#ifdef __TOML__";
os << "#ifdef TOML";
os << "\n";
}

template <typename StreamType>
void DisableTOML(StreamType& os)
{
os << "\n";
os << "#endif // __TOML__";
os << "#endif // TOML";
os << "\n";
}

7 changes: 4 additions & 3 deletions accera/onnx-emitter/onnx_emitter.py
@@ -142,15 +142,16 @@ def load_model(model_file):

def get_target(target_name):
if target_name == 'pi4':
return Target(model=Target.Model.RASPBERRY_PI4)
return Target(Target.Model.RASPBERRY_PI_4B, category=Target.Category.CPU)
elif target_name == 'pi3':
return Target(model=Target.Model.RASPBERRY_PI3)
return Target("Raspberry Pi 3B", category=Target.Category.CPU)
else:
return Target.HOST


def get_target_options(target):
if target.model == Target.Model.RASPBERRY_PI3:
if "Raspberry Pi" in target.name:
# TODO: Make use of the different attributes between the Pi devices
return MLASOptions(KUnroll=2,
BCacheSizeThreshold=64**1,
NumRowsInKernel=2,
4 changes: 2 additions & 2 deletions accera/onnx-emitter/test/pi3/emit_hat_package.py
@@ -218,9 +218,9 @@ def _emit_hat_package_for_model(model, package_name, target, output_dir, large_m
model = onnx.load(model)

if target == "pi4":
target_device = Target(model=Target.Model.RASPBERRY_PI4)
target_device = Target("Raspberry Pi 4B", category=Target.Category.CPU)
elif target == "pi3":
target_device = Target(model=Target.Model.RASPBERRY_PI3)
target_device = Target("Raspberry Pi 3B", category=Target.Category.CPU)
elif target == "host":
target_device = Target.HOST
