Squashed commit of the following:
commit aa38f90099653e4662e053fd3ab08248d3612ef9
Author: Mason Remy <masonr@microsoft.com>
Date:   Fri Mar 10 06:13:20 2023 +0000

    Merged PR 3150: Change high precision fp to not perform contraction

    Change high precision fp to not perform contraction

    Also change value library FMA to use the math dialect FmaOp and
    vectorize to the vector dialect FMAOp
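
To illustrate why contraction matters for the high-precision setting: a fused multiply-add rounds once, while a separate multiply followed by an add rounds twice, so the two can give different results. The following is a minimal Python sketch, not code from this change. `fused_mul_add` and `separate_mul_add` are hypothetical helpers, and the fused form is emulated with exact rational arithmetic rather than a hardware FMA.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Uncontracted form: the product is rounded before the addition."""
    return a * b + c


# Inputs chosen so the exact product (1 - 2**-54) is not representable as a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

uncontracted = separate_mul_add(a, b, c)  # product rounds to 1.0, so the sum is 0.0
contracted = fused_mul_add(a, b, c)       # single rounding keeps the -2**-54 residue
print(uncontracted, contracted)
```

The two results differ in the last bits, which is why allowing or forbidding contraction is a numerical decision and not only a performance one.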

commit 859755f7bbf76fb6b0b92ed7a7dc6cf5c1615ba1
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Mar 9 19:17:58 2023 +0000

    Merged PR 3147: Fix vector cast with same bitwidth.

    Fix vector cast with same bitwidth.

    accv.cast vector<16xi8> to vector<16xui8>
    was erroneously lowering to
    cast vector<16xi8> to ui8

commit d6b3308d0f4a4dbda7a30d0695d7408dfd9d32b9
Author: Mason Remy <masonr@microsoft.com>
Date:   Thu Mar 9 18:26:57 2023 +0000

    Merged PR 3149: Improve 1-D horizontal sum reductions for 8xf32 and 8xi32

    Improve 1-D horizontal sum reductions for 8xf32 and 8xi32

commit cd030b123dac7b25b2da5127e04e4e28919bd9ed
Author: Kern Handa <kerha@microsoft.com>
Date:   Thu Mar 9 01:22:37 2023 +0000

    Merged PR 3148: Adds Package level FP precision override

commit dc86c7cd92530c5e0c36639b62b5a380c531648d
Author: Kern Handa <kerha@microsoft.com>
Date:   Wed Mar 8 22:02:25 2023 +0000

    Merged PR 3144: Removes fp precision as an option for Package.build

    The fp-contract option used in `accc.py` was overriding the recently added function-level fp precision specification. Since each function now has an equivalent default, the option no longer needs to be passed to `llc` and `opt` at build time.
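
The interaction between a per-function precision setting and a package-wide default can be sketched as a tri-state value: unset, explicitly off, or explicitly on. This is a minimal illustration under assumed names (`Fn` and `apply_package_default` are hypothetical stand-ins, not the Accera classes). It mirrors the idea that a package-level request only fills in functions that did not choose for themselves.

```python
from typing import Dict, Optional


class Fn:
    """Hypothetical stand-in for a function with a tri-state precision setting."""

    def __init__(self, name: str, high_precision_fp: Optional[bool] = None):
        self.name = name
        # None means "not set by the user"; True/False are explicit choices.
        self.high_precision_fp = high_precision_fp


def apply_package_default(fns: Dict[str, Fn], package_high_precision: bool) -> None:
    """Fill in the package-level setting only where the function left it unset."""
    if not package_high_precision:
        return
    for fn in fns.values():
        if fn.high_precision_fp is None:
            fn.high_precision_fp = True


fns = {
    "unset": Fn("unset"),
    "explicit_off": Fn("explicit_off", high_precision_fp=False),
    "explicit_on": Fn("explicit_on", high_precision_fp=True),
}
apply_package_default(fns, package_high_precision=True)

# Collapse the tri-state to a plain bool, treating a still-unset value as False.
resolved = {name: bool(fn.high_precision_fp) for name, fn in fns.items()}
print(resolved)
```

An explicit per-function `False` survives the package-level request, which is the point of using `None` instead of `False` as the default.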

commit 91e77ebcd926fb238852c94477d7f0c26c8f9952
Author: Denny Sun <dennys@microsoft.com>
Date:   Wed Mar 8 11:48:06 2023 +0000

    Merged PR 3143: Add dsl test for profiling op

    1. add profiling enable flag to Package.build()
    2. add a dsl test

commit 33ffb2497e71040e0b27775e5b594cad0949cbe5
Author: Denny Sun <dennys@microsoft.com>
Date:   Wed Mar 8 09:27:24 2023 +0000

    Merged PR 3022: Assert the arg order in debug mode

    Dimension arg should precede array arg in the arg list for debug mode.
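
The ordering rule can be sketched without Accera itself. The classes below are hypothetical stand-ins for the DSL's `Dimension` and `Array`, and this `check_args_order` takes a plain argument list instead of a function object. It is an illustration of the rule, not the code added by this PR.

```python
class Dimension:
    """Hypothetical stand-in for a runtime-sized dimension argument."""

    def __init__(self, name):
        self.name = name


class Array:
    """Hypothetical stand-in for an array whose shape may contain Dimensions."""

    def __init__(self, name, shape):
        self.name = name
        self.shape = shape


def check_args_order(args):
    """Assert each Dimension used in an array's shape appears before that array."""
    position = {id(arg): i for i, arg in enumerate(args)}
    for arg in args:
        if not isinstance(arg, Array):
            continue
        for dim in arg.shape:
            # Dimensions absent from the argument list are ignored here.
            if isinstance(dim, Dimension) and id(dim) in position:
                assert position[id(dim)] < position[id(arg)], (
                    f"Dimension {dim.name} must precede array {arg.name}"
                )


M = Dimension("M")
A = Array("A", shape=(M, 16))

check_args_order([M, A])  # accepted: the dimension comes first

try:
    check_args_order([A, M])  # the array precedes its own dimension
    rejected = False
except AssertionError:
    rejected = True
print(rejected)
```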

commit e3b216ac87e6d75a10c39584d7dfe25b3fc67647
Author: Denny Sun <dennys@microsoft.com>
Date:   Wed Mar 8 08:12:00 2023 +0000

    Merged PR 3137: expose profiling function to DSL

    expose profiling function to DSL

commit d2fcb1caf99c002a25b1ee8c28b3cd719fd6133a
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Tue Mar 7 12:09:08 2023 +0000

    Merged PR 3142: [Release] Tie accera-llvm versioning to LLVM version

    This change introduces a new versioning schema for accera-llvm that follows LLVM's versioning, while allowing for Accera versioned forks:

    `<llvm_major>.<llvm_minor>.<llvm_micro><accera_micro> = (N+).(N+).(N+)(N{2})`

    This overloads the micro version field due to constraints on Python versioning: https://peps.python.org/pep-0440/

    Examples:

    * Current LLVM fork is 14.0.6-2: `accera_llvm.14.0.602`, which means LLVM 14.0.6 + accera fork v2
    * If/when upgrading to LLVM 15.0.7: `accera_llvm.15.0.700`
    * Then when we rev the Accera fork to LLVM 15.0.7-1: `accera_llvm.15.0.701`

    Limitations:
    * We don't expect Accera's fork to span beyond 2-digit versions

    Alternatives:
    * Omit the 0 delimiters, if we think it is unlikely that Accera forks will rev micro versions beyond single-digit. Accera forks may rev more often if we don't update LLVM.
    * Use a dev version, e.g. accera_llvm.14.0.6.dev4. Downside is that this looks unofficial - devN is intended for developmental releases rather than official PyPI releases. That said, the whole Accera project is developmental  :)
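
The packing scheme described above can be sketched as a pair of helpers. `encode_version` and `decode_version` are hypothetical names used only for illustration; the sketch assumes the fork revision always occupies exactly the last two digits of the micro field, as the schema states.

```python
def encode_version(llvm_major: int, llvm_minor: int, llvm_micro: int, accera_rev: int) -> str:
    """Pack the Accera fork revision into the last two digits of the micro field."""
    if not 0 <= accera_rev <= 99:
        raise ValueError("fork revision must fit in two digits")
    return f"{llvm_major}.{llvm_minor}.{llvm_micro}{accera_rev:02d}"


def decode_version(version: str):
    """Split a packed version back into (major, minor, micro, fork revision)."""
    major, minor, packed = version.split(".")
    if len(packed) < 3:
        raise ValueError("micro field must hold an LLVM micro plus two fork digits")
    return int(major), int(minor), int(packed[:-2]), int(packed[-2:])


print(encode_version(14, 0, 6, 2))   # LLVM 14.0.6, fork revision 2
print(encode_version(15, 0, 7, 0))   # LLVM 15.0.7, first fork
print(decode_version("15.0.701"))    # back to its four components
```

The two-digit limit in `encode_version` corresponds to the limitation listed above: a fork revision past 99 would no longer round-trip.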

commit 79ef63b685e8a2221b3127801dec324f0613fa66
Author: Kern Handa <kerha@microsoft.com>
Date:   Tue Mar 7 09:45:06 2023 +0000

    Merged PR 3139: Allows setting precision of fp ops per function

    Allows setting precision of fp ops per function

commit 599742a82910cdc39b5e668f28774327f53a28c7
Author: Mason Remy <masonr@microsoft.com>
Date:   Mon Mar 6 21:31:09 2023 +0000

    Merged PR 3140: Fix bug with reinterpret casts of unrealized conversion casts.

    Fix bug with reinterpret casts of unrealized conversion casts.

    This happens when we do a heap alloc followed by a reinterpret cast, but
    it can come up in other scenarios too

commit 655044a3400a4b40aaf9423abc791763c079f410
Author: Lisa Ong <onglisa@microsoft.com>
Date:   Fri Mar 3 06:15:31 2023 +0000

    Merged PR 3135: [nfc] Add XeonE5 benchmark machine to targets, bump hatlib dependency

    Best guesses at cache sizes and cache lines from: https://en.wikichip.org/wiki/intel/xeon_e5/e5-2673_v4
Lisa Ong committed Mar 10, 2023
1 parent 05f8c0d commit 5ebe6c7
Showing 25 changed files with 553 additions and 91 deletions.
2 changes: 1 addition & 1 deletion accera/acc-opt/test/ValueBinOpCastOp.mlir
@@ -37,7 +37,7 @@ module @test_bin_op_cast_op_folding_module {
// CHECK-NEXT: %0 = affine.load %arg0[0] : memref<1xf32>
// CHECK-NEXT: %1 = affine.load %arg1[0] : memref<1xi32>
// CHECK-NEXT: %2 = arith.sitofp %1 : i32 to f32
// CHECK-NEXT: %3 = arith.mulf %0, %2 {RelaxedPrecision} : f32
// CHECK-NEXT: %3 = arith.mulf %0, %2 : f32
// CHECK-NEXT: %4 = arith.fptosi %3 : f32 to i32
// CHECK-NEXT: affine.store %4, %arg2[0] : memref<1xi32>
builtin.func @bin_op_cast_input_to_f32(%arg0: memref<1xf32>, %arg1: memref<1xi32>, %arg2: memref<1xi32>) {
16 changes: 8 additions & 8 deletions accera/acc-opt/test/value_mlir_test.cpp
@@ -553,15 +553,15 @@ TEST_CASE("mlir_test13")
// CHECK-NEXT: accv.module "test_emit_c_interface" {
TEST_CASE("test_emit_c_interface")
{
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64, llvm.emit_c_interface} {
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>, llvm.emit_c_interface} {
auto externDecl = DeclareFunction("external_func_decl")
.External(true)
.CWrapper(true)
// CHECK: return
// CHECK-NEXT: }
.Define([] {});

// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64, llvm.emit_c_interface} {
// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>, llvm.emit_c_interface} {
DeclareFunction("internal_func_decl")
.External(false)
.CWrapper(true)
@@ -577,15 +577,15 @@ TEST_CASE("test_emit_c_interface")
// CHECK-NEXT: accv.module "test_raw_pointer_api" {
TEST_CASE("test_raw_pointer_api")
{
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_raw_pointer_api, accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_raw_pointer_api, accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
auto externDecl = DeclareFunction("external_func_decl")
.External(true)
.RawPointerAPI(true)
// CHECK: return
// CHECK-NEXT: }
.Define([] {});

// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_raw_pointer_api, accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_raw_pointer_api, accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
DeclareFunction("internal_func_decl")
.External(false)
.RawPointerAPI(true)
@@ -601,15 +601,15 @@ TEST_CASE("test_raw_pointer_api")
// CHECK-NEXT: accv.module "test_emit_header_decl" {
TEST_CASE("test_emit_header_decl")
{
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_header_decl, accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @external_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_header_decl, accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
auto externDecl = DeclareFunction("external_func_decl")
.External(true)
.HeaderDecl(true)
// CHECK: return
// CHECK-NEXT: }
.Define([] {});

// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_header_decl, accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @internal_func_decl_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.emit_header_decl, accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
DeclareFunction("internal_func_decl")
.External(false)
.HeaderDecl(true)
@@ -625,13 +625,13 @@ TEST_CASE("test_emit_header_decl")
// CHECK-NEXT: accv.module "test_function_tags" {
TEST_CASE("test_function_tags")
{
// CHECK-NEXT: accv.func nested @no_func_tags_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @no_func_tags_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
auto externDecl = DeclareFunction("no_func_tags")
// CHECK: return
// CHECK-NEXT: }
.Define([] {});

// CHECK-NEXT: accv.func nested @has_func_tags_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.function_tags = {tag_a, tag_b}, accv.usages = [], exec_target = 0 : i64} {
// CHECK-NEXT: accv.func nested @has_func_tags_{{[0-9]+}}() attributes {accv.dyn_arg_size_refs = [], accv.function_tags = {tag_a, tag_b}, accv.usages = [], exec_target = 0 : i64, fastmath = #llvm.fastmath<fast>} {
DeclareFunction("has_func_tags")
.AddTag("tag_a")
.AddTag("tag_b")
18 changes: 1 addition & 17 deletions accera/accc/accc.py
@@ -100,9 +100,6 @@ def bstr(val):

DEFAULT_MLIR_TRANSLATE_ARGS = ["--mlir-print-op-on-diagnostic", "--acc-to-llvmir"]

DEFAULT_LOW_PRECISION_FLOAT_OPTS = ["-fp-contract=fast", "--enable-unsafe-fp-math"]
DEFAULT_HIGH_PRECISION_FLOAT_OPTS = ["-fp-contract=on"]

OPT_DISABLE_LOOP_UNROLLING_ARGS = ["--disable-loop-unrolling"]

LLVM_KEEP_DEBUG_INFO_ARGS = ["--frame-pointer=all"]
@@ -128,20 +125,15 @@ def bstr(val):
}

DEFAULT_LLVM_TOOLING_OPTS = [
'--enable-no-infs-fp-math',
'--enable-no-nans-fp-math',
'--enable-no-signed-zeros-fp-math',
'--enable-no-trapping-fp-math'
]

DEFAULT_OPT_ARGS = DEFAULT_LLVM_TOOLING_OPTS + []

DEFAULT_LLC_ARGS = DEFAULT_LLVM_TOOLING_OPTS + ["-relocation-model=pic"]

class Options(Flag):
NONE = auto() # (enable auto unroll | low precision float | no debug info)
NONE = auto() # (enable auto unroll | no debug info)
DISABLE_AUTO_UNROLL = auto()
HIGH_PRECISION_FLOATING_POINT_OPS = auto()
KEEP_DEBUG_INFO = auto()

def _get_common_debug_info_options_args(options: Options):
@@ -150,27 +142,19 @@ def _get_common_debug_info_options_args(options: Options):
else:
return []

def _get_common_fp_options_args(options: Options):
if options & Options.HIGH_PRECISION_FLOATING_POINT_OPS:
return DEFAULT_HIGH_PRECISION_FLOAT_OPTS
else:
return DEFAULT_LOW_PRECISION_FLOAT_OPTS

def _get_options_opt_args(options: Options):
args = []

if options & Options.DISABLE_AUTO_UNROLL:
args += OPT_DISABLE_LOOP_UNROLLING_ARGS

args += _get_common_fp_options_args(options)
args += _get_common_debug_info_options_args(options)

return args

def _get_options_llc_args(options: Options):
args = []

args += _get_common_fp_options_args(options)
args += _get_common_debug_info_options_args(options)

return args
16 changes: 16 additions & 0 deletions accera/python/accera/Debug.py
@@ -12,6 +12,22 @@
from ._lang_python._lang import Dimension


def check_args_order(func: Function):
try:
for arg in func.requested_args:
if isinstance(arg, Array):
for dim in arg.shape:
if isinstance(dim, Dimension):
assert func.requested_args.index(dim) < func.requested_args.index(arg)
except Exception as e:
if isinstance(e, AssertionError):
assert False, "Dimension arguments need to precede the array argument in Debug mode"
else:
# Swallow the exception in this function when the array's dimension is absent from the arg list,
# let this function only focus on the arg order check.
return


def get_args_to_debug(func: Function) -> List[Array]:
"""Gets the arguments of interest to debugging
For example, INPUT_OUTPUT Arrays
12 changes: 9 additions & 3 deletions accera/python/accera/Package.py
@@ -479,7 +479,7 @@ def _add_functions_to_module(self, module, fail_on_error=False):
del self._fns[name]

def _add_debug_utilities(self, tolerance):
from .Debug import get_args_to_debug, add_debugging_functions
from .Debug import get_args_to_debug, add_debugging_functions, check_args_order

# add_check_all_close will modify the self._fns dictionary (because
# it is adding debug functions), to avoid this, we first gather information
@@ -576,10 +576,13 @@ def _make_accc_options(self, options: _Options):
accc_opts = accc.Options.NONE
if options & Package._Options.DISABLE_AUTO_UNROLL:
accc_opts |= accc.Options.DISABLE_AUTO_UNROLL
if options & Package._Options.HIGH_PRECISION_FLOATING_POINT_OPS:
accc_opts |= accc.Options.HIGH_PRECISION_FLOATING_POINT_OPS
return accc_opts

def _apply_options_to_funcs(self, options: _Options):
if options & Package._Options.HIGH_PRECISION_FLOATING_POINT_OPS:
for f in self._fns.values():
if f.high_precision_fp is None:
f.high_precision_fp = True

def build(
self,
@@ -590,6 +593,7 @@ def build(
tolerance: float = 1e-5,
output_dir: str = None,
fail_on_error: bool = False,
profile: bool = False,
_opts: _Options = _Options.NONE,
_quiet=True,
):
@@ -653,6 +657,7 @@ def build(

# Create the package module
package_module = _lang_python._Module(name=name, options=compiler_options)
self._apply_options_to_funcs(_opts)
self._add_functions_to_module(package_module, fail_on_error)

# Emit the package module
@@ -705,6 +710,7 @@ def build(

proj.generate_and_emit(
build_config=mode.value,
profile=profile,
system_target=target_device.device_name,
runtime=target.runtime.name,
dump_all_passes=dump_ir,
4 changes: 4 additions & 0 deletions accera/python/accera/Targets.py
@@ -462,6 +462,10 @@ class Architecture(Enum):
["Intel E5-1680 v3", "Haswell", "Xeon E5", 3.2, 3.8, 8, 16, [48, 256, 20 * 1024], [64, 64, 64], 32, 16, ["SSE4.1", "SSE4.2", "AVX2"], "X86_64", "OPENMP"],
["Intel E5-2620 v3", "Haswell", "Xeon E5", 2.4, 3.2, 6, 12, [48, 256, 15 * 1024], [64, 64, 64], 32, 16, ["SSE4.1", "SSE4.2", "AVX2"], "X86_64", "OPENMP"],

# Intel Broadwell
# ref: https://en.wikichip.org/wiki/intel/xeon_e5/e5-2673_v4
["Intel E5-2673 v4", "Broadwell", "Xeon E5", 2.3, 2.6, 20, 40, [20, 20, 20], [32, 256, 2.5*1024], 32, 16, ["SSE4.1", "SSE4.2", "AVX2"], "X86_64", "OPENMP"],

# AMD Zen
# ref: https://en.wikipedia.org/wiki/Zen_(first_generation)
# ref: https://en.wikichip.org/wiki/amd/microarchitectures/zen
2 changes: 2 additions & 0 deletions accera/python/accera/lang/Function.py
@@ -74,6 +74,7 @@ class Function:
definition: Callable = None
no_inline: bool = False # no_inline == True means that this function cannot be inlined into other functions
no_inline_into: bool = False # no_inline_into == True means that this function cannot have other functions inlined into it
high_precision_fp: bool = None # high_precision_fp == True means that precision will not be sacrificed for performance
auxiliary: dict = field(default_factory=dict)
target: Target = Target.HOST
output_verifiers: list = field(default_factory=list)
@@ -102,6 +103,7 @@ def _emit(self):

self._native_fn.inlinable(not self.no_inline)
self._native_fn.inlinable_into(not self.no_inline_into)
self._native_fn.high_precision_fp(bool(self.high_precision_fp))

sig = signature(self.definition)
