v0.2.2

@github-actions released this 30 Dec 06:24
· 4 commits to main since this release
69462f3

AcceleratedKernels v0.2.2

Diff since v0.2.1

  • Added N-dimensional accumulate! implementation
  • Added a second 1-dimensional accumulate! algorithm that does not require strong device-wide synchronisation guarantees. Apple Metal, notably, does not offer these guarantees, so decoupled-lookback cannot work on that platform.
    • Added an extension system that sets different defaults for accumulate on Metal and for any/all on oneAPI. All corner cases are now tested and work.
  • Added higher-order arithmetic functions: sum, prod, minimum, maximum, count, cumsum, cumprod
  • Added a final backend::Backend argument to all functions, so that dispatch on the backend works even when the input array does not live on that backend (e.g. allowing ranges on GPUs).
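
To illustrate why a scan can get by without device-wide synchronisation inside a kernel, here is a minimal conceptual sketch in plain Python. It is not AcceleratedKernels code and is not claimed to match the new accumulate! algorithm; the names blocked_inclusive_scan and block_size are hypothetical. It shows a generic three-phase blocked scan, where blocks never communicate within a phase and the only ordering needed is a barrier between phases (on a GPU, typically the boundary between kernel launches). Decoupled-lookback, by contrast, has blocks wait on each other's results inside a single kernel.

```python
from itertools import accumulate
from operator import add


def blocked_inclusive_scan(values, op=add, block_size=4):
    """Inclusive scan in three independent phases (hypothetical helper)."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    # Split the input into fixed-size blocks; the last one may be shorter.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]

    # Phase 1: every block scans its own elements, independently of the others.
    local = [list(accumulate(block, op)) for block in blocks]

    # Phase 2: scan the per-block totals (the last element of each local scan).
    totals = list(accumulate((block[-1] for block in local), op))

    # Phase 3: every block after the first combines the running total of all
    # preceding blocks into its elements. The prefix goes on the left so that
    # associative but non-commutative operators keep their order.
    out = list(local[0]) if local else []
    for prefix, block in zip(totals, local[1:]):
        out.extend(op(prefix, x) for x in block)
    return out


print(blocked_inclusive_scan([1, 2, 3, 4, 5, 6, 7], block_size=3))
# [1, 3, 6, 10, 15, 21, 28]
```

The trade-off in a sketch like this is extra passes over the data in exchange for weaker synchronisation requirements.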

There are no breaking changes: the new interfaces are a strict superset of the previous ones.

Merged pull requests:

  • Explicitly-defined backends and possible extensions with different defaults per platform (#14) (@anicusan)
  • Added new ScanPrefix accumulate algorithm (#15) (@anicusan)

Closed issues:

  • accumulate on Metal sometimes fails due to weaker @synchronize guarantees than on other platforms (#10)