AcceleratedKernels v0.2.2
- Added N-dimensional accumulate! implementation
- Added a second 1-dimensional accumulate! algorithm that does not rely on strong device-wide synchronisation guarantees. Apple Metal, notably, does not offer these, so decoupled-lookback cannot work on that platform.
- Added an extension system with different defaults for accumulate on Metal and for any/all on oneAPI. All corner cases are now tested and working.
- Added higher-order arithmetic functions: sum, prod, minimum, maximum, count, cumsum, cumprod
- Added a final backend::Backend argument to all functions, allowing dispatch on the backend even when the input is not an array stored on that backend (e.g. allowing ranges on GPUs).
There are no breaking changes - the new interfaces are a strict superset of previous ones.
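To illustrate how a prefix scan can avoid device-wide lookback, here is a minimal sketch of a multi-pass blocked scan, written as serial CPU code in plain Julia. It is not taken from AcceleratedKernels, and these notes do not say whether the new algorithm is structured this way. The function name blocked_accumulate! and the block_size keyword are hypothetical, and op is assumed to be associative. On a GPU, each pass would roughly correspond to a separate kernel launch, so the only ordering needed is between launches.

```julia
# Hypothetical helper, NOT part of AcceleratedKernels: a serial CPU model of a
# multi-pass blocked inclusive scan. `op` is assumed to be associative.
function blocked_accumulate!(op, v::Vector; block_size::Int=4)
    n = length(v)
    n == 0 && return v
    nblocks = cld(n, block_size)

    # Pass 1: scan every block independently and record each block's total.
    totals = Vector{eltype(v)}(undef, nblocks)
    for b in 1:nblocks
        lo = (b - 1) * block_size + 1
        hi = min(b * block_size, n)
        for i in (lo + 1):hi
            v[i] = op(v[i - 1], v[i])
        end
        totals[b] = v[hi]
    end

    # Pass 2: scan the (much shorter) vector of block totals.
    for b in 2:nblocks
        totals[b] = op(totals[b - 1], totals[b])
    end

    # Pass 3: fold the combined total of all preceding blocks into each later
    # block, keeping the prefix on the left so non-commutative ops stay correct.
    for b in 2:nblocks
        lo = (b - 1) * block_size + 1
        hi = min(b * block_size, n)
        for i in lo:hi
            v[i] = op(totals[b - 1], v[i])
        end
    end
    return v
end

x = collect(1:10)
blocked_accumulate!(+, x; block_size=4)
@assert x == cumsum(1:10)
```

Because no block ever waits on a value that another block is still producing within the same pass, this scheme does not need the forward-progress guarantees that decoupled-lookback depends on. The cost is extra passes over the data.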
Merged pull requests:
- Explicitly-defined backends and possible extensions with different defaults per platform (#14) (@anicusan)
- Added new ScanPrefix accumulate algorithm (#15) (@anicusan)
Closed issues:
- accumulate on Metal sometimes fails due to weaker @synchronize guarantees than on other platforms (#10)