Changed
Packaging has been split into a runtime package called rocfft and a development package called rocfft-devel. The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. This use of the suggests feature in packaging is introduced as deprecated and will be removed in a future ROCm release.
Optimizations
Optimized SBCC kernels of length 52, 60, 72, 80, 84, 96, 104, 108, 112, 160, 168, 208, 216, 224, and 240 with the new kernel generator.
Improved many plans by removing unnecessary transpose steps.
Optimized scheme selection for 3D problems.
Imposed fewer restrictions on 3D_BLOCK_RC selection, so more problems can use 3D_BLOCK_RC and gain some performance.
Enabled 3D_RC. Some 3D problems with an SBCC-supported z-dimension can use fewer kernels and benefit from it.
Forced --length 336 336 56 (double precision) to use the faster 3D_RC scheme so that it is not skipped by the conservative threshold test.
Optimized some even-length R2C/C2R cases by doing more operations in-place and combining pre/post-processing into Stockham kernels.
Added radix-17.
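The even-length R2C/C2R optimization listed above relies on a standard identity: a real sequence of even length N can be transformed with a single complex DFT of length N/2, plus a pre-processing step (packing even and odd samples into the real and imaginary parts) and a post-processing step (separating the two spectra and combining them with twiddle factors). The Python sketch below illustrates that general technique with a naive DFT standing in for the Stockham kernel. It is not rocFFT's kernel code, and the function names `dft` and `real_fft_even` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) complex DFT; stands in for the inner FFT kernel."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def real_fft_even(x):
    """Return the n//2 + 1 non-redundant DFT bins of a real sequence x of even length n."""
    n = len(x)
    assert n % 2 == 0, "this identity needs an even length"
    m = n // 2
    # Pre-processing: pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * k], x[2 * k + 1]) for k in range(m)]
    Z = dft(z)  # one complex transform of half the length
    out = []
    for k in range(m + 1):
        zk = Z[k % m]
        zc = Z[(m - k) % m].conjugate()
        even = (zk + zc) / 2   # DFT of the even-indexed samples at bin k
        odd = (zk - zc) / 2j   # DFT of the odd-indexed samples at bin k
        # Post-processing: combine the two half-length spectra with the twiddle exp(-2*pi*i*k/n).
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

For example, `real_fft_even([1.0, 2.0, 3.0, 4.0])` yields, up to floating-point rounding, the three bins 10, -2+2j, and -2; the remaining bins follow from conjugate symmetry. Fusing the packing and combining steps into the Stockham kernel itself avoids launching separate kernels for them.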
Fixed
Fixed a few validation failures of even-length in-place R2C transforms for 2D square and 3D cubic sizes such as 100^2 (or ^3), 200^2 (or ^3), and 256^2 (or ^3). Instead of combining three kernels (stockham-r2c-transpose), only two kernels (r2c-transpose) are now combined.
Improved large 1D transform decompositions.
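Large 1D transforms are commonly decomposed with the Cooley-Tukey "four-step" factorization N = N1 * N2: N2 DFTs of length N1 over strided inputs, a twiddle multiplication, then N1 DFTs of length N2 whose results are written out transposed. The notes do not say which decomposition rocFFT now picks for a given length, so the Python sketch below only illustrates the generic technique; the names `dft` and `four_step_fft` are hypothetical, and a naive DFT stands in for each sub-transform.

```python
import cmath

def dft(x):
    """Naive O(n^2) complex DFT used for each sub-transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x built from length-n1 and length-n2 DFTs."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: for each k2, a length-n1 DFT over the strided inputs x[n2*k1 + k2].
    cols = [dft([x[n2 * k1 + k2] for k1 in range(n1)]) for k2 in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*k2*j1/n).
    for k2 in range(n2):
        for j1 in range(n1):
            cols[k2][j1] *= cmath.exp(-2j * cmath.pi * k2 * j1 / n)
    # Step 3: for each j1, a length-n2 DFT across k2; storing to index j1 + n1*j2 is the transpose.
    out = [0j] * n
    for j1 in range(n1):
        row = dft([cols[k2][j1] for k2 in range(n2)])
        for j2 in range(n2):
            out[j1 + n1 * j2] = row[j2]
    return out
```

The choice of N1 and N2 is the main design decision: it determines the sub-transform sizes each kernel must handle and how much data moves through the transpose, which is where a better decomposition can pay off for large lengths.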
Added
Added support for Windows 10 as a build target.
Added new kernel generator for select fused-2D transforms.
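A 2D FFT separates into 1D transforms along the rows followed by 1D transforms along the columns. A fused-2D kernel presumably performs both passes within one kernel launch rather than writing the intermediate result to global memory between two kernels; that reading is an assumption, since the notes do not describe the generated kernels. The Python sketch below shows only the row-column math such a kernel computes, with a naive DFT per line; `dft` and `fft2d_row_column` are hypothetical names.

```python
import cmath

def dft(x):
    """Naive O(n^2) complex DFT applied to a single row or column."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft2d_row_column(a):
    """2D DFT of a rows x cols list of lists via row DFTs, then column DFTs."""
    rows, cols = len(a), len(a[0])
    # Pass 1: transform every row (length cols).
    tmp = [dft(r) for r in a]
    # Pass 2: transform every column (length rows) of the row-transformed data.
    out = [[0j] * cols for _ in range(rows)]
    for c in range(cols):
        col = dft([tmp[r][c] for r in range(rows)])
        for r in range(rows):
            out[r][c] = col[r]
    return out
```

In a fused kernel, `tmp` would live in on-chip memory for the duration of both passes, which is only practical when the whole 2D slice is small enough to fit.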