rocFFT 1.0.14 for ROCm 4.5.0

@lawruble13 released this 27 Oct 21:52
b021fc3

Changed

  • Packaging is split into a runtime package called rocfft and a development package called rocfft-devel. The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.

Optimizations

  • Optimized SBCC kernels of lengths 52, 60, 72, 80, 84, 96, 104, 108, 112, 160,
    168, 208, 216, 224, and 240 with the new kernel generator.
  • Improved many plans by removing unnecessary transpose steps.
  • Optimized scheme selection for 3D problems.
    • Imposed fewer restrictions on 3D_BLOCK_RC selection. More problems can use 3D_BLOCK_RC and
      gain some performance.
    • Enabled 3D_RC. Some 3D problems with an SBCC-supported z-dimension can use fewer kernels and benefit as a result.
    • Forced --length 336 336 56 (double precision) to use the faster 3D_RC scheme, so that a conservative
      threshold test does not skip it.
  • Optimized some even-length R2C/C2R cases by doing more operations
    in-place and combining pre/post processing into Stockham kernels.
  • Added radix-17.
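
The even-length R2C item above mentions pre/post processing around a complex kernel. As a rough illustration, the following is a minimal sketch of the textbook technique of computing a length-N real transform from one length-N/2 complex transform plus a post-processing pass. It assumes this is the kind of processing meant; it is not rocFFT's code, and the names dft and r2c_even are hypothetical. A naive DFT stands in for the Stockham kernel.

```python
import cmath


def dft(z):
    """Naive O(n^2) forward complex DFT; a stand-in for a real FFT kernel."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def r2c_even(x):
    """Forward real-to-complex transform of an even-length real sequence.

    Returns the N/2 + 1 non-redundant outputs, using a single complex
    transform of length N/2 followed by a post-processing pass.
    (Hypothetical helper for illustration only.)
    """
    n = len(x)
    if n == 0 or n % 2 != 0:
        raise ValueError("length must be even and non-zero")
    m = n // 2

    # Pack even-indexed samples into the real part and odd-indexed
    # samples into the imaginary part of a half-length complex sequence.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = dft(z)

    # Post-processing: separate the spectra of the even and odd samples,
    # then recombine them with a twiddle factor.
    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2
        odd = (a - b) / 2j
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

Calling r2c_even on a real list of even length N gives the same first N/2 + 1 values as a full complex DFT of that list, up to rounding. In this sketch the post-processing is a separate loop; fusing such a pass into the transform kernel avoids one extra trip through memory.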

Fixed

  • Fixed a few validation failures of even-length in-place R2C transforms at 2D and 3D square or cubic
    sizes such as 100^2 (or ^3), 200^2 (or ^3), 256^2 (or ^3), etc. In these cases the three kernels
    (stockham-r2c-transpose) are not combined; only two kernels (r2c-transpose) are combined instead.
  • Improved large 1D transform decompositions.
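
These notes do not describe how rocFFT decomposes large 1D transforms. For readers unfamiliar with the idea, the following is a minimal sketch of the generic Cooley-Tukey factorization N = n1 * n2, in which one long transform becomes a set of shorter transforms, a twiddle multiplication, and a second set of shorter transforms. The names dft and dft_decomposed are hypothetical, and the naive DFT again stands in for a real kernel.

```python
import cmath


def dft(z):
    """Naive O(n^2) forward complex DFT; a stand-in for a real FFT kernel."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_decomposed(x, n1, n2):
    """Length n1*n2 forward DFT built from length-n1 and length-n2 DFTs.

    Input index:  j = n2*j1 + j2
    Output index: k = k1 + n1*k2
    (Hypothetical helper for illustration only.)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n2 transforms of length n1 over strided input.
    cols = [dft([x[n2 * j1 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2*k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: n1 transforms of length n2, written out in transposed order.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[j2][k1] for j2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

For example, dft_decomposed(x, 3, 4) on a 12-element sequence matches dft(x) up to rounding. The strided reads in step 1 and the transposed writes in step 3 are where transpose steps typically appear in such decompositions.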

Added

  • Added support for Windows 10 as a build target.
  • Added a new kernel generator for select fused-2D transforms.