Releases · ROCm/rocFFT

16 Feb 22:20

rocm-5.0.1

fb0d3f8

rocFFT 1.0.15 for ROCm 5.0.1

rocFFT code for ROCm 5.0.1 is unchanged from rocFFT for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

09 Feb 21:45

lawruble13

rocm-5.0.0

fb0d3f8

rocFFT 1.0.15 for ROCm 5.0.0

Changed

Re-aligned split device library into 4 roughly equal libraries.
Implemented the FuseShim framework to replace the original OptimizePlan.
Implemented the generic buffer-assignment framework. Buffer assignment is no longer performed by each node. We designed a generic algorithm to test candidate assignment paths and pick the best one.
With the help of FuseShim, we can achieve more kernel fusions where possible.
Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.
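
The last item above can be illustrated with a small reference model. This is a minimal pure-Python sketch, not rocFFT code: the function name `c2r_even` is hypothetical, the transform is a naive unnormalized inverse DFT, and it only assumes the usual Hermitian layout of `n // 2 + 1` input bins. For an even length, the DC and Nyquist bins of a Hermitian spectrum are purely real, so the sketch reads only their real parts.

```python
import cmath

def c2r_even(half_spectrum, n):
    """Naive unnormalized complex-to-real inverse DFT of even length n.

    half_spectrum holds bins 0 .. n/2 of a Hermitian-symmetric spectrum.
    """
    assert n % 2 == 0 and len(half_spectrum) == n // 2 + 1
    dc = half_spectrum[0].real            # imaginary part is never read
    nyquist = half_spectrum[n // 2].real  # imaginary part is never read
    out = []
    for j in range(n):
        # The Nyquist term is X[n/2] * exp(i*pi*j) = X[n/2] * (-1)**j.
        acc = dc + nyquist * (-1) ** j
        for k in range(1, n // 2):
            # Bin k and its implied conjugate partner n-k sum to 2*Re(X[k] * w).
            w = cmath.exp(2j * cmath.pi * j * k / n)
            acc += 2.0 * (half_spectrum[k] * w).real
        out.append(acc)
    return out
```

With this model, placing arbitrary values in the imaginary parts of bins 0 and n/2 leaves the output unchanged, which is the behavior the note describes.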

Optimizations

Optimized twiddle-conjugation; complex-to-complex inverse transforms should have similar performance to forward transforms now.
Improved performance of single-kernel small 2D transforms.
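
For readers unfamiliar with the term, the sketch below shows why twiddle conjugation matters for inverse transforms. It is an illustrative pure-Python model with hypothetical function names, and it says nothing about how rocFFT implements the optimization: it only shows that a forward and an inverse DFT can share one twiddle table, with the inverse using the complex conjugate of each entry.

```python
import cmath

def twiddles(n):
    """Forward twiddle table: w[k] = exp(-2*pi*i*k/n)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]

def dft(x, inverse=False):
    """Naive O(n^2) unnormalized DFT; the inverse conjugates the forward twiddles."""
    n = len(x)
    w = twiddles(n)
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            t = w[(j * k) % n]
            acc += x[j] * (t.conjugate() if inverse else t)
        out.append(acc)
    return out
```

A forward transform followed by the inverse returns the input scaled by `n`, since neither direction normalizes.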

10 Dec 19:28

lawruble13

rocm-4.5.2

b021fc3

rocFFT 1.0.14 for ROCm 4.5.2

rocFFT code for ROCm 4.5.2 is unchanged from rocFFT for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

27 Oct 21:52

lawruble13

rocm-4.5.0

b021fc3

rocFFT 1.0.14 for ROCm 4.5.0

Changed

Packaging is now split into a runtime package called rocfft and a development package called rocfft-devel. The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.

Optimizations

Optimized SBCC kernels of length 52, 60, 72, 80, 84, 96, 104, 108, 112, 160, 168, 208, 216, 224, and 240 with the new kernel generator.
Improved many plans by removing unnecessary transpose steps.
Optimized scheme selection for 3D problems.
- Imposed fewer restrictions on 3D_BLOCK_RC selection. More problems can use 3D_BLOCK_RC and
  gain some performance.
- Enabled 3D_RC. Some 3D problems with an SBCC-supported z-dimension can use fewer kernels and benefit from it.
- Forced --length 336 336 56 (double precision) to use the faster 3D_RC so that the conservative
  threshold test does not skip it.
Optimized some even-length R2C/C2R cases by doing more operations
in-place and combining pre/post processing into Stockham kernels.
Added radix-17.
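
As background for the even-length R2C/C2R item above, the sketch below shows the textbook way an even-length real transform is built from a half-length complex transform plus a post-processing step. It is a pure-Python illustration with hypothetical function names, under the assumption that "pre/post processing" refers to this kind of step; the actual rocFFT kernels may be organized differently.

```python
import cmath

def dft(z):
    """Naive O(n^2) forward DFT, standing in for a real FFT kernel."""
    n = len(z)
    return [sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def r2c_even(x):
    """Real-to-complex DFT of even length n using one complex DFT of length n/2."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    m = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    Z = dft(z)
    out = []
    for k in range(m + 1):
        a = Z[k % m]
        b = Z[(m - k) % m].conjugate()
        even = (a + b) / 2    # DFT of the even-indexed samples
        odd = (a - b) / 2j    # DFT of the odd-indexed samples
        # Post-processing: recombine the two half-length spectra.
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

The result holds bins 0 through n/2, matching the first half of a full-length DFT of the same real input.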

Fixed

Fixed a few validation failures of even-length in-place R2C transforms for 2D and 3D cubic sizes such as
100^2 (or ^3), 200^2 (or ^3), and 256^2 (or ^3). We do not combine the three kernels
(stockham-r2c-transpose) in these cases; we combine only two kernels (r2c-transpose) instead.
Improved large 1D transform decompositions.

Added

Added support for Windows 10 as a build target.
Added new kernel generator for select fused-2D transforms.

27 Aug 19:04

lawruble13

rocm-4.3.1

60206b6

rocFFT 1.0.12 for ROCm 4.3.1

Updated

Updated documentation to note that callbacks are experimental.

30 Jul 22:53

saadrahim

rocm-4.3.0

b93c40c

rocFFT 1.0.12 for ROCm 4.3.0

Changed

Re-split device code into single-precision, double-precision, and miscellaneous kernels.

Fixed

Fixed potential crashes in double-precision planar->planar transpose.

Added

Added new kernel generator for select lengths. New kernels have
improved performance.
Added public rocfft_execution_info_set_load_callback and
rocfft_execution_info_set_store_callback API functions to allow
executing extra logic when loading/storing data from/to global
memory during a transform.
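
The idea behind load and store callbacks can be modeled in a few lines. The sketch below is a pure-Python analogy and not the rocFFT API: `dft_with_callbacks`, `scale_on_load`, and `normalize_on_store` are hypothetical names, and the real callbacks run on the GPU and are registered through the two functions named above. It only shows the concept of running user logic on every element as it is read from or written to a buffer.

```python
import cmath

def dft_with_callbacks(buffer, load_cb=None, store_cb=None, cb_data=None):
    """Naive DFT that routes every input read and output write through optional hooks."""
    n = len(buffer)
    # Load phase: each element passes through the load hook, if one is set.
    x = [load_cb(buffer, j, cb_data) if load_cb else buffer[j] for j in range(n)]
    out = [0j] * n
    for k in range(n):
        value = sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        # Store phase: the store hook decides what lands in the output buffer.
        if store_cb:
            store_cb(out, k, value, cb_data)
        else:
            out[k] = value
    return out

def scale_on_load(buffer, offset, data):
    """Hypothetical load hook: scale each input element as it is read."""
    return buffer[offset] * data["scale"]

def normalize_on_store(out, offset, value, data):
    """Hypothetical store hook: divide each output element by the length."""
    out[offset] = value / len(out)

result = dft_with_callbacks([1.0, 1.0, 1.0, 1.0],
                            scale_on_load, normalize_on_store, {"scale": 2.0})
```

Folding the scaling and normalization into the hooks avoids separate passes over the data, which is the kind of use the release note has in mind when it mentions extra logic during loads and stores.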

Removed

Removed R2C pair schemes and kernels.

Optimizations

Optimized 2D/3D R2C 100 and 1D Z2Z 2500.
Reduced the number of kernels for 2D/3D sizes where the higher dimension is 64, 128, or 256.

Fixed

Fixed potential crashes in 3D transforms with unusual strides, for
SBCC-optimized sizes.

10 May 23:17

saadrahim

rocm-4.2.0

a470ba6

rocFFT-1.0.11 for ROCm 4.2.0

Optimizations

Improved performance for single precision kernels exercising all except radix-2/7 butterfly ops.
Minor optimization for C2R 3D 100, 200 cube sizes.
Optimized some C2C/R2C 3D 64, 81, 100, 128, 200, 256 rectangular sizes.
When factoring, test to see if remaining length is explicitly supported.
Explicitly add radix-7 lengths 14, 21, and 224 to list of supported lengths.
Optimized R2C 2D/3D 128, 200, 256 cube sizes.
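
The factoring change in the list above ("test to see if remaining length is explicitly supported") can be sketched as follows. This is an illustrative model, not rocFFT's planner: the `factor` function, the `RADICES` tuple, and the contents of `SUPPORTED_LENGTHS` are assumptions, with only 14, 21, and 224 taken from the notes.

```python
# Lengths assumed to have a dedicated kernel (14, 21, 224 come from the notes).
SUPPORTED_LENGTHS = {14, 21, 224}
# Hypothetical set of radices tried in order when peeling off factors.
RADICES = (2, 3, 5, 7)

def factor(length):
    """Split length into kernel sizes, stopping early at an explicitly supported length."""
    factors = []
    remaining = length
    while remaining > 1:
        # Check the remainder first: a dedicated kernel beats further factoring.
        if remaining in SUPPORTED_LENGTHS:
            factors.append(remaining)
            return factors
        for r in RADICES:
            if remaining % r == 0:
                factors.append(r)
                remaining //= r
                break
        else:
            raise ValueError(f"length {length} has an unsupported factor {remaining}")
    return factors
```

Under these assumptions `factor(448)` yields `[2, 224]` instead of breaking 224 down further, while `factor(30)` falls back to plain radices and yields `[2, 3, 5]`.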

Fixed

Fixed potential crashes in small 3D transforms with unusual strides. (#311)
Fixed potential crashes when executing transforms on multiple devices. (#310)

23 Mar 01:18

saadrahim

rocm-4.1.0

c3110db

rocFFT-1.0.10 for ROCm 4.1.0

Added

Explicitly specify MAX_THREADS_PER_BLOCK through __launch_bounds__ for all kernels.
Switched to the new syntax for specifying AMD GPU architecture names and features.

Optimizations

Optimized C2C/R2C 3D 64, 81, 100, 128, 200, 256 cube sizes.
Improved performance of the standalone out-of-place transpose kernel.
Optimized 1D length 40000 C2C case.
Enabled radix-7 for size 336.
New radix-11 and radix-13 kernels; used in length 11 and 13 (and some of their multiples) transforms.

Changed

rocFFT now automatically allocates a work buffer if the plan requires one but none is provided.
An explicit rocfft_status_invalid_work_buffer error is now returned when a work buffer of insufficient size is provided.
Updated online documentation.
Updated the Debian package name and version, now separated by '_'.
Adjusted accuracy test tolerances and how they are compared.
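
The work-buffer behavior described in the first two items of this list can be summarized as a small decision procedure. The sketch below is a pure-Python model with a hypothetical `execute` function; only the status names are taken from rocFFT, and the real library works with device memory rather than a `bytearray`.

```python
def execute(required_bytes, work_buffer=None):
    """Model of the work-buffer checks: returns (status, buffer actually used)."""
    if required_bytes == 0:
        # The plan needs no scratch space, so any supplied buffer is irrelevant.
        return "rocfft_status_success", None
    if work_buffer is None:
        # No buffer supplied: allocate one automatically on the caller's behalf.
        work_buffer = bytearray(required_bytes)
    elif len(work_buffer) < required_bytes:
        # A buffer was supplied but is too small: report it explicitly.
        return "rocfft_status_invalid_work_buffer", None
    return "rocfft_status_success", work_buffer
```

In this model, callers that never think about work buffers still succeed, and callers that pass an undersized buffer get a specific error instead of undefined behavior.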

Fixed

Fixed 4x4x8192 accuracy failure.

Known Issues

None

18 Dec 15:23

saadrahim

rocm-4.0.0

2d35fd6

rocFFT-1.0.8 for ROCm 4.0.0

New Features

No new features

Known Issues

None

30 Nov 17:02

saadrahim

rocm-3.10.0

2d35fd6

rocFFT-1.0.8 for ROCm 3.10.0

New Features

Added deprecation warning for hipfft.h
Optimized case 1D 10000 C2C
Fixed SBCC/SBRC non-unit stride batch issue
Updated README and added BUILD_CLIENTS_ALL
Improved test infrastructure

Known Issues

None
