clFFT++, a C++ wrapper for the clFFT library
Copyright 2016, Malcolm Roberts
malcolmiwroberts.com malcolmiwroberts@gmail.com
clFFT++ is a C++ header file for the clFFT fast Fourier transform
(FFT) library, available at https://github.com/clMathLibraries/clFFT
clFFT is a collection of functions written in C that generate OpenCL
code, which can then be run on a wide variety of hardware, for example
CPUs, GPUs, and co-processor boards. The clFFT++ library makes clFFT
much easier to use by wrapping the various set-up and tear-down
functions in an object-oriented interface.
For example, creating a 1D complex-to-complex FFT using clFFT++ requires
one line:
clfft1 fft(nx, inplace, queue, ctx);
where nx is the problem size, inplace is a bool which determines
whether the transform is in-place or out-of-place, and queue and ctx
are the OpenCL command queue and context as per normal. The FFT is
then performed using the command
fft.forward(&inbuf, &outbuf, nwait, wait, done);
where inbuf and outbuf are cl_mem buffers of the appropriate size, and
nwait, wait, and done are the usual OpenCL event data.
************* Examples *************
Examples are available in the examples directory, along with some
small utility files (platform.cpp, platform.hpp, clutils.c, clutils.h,
and utils.hpp) which help manage output and setting up the OpenCL
environment. The example files are
examples/fft1:
complex-to-complex 1D FFT
examples/fft2:
complex-to-complex 2D FFT
examples/fft3:
complex-to-complex 3D FFT
examples/fft1r:
real-to-complex 1D FFT
examples/fft2r:
real-to-complex 2D FFT
examples/fft3r:
real-to-complex 3D FFT
examples/mfft1:
multiple 1D complex-to-complex FFTs
examples/mfft1r:
multiple 1D real-to-complex FFTs
The environment variables OPENCL_INCLUDE_PATH and OPENCL_LIB_PATH
allow one to include and link OpenCL from non-standard directories,
while CLFFT_INCLUDE_PATH and CLFFT_LIB_PATH perform the same role for
clFFT.
************* Tests *************
The tests directory has similar files, but these check the results
against FFTW for accuracy, using the fftw++ library (fftwpp.sf.net).
Timing tests are also available, and there are a variety of Python
scripts for performing timing tests and verifying output. The
environment variables FFTW_INCLUDE_PATH and FFTW_LIB_PATH specify the
location of FFTW (available at fftw.org), while FFTWPP_INCLUDE_PATH and
FFTWPP_LIB_PATH specify the location of fftw++ (available at
fftwpp.sf.net).
************* Usage *************
Once an object is created, for example
clfft1 fft(nx, inplace, queue, ctx);
one performs forward FFTs by calling
fft.forward(&inbuf, &outbuf, nwait, wait, done);
If inbuf is equal to outbuf, the FFT is in-place. The constructor and
the FFT call must either both be in-place or both be out-of-place.
For in-place transforms, &outbuf may be set to NULL.
Backward FFTs are performed by calling the analogous function
fft.backward(&inbuf, &outbuf, nwait, wait, done);
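The in-place rule above can be modelled with a small check. This is only a sketch: Buffer is a hypothetical stand-in for cl_mem so that the snippet builds without OpenCL, and call_is_inplace and call_matches_constructor are illustrative names that are not part of clfft++.hpp.

```cpp
#include <cstddef>

// Hypothetical stand-in for cl_mem, so this sketch builds without OpenCL.
typedef void* Buffer;

// A call is treated as in-place when the output pointer is NULL or when
// both pointers refer to the same buffer handle.
inline bool call_is_inplace(const Buffer* inbuf, const Buffer* outbuf)
{
  return outbuf == NULL || *inbuf == *outbuf;
}

// The inplace flag passed to the constructor and the buffers passed to
// forward() or backward() must describe the same kind of transform.
inline bool call_matches_constructor(bool inplace,
                                     const Buffer* inbuf,
                                     const Buffer* outbuf)
{
  return inplace == call_is_inplace(inbuf, outbuf);
}
```

A check like this could be run before each forward or backward call to catch a mismatch between the constructor flag and the buffers early.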
************* Classes / Constructors *************
clfft++.hpp contains the following object constructors:
1D complex-to-complex FFTs
unsigned int nx : The problem size
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
clfft1(unsigned int nx, bool inplace,
cl_command_queue queue, cl_context ctx)
2D complex-to-complex FFTs
unsigned int nx : The problem size in the first dimension
unsigned int ny : The problem size in the second dimension
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
clfft2(unsigned int nx, unsigned int ny, bool inplace,
cl_command_queue queue, cl_context ctx)
3D complex-to-complex FFTs
unsigned int nx : The problem size in the first dimension
unsigned int ny : The problem size in the second dimension
unsigned int nz : The problem size in the third dimension
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
clfft3(unsigned int nx, unsigned int ny, unsigned int nz, bool inplace,
cl_command_queue queue, cl_context ctx)
1D real-to-complex and complex-to-real FFTs
unsigned int nx : The problem size in the first dimension
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
The problem size nx is the number of real values before being
transformed into complex space. The output has nx / 2 + 1 complex
values.
clfft1r(unsigned int nx, bool inplace,
cl_command_queue queue, cl_context ctx)
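The size relation for the 1D real-to-complex transform can be sketched as a pair of helpers for sizing buffers. The function names are illustrative and not part of clfft++.hpp; the byte counts assume that complex values are stored as interleaved (real, imaginary) pairs of the type Real, which should be checked against the precision actually in use.

```cpp
#include <cstddef>

// Number of complex output values for nx real inputs. The division is
// integer division, so nx = 7 and nx = 6 both give 4.
inline std::size_t r2c_complex_length(std::size_t nx)
{
  return nx / 2 + 1;
}

// Bytes needed for the nx real input values.
template<class Real>
std::size_t r2c_input_bytes(std::size_t nx)
{
  return nx * sizeof(Real);
}

// Bytes needed for the complex output, assuming interleaved
// (real, imaginary) storage: two Real values per complex value.
template<class Real>
std::size_t r2c_output_bytes(std::size_t nx)
{
  return 2 * r2c_complex_length(nx) * sizeof(Real);
}
```

For example, r2c_output_bytes<float>(nx) gives a candidate size for the output cl_mem buffer of an out-of-place single-precision transform.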
2D real-to-complex and complex-to-real FFTs
unsigned int nx : The problem size in the first dimension
unsigned int ny : The problem size in the second dimension
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
The input is nx * ny real values, and the output has
nx * nyp complex values, where nyp = ny / 2 + 1.
clfft2r(unsigned int nx, unsigned int ny, bool inplace,
cl_command_queue queue, cl_context ctx)
3D real-to-complex and complex-to-real FFTs
unsigned int nx : The problem size in the first dimension
unsigned int ny : The problem size in the second dimension
unsigned int nz : The problem size in the third dimension
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
The input is nx * ny * nz real values, and the output has
nx * ny * nzp complex values, where nzp = nz / 2 + 1.
clfft3r(unsigned int nx, unsigned int ny, unsigned int nz, bool inplace,
cl_command_queue queue, cl_context ctx)
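The 2D and 3D output sizes stated above follow the same pattern: only the last dimension is shortened. A minimal sketch, with illustrative function names that are not part of clfft++.hpp:

```cpp
#include <cstddef>

// 2D: nx * ny real values in, nx * nyp complex values out,
// where nyp = ny / 2 + 1 (integer division).
inline std::size_t r2c_complex_count_2d(std::size_t nx, std::size_t ny)
{
  const std::size_t nyp = ny / 2 + 1;
  return nx * nyp;
}

// 3D: nx * ny * nz real values in, nx * ny * nzp complex values out,
// where nzp = nz / 2 + 1 (integer division).
inline std::size_t r2c_complex_count_3d(std::size_t nx, std::size_t ny,
                                        std::size_t nz)
{
  const std::size_t nzp = nz / 2 + 1;
  return nx * ny * nzp;
}
```

These counts are in complex values; the size in bytes depends on the precision and storage format in use.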
Multiple 1D complex-to-complex FFTs
unsigned int nx : The problem size in the first dimension
unsigned int M : The number of 1D FFTs to be performed
int istride : The stride between values in the input
int ostride : The stride between values in the output
int idist : The distance between the beginning of vectors in
the input
int odist : The distance between the beginning of vectors in
the output
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
The input and output consist of nx * M complex values. The distance
and stride parameters can be chosen so that the FFTs are in either the
first or second dimension; see the examples for how to compute these
values.
clmfft1(unsigned int nx, unsigned int M,
int istride, int ostride, int idist, int odist,
bool inplace,
cl_command_queue queue, cl_context ctx)
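As a rough guide to choosing the stride and distance parameters, the sketch below assumes the convention used by FFTW's advanced interface: element k of vector m sits at index m * dist + k * stride, counted in complex values. That convention, the MfftLayout struct, and the function names are assumptions made for illustration; the files in the examples directory remain the authoritative reference.

```cpp
// Hypothetical container for the four layout parameters of clmfft1.
struct MfftLayout {
  int istride, ostride, idist, odist;
};

// M vectors of length nx stored one after another:
// element k of vector m is at index m * nx + k.
inline MfftLayout contiguous_layout(unsigned int nx)
{
  MfftLayout l;
  l.istride = 1;
  l.ostride = 1;
  l.idist = static_cast<int>(nx);
  l.odist = static_cast<int>(nx);
  return l;
}

// M vectors interleaved with each other:
// element k of vector m is at index k * M + m.
inline MfftLayout interleaved_layout(unsigned int M)
{
  MfftLayout l;
  l.istride = static_cast<int>(M);
  l.ostride = static_cast<int>(M);
  l.idist = 1;
  l.odist = 1;
  return l;
}

// Index of element k of vector m under the assumed convention.
inline int element_index(int stride, int dist, unsigned int k, unsigned int m)
{
  return static_cast<int>(m) * dist + static_cast<int>(k) * stride;
}
```

For an array stored in row-major order, the two layouts correspond to transforming along the last and the first dimension respectively.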
Multiple 1D real-to-complex and complex-to-real FFTs
unsigned int nx : The problem size in the first dimension
unsigned int M : The number of 1D FFTs to be performed
int istride : The stride between values in the input
int ostride : The stride between values in the output
int idist : The distance between the beginning of vectors in
the input
int odist : The distance between the beginning of vectors in
the output
bool inplace : Specifies whether the FFT is in-place
cl_command_queue queue : The OpenCL command queue
cl_context ctx : The OpenCL context
The input consists of nx * M real values, and the output consists of
nxp * M complex values, where nxp = nx / 2 + 1. The distance and
stride parameters can be chosen so that the FFTs are in either the
first or second dimension; see the examples for how to compute these
values.
clmfft1r(unsigned int nx, unsigned int M, int istride, int ostride,
int idist, int odist,
bool inplace,
cl_command_queue queue, cl_context ctx)
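For the real-to-complex case the input and output vectors have different lengths, so the input and output distances differ. The sketch below covers only an out-of-place transform with each vector stored contiguously. It assumes that input distances are counted in real values and output distances in complex values, as in FFTW's advanced interface. The struct and function names are illustrative, and the examples directory should be consulted for the values clFFT++ actually expects.

```cpp
// Hypothetical container for the layout of an out-of-place clmfft1r call.
struct MfftRealLayout {
  unsigned int nxp;  // complex values per output vector, nx / 2 + 1
  int istride, ostride, idist, odist;
};

// M real vectors of length nx stored one after another, transformed into
// M complex vectors of length nxp stored one after another.
inline MfftRealLayout contiguous_real_layout(unsigned int nx)
{
  MfftRealLayout l;
  l.nxp = nx / 2 + 1;
  l.istride = 1;
  l.ostride = 1;
  l.idist = static_cast<int>(nx);     // assumed to be in real values
  l.odist = static_cast<int>(l.nxp);  // assumed to be in complex values
  return l;
}

// Total number of complex values in the output: nxp * M.
inline unsigned int total_complex_output(unsigned int nx, unsigned int M)
{
  return (nx / 2 + 1) * M;
}
```

In-place transforms usually need extra care with padding and are not covered by this sketch.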
************* License *************
This file is part of clFFT++.
clFFT++ is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
clFFT++ is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with clFFT++. If not, see <http://www.gnu.org/licenses/>.