FFT of strided array #430

Open
btmit opened this issue Sep 16, 2020 · 5 comments
Labels: cuda array (Stuff about CuArray.), enhancement (New feature or request)

btmit commented Sep 16, 2020

The following code works as expected for single dimension arrays. It returns a view into the complex scratch array where I'm indexing the underlying Float32 values. I can operate on that view like any other CuArray type.

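using CUDA

N = 31                      # length of the real signal
Nc = div(N, 2) + 1          # number of complex coefficients an rfft of length N produces
scratch = cu(Vector{ComplexF32}(undef, Nc))
# View the first N Float32 values of the complex scratch buffer
myview = view(reinterpret(Float32, scratch), 1:N)
typeof(myview)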

CuArray{Float32,1}
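
The memory layout behind this snippet can be tried on the CPU with plain Base arrays. This is a minimal sketch under assumptions that are not in the original post: it uses `zeros` instead of `undef` storage, it does not touch the GPU, and the names `floats` and `realview` are illustrative. It only shows that the real view aliases the complex buffer; the type printed on the CPU will differ from the `CuArray` above.

```julia
N  = 31
Nc = div(N, 2) + 1                        # 16 complex values

scratch  = zeros(ComplexF32, Nc)          # CPU stand-in for the GPU scratch buffer
floats   = reinterpret(Float32, scratch)  # 2*Nc = 32 Float32 values: re, im, re, im, ...
realview = view(floats, 1:N)              # first 31 of those 32 values

# Writes through the view land in the complex storage.
realview[1] = 1f0
realview[2] = 2f0
scratch[1]                                # 1.0f0 + 2.0f0im
```

Because `1:N` is a single unit range over a vector, the selected elements sit next to each other in memory, which is why the 1-D case stays contiguous.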

However, when I extend this idea to an N-D array, there is a bug where the returned type is the CPU object SubArray. This causes subsequent operations to be extremely slow or to produce an error.

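N = (31, 7)                               # size of the real data
Nc = (div(N[1], 2) + 1, N[2])             # size of the complex scratch array
scratch = cu(Array{ComplexF32,2}(undef, Nc))
# Reinterpret as a (2*Nc[1], N[2]) Float32 array, then keep only the first N[1] rows
myview = view(reinterpret(Float32, scratch, (2*Nc[1], N[2])), 1:N[1], 1:N[2])
typeof(myview)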

SubArray{Float32,2,CuArray{Float32,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}

Details on Julia:

julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_NUM_THREADS = 8

Details on CUDA:

julia> CUDA.versioninfo()
CUDA toolkit 10.2.89, artifact installation
CUDA driver 10.2.0
NVIDIA driver 440.33.1

Libraries: 
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 10.0.0+440.33.1
- CUDNN: 8.0.2 (for CUDA 10.2.0)
- CUTENSOR: 1.2.0 (for CUDA 10.2.0)

Toolchain:
- Julia: 1.5.1
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

btmit added the bug label (Something isn't working) Sep 16, 2020
btmit changed the title from "View/Reinterpret with N-D Arrays" to "View with N-D Arrays" Sep 17, 2020

btmit (Author) commented Sep 17, 2020

Looking into this a bit more, it appears the problem is probably with view() and has nothing to do with reinterpret().

maleadt (Member) commented Sep 17, 2020

> the returned type is the CPU object SubArray

SubArray isn't a CPU object; as you can see, this is a SubArray{<:CuArray}, so the data resides on the GPU. This is not a bug; it is what happens when you take a non-contiguous view. Those are expected to be slower: no dispatch to CUBLAS, no memory coalescing in kernels, etc.
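
Why the 2-D view above is non-contiguous can be seen from its strides. The following is a CPU-only sketch with a plain `Array` standing in for the `CuArray` parent; `buf`, `full_cols`, `partial`, and the helper `is_dense` are illustrative names, not part of CUDA.jl.

```julia
# CPU stand-in for the reinterpreted parent: 2*Nc[1] = 32 rows, 7 columns.
buf = zeros(Float32, 32, 7)

full_cols = view(buf, :, 1:7)      # takes every row of each column
partial   = view(buf, 1:31, 1:7)   # leaves out row 32 of each column

strides(partial)                   # (1, 32): columns start 32 elements apart
size(partial)                      # (31, 7): but each column holds only 31 elements

# A column-major 2-D view is dense when the column stride equals the column length.
is_dense(v) = strides(v) == (1, size(v, 1))

is_dense(full_cols)                # true
is_dense(partial)                  # false: one-element gap after every column
```

The gap of one `Float32` after each column is what makes the 31x7 view strided rather than a single block of memory.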

maleadt closed this as completed Sep 17, 2020
maleadt removed the bug label (Something isn't working) Sep 17, 2020

btmit (Author) commented Sep 17, 2020

You're right, of course. Here was my source of confusion.

julia> plan_rfft(myview)
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32,2}

Furthermore,

julia> typeof(fft(myview))
Array{Complex{Float32},2}

maleadt (Member) commented Sep 17, 2020

Yeah, that happens because we implement fft using CUFFT, which doesn't support (or we don't have it wrapped to support) non-contiguous views like yours here. Maybe it could be done through more advanced use of the CUFFT APIs (i.e., passing strides where necessary), and adapting dispatch to accept strided vectors like your SubArray here. Alternatively, collect or adapt your view to get a contiguous CuArray object.
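
As a rough illustration of what "passing strides" could mean for this view: cuFFT's advanced data layout (used by `cufftPlanMany`) describes the input with `n`, `inembed`, `istride`, and `idist`, and for a 2-D transform addresses element `(x, y)` of batch `b` at `input[b * idist + (x * inembed[1] + y) * istride]` (zero-based). The CPU-only sketch below checks that such parameters reach exactly the elements of the 31x7 view. It does not call cuFFT or CUDA.jl, says nothing about how CUDA.jl would wrap this, and `buf`, `v`, and `layout_offset` are hypothetical names.

```julia
# CPU stand-in for the 32x7 Float32 parent, filled with distinct values.
buf = reshape(Float32.(1:32*7), 32, 7)
v   = view(buf, 1:31, 1:7)

# cuFFT lists dimensions slowest-first, so Julia's (rows, cols) order is reversed.
n       = (size(v, 2), size(v, 1))       # (7, 31): logical transform size
inembed = (size(buf, 2), size(buf, 1))   # (7, 32): size of the enclosing storage
istride = strides(v)[1]                  # 1: step between consecutive elements of a column
idist   = length(buf)                    # distance between batches (only one batch here)

# Zero-based offset following the 2-D advanced-layout formula.
# The C-style inembed[1] is inembed[2] with Julia's one-based tuples.
layout_offset(b, x, y) = b * idist + (x * inembed[2] + y) * istride

# Every element of the view must equal the parent element at the computed offset.
layout_ok = all(v[y + 1, x + 1] == buf[layout_offset(0, x, y) + 1]
                for x in 0:n[1]-1, y in 0:n[2]-1)
```

If `layout_ok` holds, the view is fully described by these four values, so no copy would be needed in principle; whether the wrapper can be adapted this way is the open question in this issue.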

btmit (Author) commented Sep 17, 2020

Thanks. I can't afford the copy operation right now; avoiding it was the original motivation. Supporting strided FFTs would enable a number of applications. Is this related to #119? That one is really killing us.

Maybe you, @vchuravy, and I can discuss what this would involve directly.

maleadt changed the title from "View with N-D Arrays" to "FFT of strided array" Sep 17, 2020
maleadt reopened this Sep 17, 2020
maleadt added the enhancement (New feature or request) and cuda array (Stuff about CuArray.) labels Sep 17, 2020