-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accelerating MultiLayerQG on GPUs #373
base: main
Are you sure you want to change the base?
Changes from 10 commits
4f7d797
114977c
76c7728
e8cd9d7
633797b
9166539
04298c4
b3161d8
880f4f6
63f6d9b
a04c5b0
5bc7c5f
102000e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -18,11 +18,13 @@ using | |||||
LinearAlgebra, | ||||||
StaticArrays, | ||||||
Reexport, | ||||||
DocStringExtensions | ||||||
DocStringExtensions, | ||||||
KernelAbstractions | ||||||
|
||||||
@reexport using FourierFlows | ||||||
|
||||||
using FourierFlows: parsevalsum, parsevalsum2, superzeros, plan_flows_rfft | ||||||
using FourierFlows: parsevalsum, parsevalsum2, superzeros, plan_flows_rfft, CPU, GPU | ||||||
using KernelAbstractions.Extras.LoopInfo: @unroll | ||||||
|
||||||
nothingfunction(args...) = nothing | ||||||
|
||||||
|
@@ -111,16 +113,16 @@ function Problem(nlayers::Int, # number of fluid lay | |||||
aliased_fraction = 1/3, | ||||||
T = Float64) | ||||||
|
||||||
if dev == GPU() && nlayers > 2 | ||||||
@warn """MultiLayerQG module is not optimized on the GPU yet for configurations with | ||||||
3 fluid layers or more! | ||||||
# if dev == GPU() && nlayers > 2 | ||||||
# @warn """MultiLayerQG module is not optimized on the GPU yet for configurations with | ||||||
# 3 fluid layers or more! | ||||||
|
||||||
See issues on Github at https://github.com/FourierFlows/GeophysicalFlows.jl/issues/112 | ||||||
and https://github.com/FourierFlows/GeophysicalFlows.jl/issues/267. | ||||||
# See issues on Github at https://github.com/FourierFlows/GeophysicalFlows.jl/issues/112 | ||||||
# and https://github.com/FourierFlows/GeophysicalFlows.jl/issues/267. | ||||||
|
||||||
To use MultiLayerQG with 3 fluid layers or more we suggest, for now, to restrict running | ||||||
on CPU.""" | ||||||
end | ||||||
# To use MultiLayerQG with 3 fluid layers or more we suggest, for now, to restrict running | ||||||
# on CPU.""" | ||||||
# end | ||||||
|
||||||
if nlayers == 1 | ||||||
@warn """MultiLayerQG module does work for single-layer configuration but may not be as | ||||||
|
@@ -533,16 +535,50 @@ Compute the inverse Fourier transform of `varh` and store it in `var`. | |||||
""" | ||||||
invtransform!(var, varh, params::AbstractParams) = ldiv!(var, params.rfftplan, varh) | ||||||
|
||||||
""" | ||||||
@kernel function pvfromstreamfunction_kernel!(qh, ψh, S, nlayers) | ||||||
|
||||||
Kernel for GPU acceleration of PV from streamfunction calculation, i.e., obtaining the Fourier | ||||||
transform of the PV from the streamfunction `ψh` in each layer using `qh = params.S * ψh`. | ||||||
""" | ||||||
@kernel function pvfromstreamfunction_kernel!(qh, ψh, S, nlayers) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
for |
||||||
i, j = @index(Global, NTuple) | ||||||
|
||||||
@unroll for k = 1:nlayers | ||||||
|
||||||
@inbounds qh[i, j, k] = 0 | ||||||
|
||||||
@unroll for m = 1:nlayers | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||||||
@inbounds qh[i, j, k] += S[i, j][k, m] * ψh[i, j, m] | ||||||
end | ||||||
|
||||||
end | ||||||
end | ||||||
|
||||||
""" | ||||||
pvfromstreamfunction!(qh, ψh, params, grid) | ||||||
|
||||||
Obtain the Fourier transform of the PV from the streamfunction `ψh` in each layer using | ||||||
`qh = params.S * ψh`. | ||||||
`qh = params.S * ψh`. We use a work layout over which the above-defined kernel is launched. | ||||||
""" | ||||||
function pvfromstreamfunction!(qh, ψh, params, grid) | ||||||
for j=1:grid.nl, i=1:grid.nkr | ||||||
CUDA.@allowscalar @views qh[i, j, :] .= params.S[i, j] * ψh[i, j, :] | ||||||
end | ||||||
# Larger workgroups are generally more efficient. For more generality, we could put an | ||||||
# if statement that incurs different behavior when either nkl or nl are less than 8 | ||||||
workgroup = 8, 8 | ||||||
|
||||||
# The worksize determines how many times the kernel is run | ||||||
worksize = grid.nkr, grid.nl | ||||||
|
||||||
# Instantiates the kernel for relevant backend device | ||||||
backend = KernelAbstractions.get_backend(qh) | ||||||
kernel! = pvfromstreamfunction_kernel!(backend, workgroup, worksize) | ||||||
|
||||||
# Launch the kernel | ||||||
S, nlayers = params.S, params.nlayers | ||||||
kernel!(qh, ψh, S, nlayers) | ||||||
|
||||||
# This will ensure that no other operations occur until the kernel has finished | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
KernelAbstractions.synchronize(backend) | ||||||
|
||||||
return nothing | ||||||
end | ||||||
|
@@ -587,16 +623,50 @@ function pvfromstreamfunction!(qh, ψh, params::TwoLayerParams, grid) | |||||
return nothing | ||||||
end | ||||||
|
||||||
""" | ||||||
@kernel function streamfunctionfrompv_kernel!(ψh, qh, S⁻¹, nlayers) | ||||||
|
||||||
Kernel for GPU acceleration of streamfunction from PV calculation, i.e., invert the PV to obtain | ||||||
the Fourier transform of the streamfunction `ψh` in each layer from `qh` using `ψh = params.S⁻¹ qh`. | ||||||
""" | ||||||
@kernel function streamfunctionfrompv_kernel!(ψh, qh, S⁻¹, nlayers) | ||||||
i, j = @index(Global, NTuple) | ||||||
|
||||||
@unroll for k = 1:nlayers | ||||||
|
||||||
@inbounds ψh[i, j, k] = 0 | ||||||
|
||||||
@unroll for m = 1:nlayers | ||||||
@inbounds ψh[i, j, k] += S⁻¹[i, j][k, m] * qh[i, j, m] | ||||||
end | ||||||
|
||||||
end | ||||||
end | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. are the two kernel functions identical except the order they want their arguments? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it yes. It'd probably make sense to write one kernel in the more general form. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think so! Would make the code more robust. |
||||||
""" | ||||||
streamfunctionfrompv!(ψh, qh, params, grid) | ||||||
|
||||||
Invert the PV to obtain the Fourier transform of the streamfunction `ψh` in each layer from | ||||||
`qh` using `ψh = params.S⁻¹ qh`. | ||||||
`qh` using `ψh = params.S⁻¹ qh`. We use a work layout over which the above-defined kernel is launched. | ||||||
""" | ||||||
function streamfunctionfrompv!(ψh, qh, params, grid) | ||||||
for j=1:grid.nl, i=1:grid.nkr | ||||||
CUDA.@allowscalar @views ψh[i, j, :] .= params.S⁻¹[i, j] * qh[i, j, :] | ||||||
end | ||||||
# Larger workgroups are generally more efficient. For more generality, we could put an | ||||||
# if statement that incurs different behavior when either nkl or nl are less than 8 | ||||||
workgroup = 8, 8 | ||||||
|
||||||
# The worksize determines how many times the kernel is run | ||||||
worksize = grid.nkr, grid.nl | ||||||
|
||||||
# Instantiates the kernel for relevant backend device | ||||||
backend = KernelAbstractions.get_backend(ψh) | ||||||
kernel! = streamfunctionfrompv_kernel!(backend, workgroup, worksize) | ||||||
|
||||||
# Launch the kernel | ||||||
S⁻¹, nlayers = params.S⁻¹, params.nlayers | ||||||
kernel!(ψh, qh, S⁻¹, nlayers) | ||||||
|
||||||
# This will ensure that no other operations occur until the kernel has finished | ||||||
KernelAbstractions.synchronize(backend) | ||||||
|
||||||
return nothing | ||||||
end | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why comment these out? Delete?