Added speed improvements and CuArray support to CircShiftedArray #66
base: master
Conversation
Also added broadcasting support for the standard …
… will now try to fix the base-type problem and then also move the shift back to a variable inside the class.
This is done now. The benchmarking showed only minor changes, which are acceptable, and at least the combinatorial-explosion problem is reduced. Most importantly, type stability seems to be restored now.
Yes, this seems like a cleaner solution if one wants to share functionality. The current implementation seems a bit odd, with the value of the default element encoded in a type parameter. Having both types inherit from AbstractShiftedArray{T, N} <: AbstractArray{T, N} seems a cleaner way to reuse functionality. I would also ideally like to keep the meaning of …
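A minimal sketch of the hierarchy suggested here (the struct names and fields are illustrative placeholders, not the package's actual design):

```julia
# Hypothetical shared supertype; both wrappers reuse generic methods defined on it.
abstract type AbstractShiftedArray{T,N} <: AbstractArray{T,N} end

struct MyShiftedArray{T,N,A<:AbstractArray} <: AbstractShiftedArray{T,N}
    parent::A
    shifts::NTuple{N,Int}
end

struct MyCircShiftedArray{T,N,A<:AbstractArray} <: AbstractShiftedArray{T,N}
    parent::A
    shifts::NTuple{N,Int}
end

# Shared functionality is then written once against the supertype:
Base.size(x::AbstractShiftedArray) = size(x.parent)
shifts(x::AbstractShiftedArray) = x.shifts
```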
To be honest, I think I would prefer to keep this package very lightweight, so unless there are strong advantages, I would avoid using a combination of other wrapper types. On a more general note, the PR seems pretty large (and so also a bit difficult to review); is there a way it could be split into smaller, more manageable PRs? Though maybe switching from the more complex type to the inheritance approach to share functionality would reduce the diff significantly.
I doubt it. The size of this PR is due to the details of the broadcasting mechanism, which one would probably need just as much (if not more) if an abstract array type were introduced.
Of course it can be helpful if @roflmaostc can review the code, but I think it would greatly simplify things if we tried to break this PR into more manageable chunks, as the diff really is quite large (especially given how small the package is). As far as I can see, a few things are happening: …
I would propose to first figure out point 1. I think we should make the smallest possible change to the type hierarchy, as this package is a dependency of the JuliaStats stack.
I guess this could have been done either way; both approaches have a "shift" parameter.
If it is part of the type, it can be determined at compile time rather than runtime. This is why I implemented it this way.
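To illustrate the trade-off being discussed (a hedged sketch with made-up type names, not the PR's code):

```julia
# Shift encoded as a type parameter: the value is known to the compiler, so
# shift arithmetic can constant-fold, but every distinct shift triggers a new
# specialization (the combinatorial-explosion concern mentioned above).
struct TypeShifted{T,N,Shifts,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
end
shifts(::TypeShifted{T,N,Shifts}) where {T,N,Shifts} = Shifts

# Shift stored as a field: one compiled method covers all shifts, with the
# value looked up at runtime.
struct FieldShifted{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    shifts::NTuple{N,Int}
end
shifts(x::FieldShifted) = x.shifts
```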
I agree. This could have been done separately.
Yes. This is the main and most important point. Without the broadcasting support it cannot sensibly be used in …
Can we make some progress on this pull request?
Can you be more specific? I'm not familiar with this package; what used to be CuArray-incompatible, and what changes did you have to make that are specific to CUDA.jl?
@maleadt Sorry for pulling you into this here. In short: this module wraps an array into a shifted or circularly shifted array. Specialities are the support of types such as … For this reason I implemented a bunch of necessary broadcast overloads for this module, such that direct calls to … @piever luckily implemented a quite extensive set of tests, which helped enormously in implementing the broadcasting schemes, which now fully pass these tests (and some more) both with and without … A (minor?) detail is that the class structure was slightly changed (avoiding two independent classes) to avoid code duplication, but this should still be compatible with the original classes for all practical purposes.
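For readers unfamiliar with the package, a quick illustration of the two wrappers (values shown follow the documented semantics of ShiftedArrays.jl):

```julia
using ShiftedArrays

a = collect(1:5)

s = ShiftedArray(a, 1)       # lazy shift; out-of-range entries read as `missing`
c = CircShiftedArray(a, 1)   # lazy circular shift, matching `circshift`

s[1], s[2]   # (missing, 1)
c[1], c[2]   # (5, 1)
```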
I guess you're talking about scalar indexing here? Without looking too closely at the source code (where you seem to materialize the array wrapper eagerly before broadcasting), I wonder if it wouldn't have been possible to have …
Thanks for this interesting idea. This may also be an interesting approach. Of course, you would then not get the speed improvement that the full broadcasting support in the current implementation provides for non-CUDA usage.
If you pass a …
This sounds like quite an interesting strategy, but I have to admit that I cannot fully follow it, as I am not acquainted well enough with the internals of …
Ok. This is actually an impressive scheme, and it basically works. However, as I feared, there is a problem with the "non-concrete element type":

```julia
using ShiftedArrays
using CUDA
using Adapt

CUDA.allowscalar(false);

a = rand(100, 100);
# csa = circshift(a, (10, 11))
csa = CircShiftedArray(a, (10, 11));
ca = CuArray(a);
csca = CircShiftedArray(ca, (10, 11));

# NB: `shifts(csca)` refers to the global variable rather than the argument `x`;
# `shifts(x)` is what is meant here (this is the mistake spotted further down
# in the thread).
Adapt.adapt_structure(to, x::CircShiftedArray{T, D, CT}) where {T, D, CT<:CuArray} =
    CircShiftedArray(adapt(to, parent(x)), shifts(csca));

q = adapt(CuArray, csca);
q = cudaconvert(csca);
typeof(q)

# before defining a BroadcastStyle rule, this broadcast falls back to the
# generic AbstractArray path:
qc = csca .+ 1.0;

function Base.Broadcast.BroadcastStyle(::Type{T}) where {T<:CircShiftedArray{<:Any,<:Any,<:CuArray}}
    CUDA.CuArrayStyle{ndims(T)}()
end

qc = csca .+ 1.0;
q = csa .+ 1.0;
Array(qc) == q  # check for identical result
# Great!

sa = ShiftedArray(a, (22, 23))
sca = ShiftedArray(ca, (22, 23));

# let's do this for the ShiftedArray type as well
function Base.Broadcast.BroadcastStyle(::Type{T}) where {T<:ShiftedArray{<:Any,<:Any,<:Any,<:CuArray}}
    CUDA.CuArrayStyle{ndims(T)}()
end

q = sa .+ 1.0;

# Here is the problem:
qc = sca .+ 1.0;
# ERROR: GPU broadcast resulted in non-concrete element type Union{Missing, Float64}.
# This probably means that the function you are broadcasting contains an error or type instability.
```
Eh, that probably just needs to be relaxed. As I mentioned above, CuArray supports union-isbits storage nowadays. Removing that error as a test, and adding the necessary Adapt rule, makes everything work:

```julia
julia> Adapt.adapt_structure(to, x::ShiftedArray{T, M, N, <:CuArray}) where {T, M, N} =
           ShiftedArray(adapt(to, parent(x)), shifts(x); default=ShiftedArrays.default(x))

julia> typeof(cudaconvert(sca))
ShiftedArray{Float64, Missing, 2, CuDeviceMatrix{Float64, 1}}

julia> sca .+ 1.0
100×100 CuArray{Union{Missing, Float64}, 2, CUDA.Mem.DeviceBuffer}:
 missing  missing  missing  missing  …  missing  missing  missing
 ...
```

Also:

That …
Thanks a lot. Silly me, forgetting the second adapt rule. Thanks also for spotting the wrong …
So, here comes the surprise. I did some benchmarking using … The question is therefore how to proceed from here? @piever
@piever see above. Any comments on how to proceed?
@RainerHeintzmann sorry for the very slow reply rate in this PR! I can see that a lot of work has already gone into this, and it'd definitely be very useful to get ShiftedArrays of CuArrays to work well.
Maybe a trivial comment, but did you make sure to also use …
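As a general note (a hedged sketch only, since the question above is cut off and may refer to something else): timing GPU broadcasts fairly usually requires synchronizing the device and interpolating globals, e.g. with BenchmarkTools.jl:

```julia
using BenchmarkTools, CUDA, ShiftedArrays

a = CUDA.rand(1024, 1024)
c = CircShiftedArray(a, (17, 5))

# `CUDA.@sync` waits for the asynchronous kernel to finish; `$` interpolation
# avoids timing global-variable overhead.
@btime CUDA.@sync($c .+ 1.0f0);
@btime CUDA.@sync(circshift($a, (17, 5)) .+ 1.0f0);
```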
Having the adapt rule in the package seems perfectly reasonable: it's just a lightweight dependency plus two lines of code. I would agree to use the new weakdeps machinery (just as OffsetArrays.jl plans to do: JuliaArrays/OffsetArrays.jl#331). Given that the …
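For concreteness, a hedged sketch of what such a weakdep extension could look like (the module and file names are illustrative; the two method definitions are the ones worked out earlier in this thread):

```julia
# ext/ShiftedArraysCUDAExt.jl -- loaded automatically once both ShiftedArrays
# and CUDA are in the environment (Julia >= 1.9 package extensions)
module ShiftedArraysCUDAExt

using ShiftedArrays, CUDA, Adapt
using ShiftedArrays: shifts

# swap the inner CuArray for its device-side counterpart, keeping the wrapper
Adapt.adapt_structure(to, x::CircShiftedArray) =
    CircShiftedArray(adapt(to, parent(x)), shifts(x))

# let the CUDA broadcast style claim broadcasts over the wrapper
Base.Broadcast.BroadcastStyle(::Type{T}) where {T<:CircShiftedArray{<:Any,<:Any,<:CuArray}} =
    CUDA.CuArrayStyle{ndims(T)}()

end
```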
Thanks for your hints. Indeed, I did not consider the …
I think the first thing to do is to turn the snippet from #66 (comment) (and the analogous one for …) into … I imagine that if you need the extra performance of this PR, you could start developing a FastShiftedArraysBroadcast.jl package (essentially a copy-paste of this PR) that "pirates" the broadcasting methods, and start battle-testing it in your projects and polishing it. Over time we can figure out whether the code-complexity / performance trade-off is worth it.
OK. I already wrote that code and I am happy to open a new PR, but only in about two weeks, if that's OK.
Sounds great!
This is achieved by implementing broadcasting for CircShiftedArray. Note that all tests pass except for the @inferred macro. Also, show() was overloaded in such a way that a wrapped CuArray displays without an error or warning, but the package does not need to depend on CUDA.jl. Where possible, the base circshift is used upon materialization.
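The description above leaves the mechanics implicit; here is a hedged sketch of what the last two points could look like (not necessarily the PR's exact code):

```julia
using ShiftedArrays, Adapt
using ShiftedArrays: shifts

# Materialize via Base.circshift, which has fast specialized methods
# (including a GPU one when the parent is a CuArray):
Base.collect(x::CircShiftedArray) = circshift(parent(x), shifts(x))

# Display without scalar indexing and without depending on CUDA.jl: copy the
# parent to the CPU first (Array(::CuArray) is provided by CUDA.jl itself),
# then defer to the generic AbstractArray printing via `invoke` so this
# method does not recurse into itself.
function Base.show(io::IO, mime::MIME"text/plain", x::CircShiftedArray)
    cpu = CircShiftedArray(Array(parent(x)), shifts(x))
    invoke(show, Tuple{IO, MIME"text/plain", AbstractArray}, io, mime, cpu)
end
```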