-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further allocation reduction in power computations #361
Conversation
…_by_squaring!; add in-place sqr! method
Thanks @PerezHz for addressing this! Regarding the question on adding |
# in-place product: `a` <- `a*b` | ||
# this method computes the product `a*b` and saves it back into `a` | ||
# assumes `a` and `b` are of same order | ||
function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An equivalent method mul!(a::Taylor1{T}, b::Taylor1{T}) where {T<:Number})
could be useful for the extension of pow!
for Taylor1{TaylorN{T}}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would indeed by useful, I've added the corresponding method, thanks!
src/arithmetic.jl
Outdated
function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number} | ||
for k in reverse(eachindex(a)) | ||
mul!(a, a, b[0][1], k) | ||
for l in 1:k | ||
mul!(a[k], a[k-l], b[l]) | ||
end | ||
end | ||
return nothing | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a good proposal, but here it goes: the code for the next method of mul!
could be simplified using the new one proposed in this PR, e.g., by copying a
into res
, and then using mul!(res, b)
.
src/power.jl
Outdated
@@ -58,37 +58,95 @@ end | |||
|
|||
^(a::Taylor1{TaylorN{T}}, r::Rational) where {T<:NumberNotSeries} = a^(r.num/r.den) | |||
|
|||
# in-place form of power_by_squaring | |||
# this method assumes `y`, `x` and `aux1` are of same order | |||
function power_by_squaring!(y::TaylorN{T}, x::TaylorN{T}, aux1::TaylorN{T}, p::Integer) where {T<:NumberNotSeries} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could an equivalent method be added for Taylor1
inputs? Or are they not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be useful, yes, I've added it, thanks!
src/power.jl
Outdated
@eval power_by_squaring(x::$T, p::Integer) = power_by_squaring(x, Val(p)) | ||
@eval power_by_squaring(x::$T, ::Val{0}) = one(x) | ||
@eval power_by_squaring(x::$T, ::Val{1}) = copy(x) | ||
@eval power_by_squaring(x::$T, ::Val{2}) = square(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps adding power_by_squaring(x::$T, ::Val{3})
could be useful too, since it includes the "odd" powers part of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean adding something along the lines of power_by_squaring(x::$T, ::Val{3}) = x*square(x)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... I guess this is the kind of question that we should define in terms of performance. My naive guess is that x*square(x)
will allocate more than the direct call to power_by_squaring
, but we should definetily check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked this, and it turns out doing x*square(x)
allocates less than power_by_squaring(x)
(!), do you think it's worth then to add the ::Val{3}
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you also teste with intervals? I would choose the case that behaves better wrt allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With intervals power_by_squaring(x, 3)
on Taylor1{Interval{Float64}}
allocates around half as x*square(x)
, while for TaylorN{Interval{Float64}}
allocations are essentially the same
I'v left few comments, which are simple suggestions |
Alright, sounds good to me! I'll carry on forward then and report back after comparing performance wrt master. |
Many thanks for your suggestions! I've added some of the methods you suggested (the modification to |
I'll have a look on what's going on there, and comment on that (later). |
I'm not sure where the problem is, yet, but it lies in julia> t = Taylor1(10)
1.0 t + 𝒪(t¹¹)
julia> @which Base.power_by_squaring(t-1,4)
power_by_squaring(x::Taylor1, p::Integer)
@ TaylorSeries ~/.julia/dev/TaylorSeries/src/power.jl:102
julia> Base.power_by_squaring(t-1,4) # WRONG!
1.0 - 4.0 t + 4.0 t² - 4.0 t³ + 1.0 t⁴ + 𝒪(t¹¹) The last result is wrong(the second-order term which should be 6). You get similar problems using higher values of the power. |
Tests are broken since 2e4e60a, so we should begin there. |
src/power.jl
Outdated
c[k] += c[i] * c[k-i] | ||
end | ||
@inbounds c[k] = 2 * c[k] | ||
(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following solves the problem...
(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 ) | |
(kodd == 0) && ( @inbounds c[k] += c[k >> 1]^2 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! It does the trick, many thanks!
Many thanks for the fix @lbenet! I've updated here the call signature for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PerezHz again for addressing this! I've left few comments which I think are useful. Also, while I was reviewing this PR, I noticed you pushed some new commits. I therefore send this now, and will review back again later.
src/auxiliary.jl
Outdated
@simd for ord in lencoef+1:order+1 | ||
@inbounds coeffs[ord] = z | ||
@inbounds coeffs[ord] = zero(coeffs[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this change helps avoiding the allocation of z
. Yet, I think the idea is to use z
(stored in the stack) instead of looking for the value of coeffs[1]
and zeroing it inside the for
, helps wrt performance. I may be wrong, just want to make sure the idea behind the code is clear.
src/auxiliary.jl
Outdated
@simd for ord in lencoef+1:num_coeffs | ||
@inbounds coeffs[ord] = z | ||
@inbounds coeffs[ord] = zero(coeffs[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above
src/auxiliary.jl
Outdated
@@ -84,7 +82,7 @@ getindex(a::Taylor1{T}, u::StepRange{Int,Int}) where {T<:Number} = | |||
view(a.coeffs, u[:] .+ 1) | |||
|
|||
setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:Number} = a.coeffs[n+1] = x | |||
setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, deepcopy(x), n+1) | |||
setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, x, n+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the deepcopy
is not necessary here. It was included to ensure independence of x
and a.coeffs[n+1]
(both are <:AbstractSeries
). So, I simply suggest to add a test which ensures this. (Probably there is already one, in which case, there is no need to do anything.)
@@ -156,7 +156,7 @@ struct TaylorN{T<:Number} <: AbstractSeries{T} | |||
order :: Int | |||
|
|||
function TaylorN{T}(v::Array{HomogeneousPolynomial{T},1}, order::Int) where T<:Number | |||
coeffs = zeros(HomogeneousPolynomial{T}, order) | |||
coeffs = isempty(v) ? zeros(HomogeneousPolynomial{T}, order) : zeros(v[1], order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we would need similar fixes in Taylor1
and HomogeneousPolynomial
...
src/power.jl
Outdated
if $T != HomogeneousPolynomial | ||
function power_by_squaring(x::$T, ::Val{P}) where P | ||
p = P # copy static parameter `P` into local variable `p` | ||
y = zero(x) | ||
aux = zero(x) | ||
power_by_squaring!(y, x, aux, p) | ||
return y | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this if
is overwritten for HomogeneousPolynomial
by the method defined in line 118...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally, I think this code in nicer than the one in line 118...
src/power.jl
Outdated
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{0}) where {T<:NumberNotSeries} = one(x) | ||
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{1}) where {T<:NumberNotSeries} = copy(x) | ||
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{2}) where {T<:NumberNotSeries} = square(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these methods are defined above in lines 104-106...
src/power.jl
Outdated
@inline pow!(c::Taylor1{T}, a::Taylor1{T}, b::Taylor1{T}, r::S, k::Int) where | ||
{T<:Number, S <: Real} = pow!(c, a, b, Val(r), k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since r
is not necessarily an Int
, I fear that this method could overpopulate methods to dispatch, making slower the dispatch stage.
If we add a concrete method for Int
and for other numeric types (whose value is not an integer and will not use Val
), I think this would be nice.
Tests are passing, which is always nice!
Regarding the patch version, I think in this case (including the future PR you mention on |
That is actually a good point. I agree that if we bump a patch release for this PR this can break code relying on |
Just pushed a commit switching back from the dispatches-by-value to if/else in |
I think this PR merits a new minor version release; once said this, we may merge it to master without tagging a new release, in order to include the ideas you have on |
Okay, let's merge this on master and then merge the corresponding PR on TaylorIntegration without tagging, and tag after we're sure everything is looking fine. The |
I reverted the change in the |
Fine with me. Naive question: have all suggestions of the review been addressed? Shall I make another (final) review? |
As far as I've looked I think I managed to address everything, but perhaps another quick round of review can help just in case I missed something. One thing to note is that I reverted the patch version bumping, so this PR does not change version (it will still be v0.17.8). Otherwise, from my perspective this is ready for merging. |
I'll have a look today; regarding the version, v0.17.8 will be the tag of the proper branch, and this PR will be available from master. In other words, dev version and the last tagged version will be different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot. I'll go ahead and merge
Mrged! |
@lbenet sorry I think I pushed a commit on my fork just before you merged which I think better reflects where the mutating |
... and thanks for mergning by the way! |
Sorry... I was simply too fast... (I thought over my suggestions and they may not be really crutial at this stage.) So a new PR is ok |
On the contrary thank you for looking into all this! Just opened a new PR (see #363) |
* Updates due to JuliaDiff/TaylorSeries.jl#361 * Remove comments and small fix * Update CI (use TaylorSeries branch from fork); use `diffeq!` in non-parsed jetcoeffs! methods * Update CI * Use diffeq! in preamble * Retain only `local` declarations in preamble * Trigger CI * Update CI * Update Project.toml * Update CI * Rename diffeq! -> ode! * Update ci.yml * Trigger CI * Update README.md * Rename ode! -> solcoeff! * Bump patch version
This PR continues in the spirit of #347 and #349, introducing an in-place form of power-by-squaring. It manages to avoid almost all allocations, except for one that I haven't managed to take care of so far, here. One way to get rid of it would be to simply add an argument to in-place methods of power, i.e,
pow!
, such that instead of calling such methods aspow!(res, a, p, k)
, they would now be called aspow!(res, a, p, aux, k)
. Or maybe there is another way to get rid of this auxiliary variable? Essentially the problem is thatpower_by_squaring
uses some intermediate variables which don't allocate for non-composite numeric types (structs <: Number), but do allocate otherwise. What is your opinion on this @lbenet?