Avoid some allocations in mutating methods #347
Conversation
I think it is worth pursuing indeed, since the results are really good! That said, let me point to this line, which defines …
We should indeed keep the methods, e.g. for initialization of the aux quantities created by …
AFAIU, that line eventually calls …
(I guess a better place for those functions would be in …)
I'm also trying to avoid allocations in src/arithmetic.jl (line 763, at commit f5587a7).
I'm thinking that a possible workaround is to add it as an argument to div!, so that the auxiliary can be declared somewhere else, but perhaps that change would be a bit breaking? There are other mutating functions with similar auxiliaries defined in the function body which could benefit from a similar treatment; what are your thoughts on this @lbenet?
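For what it's worth, the buffer-as-argument idea can be sketched with plain vectors (hypothetical names accum_alloc!/accum_buf!; this is not the actual div! signature): moving the auxiliary out of the function body lets the caller allocate it once and reuse it across calls.

```julia
# Hypothetical sketch, not the TaylorSeries API: the only point is
# *where* the auxiliary `tmp` gets allocated.

# Auxiliary allocated inside the body, on every call:
function accum_alloc!(c::Vector{Float64}, a::Vector{Float64})
    tmp = similar(a)       # fresh allocation per call
    @. tmp = 2.0 * a
    @. c += tmp
    return c
end

# Auxiliary passed in by the caller, allocated once outside the hot loop:
function accum_buf!(c::Vector{Float64}, a::Vector{Float64}, tmp::Vector{Float64})
    @. tmp = 2.0 * a       # reuses caller-provided storage
    @. c += tmp
    return c
end

a = [1.0, 2.0, 3.0]
c1, c2 = zeros(3), zeros(3)
tmp = similar(a)           # one allocation, shared by all calls
accum_alloc!(c1, a)
accum_buf!(c2, a, tmp)
c1 == c2                   # -> true; both give [2.0, 4.0, 6.0]
```

The trade-off mentioned above is visible here: the second signature is breaking for existing callers, but moves the allocation out of the hot path.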
In other words, whereas this PR makes mul! allocation-free, div! still allocates:

julia> @btime TaylorSeries.mul!($z1N,$x1N,$y1N,0)
  55.880 ns (0 allocations: 0 bytes)

julia> @btime TaylorSeries.div!($z1N,$x1N,$y1N,0)
  360.201 ns (15 allocations: 1.09 KiB)
I don't quite remember the actual reason why that …
I don't think we should care about how breaking that change would be to the code; we simply release a new patch version 😄. That said, my guess is that such a change impacts the infrastructure needed for …
@PerezHz I've checked that …
Even with the dirty tricks the specialized methods for mixtures allocate a bit less than the generic ones, so it's definitely a step forward I would say!
Agreed!
A bit yes, but not so much, as far as I've managed to test! (more details on this below...)
So far the mutating methods have allowed very good speedups while using …
That's a good point, and finding a good solution can take more than one attempt, yet I think it's worth pursuing. In a nutshell, I guess what I'm trying to say is that we should aim for making …

Now, I found an issue, illustrated by the following:

julia> using TaylorSeries
julia> dq = set_variables("δq", order=1, numvars=2)
2-element Vector{TaylorN{Float64}}:
1.0 δq₁ + 𝒪(‖x‖²)
1.0 δq₂ + 𝒪(‖x‖²)
julia> x = Taylor1([1.0 - 1.8dq[1],-0.007 + 0.0194dq[1], -1e-5 -0.00027dq[2], 3e-7 - 1.8e-5dq[1]-dq[2],dq[1]])
1.0 - 1.8 δq₁ + 𝒪(‖x‖²) + ( - 0.007 + 0.0194 δq₁ + 𝒪(‖x‖²)) t + ( - 1.0e-5 - 0.00027 δq₂ + 𝒪(‖x‖²)) t² + ( 3.0e-7 - 1.8e-5 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> y = Taylor1([-2.0 + dq[1],-1dq[2],2dq[2], dq[1]+dq[2],dq[2]])
- 2.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₂ + 𝒪(‖x‖²)) t + ( 2.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ + 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₂ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> z = deepcopy(x)
1.0 - 1.8 δq₁ + 𝒪(‖x‖²) + ( - 0.007 + 0.0194 δq₁ + 𝒪(‖x‖²)) t + ( - 1.0e-5 - 0.00027 δq₂ + 𝒪(‖x‖²)) t² + ( 3.0e-7 - 1.8e-5 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> z.coeffs[1] = x.coeffs[1]^1.5 # this line and the one below are similar to how variables are initialized in parsed jetcoeffs! methods in TaylorIntegration
1.0 - 2.7 δq₁ + 𝒪(‖x‖²)
julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # similar to what happens in parsed jetcoeffs! methods
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
julia> TaylorSeries.pow!(z, x, 1.5, 1) # compute 1st-order coefficients
julia> z # the 1st-order coefficients are copied into higher-order coefficients!!
1.0 - 2.7 δq₁ + 𝒪(‖x‖²) + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t² + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t³ + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

Any hints about what might be happening here?
A workaround is to simply broadcast zero:

julia> z.coeffs[2:end] .= zero.(z.coeffs[1]) # <-- broadcasted zero
I can't reproduce the problem you report here... I'm using Julia v1.10, and TaylorSeries from master...
Sorry for the confusion, the issue happens with this PR's branch.
... now, if you meant that the issue pops up only when using this PR, then yes, although I would actually say the issue is hidden in TaylorIntegration. I'll explain in the next comment.
Using current TS master:

julia> using TaylorSeries
julia> dq = set_variables("δq", order=1, numvars=2)
2-element Vector{TaylorN{Float64}}:
1.0 δq₁ + 𝒪(‖x‖²)
1.0 δq₂ + 𝒪(‖x‖²)
julia> x = Taylor1([1.0+dq[1],-dq[1], dq[2], dq[1]-dq[2],dq[1]])
1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> z = deepcopy(x)
1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> z.coeffs[1] = x.coeffs[1]^1.5
1.0 + 1.5 δq₁ + 𝒪(‖x‖²)
julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # no zero broadcasting here
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
julia> z[1][0] = HomogeneousPolynomial(-4.0,0) # only one z coefficient is updated
- 4.0
julia> z # but the change is copied everywhere!
1.0 + 1.5 δq₁ + 𝒪(‖x‖²) + ( - 4.0 + 𝒪(‖x‖²)) t + ( - 4.0 + 𝒪(‖x‖²)) t² + ( - 4.0 + 𝒪(‖x‖²)) t³ + ( - 4.0 + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

But this can be easily solved if we broadcast zero while initially filling the coefficients of z:

julia> z.coeffs[2:end] .= zero.(z.coeffs[1]) # zero is broadcasted here
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
julia> z[1][0] = HomogeneousPolynomial(-4.0,0)
- 4.0
julia> z
1.0 + 1.5 δq₁ + 𝒪(‖x‖²) + ( - 4.0 + 𝒪(‖x‖²)) t + 𝒪(t⁵)

AFAIU, the issue is that if zero is not broadcasted, all the filled coefficients end up being the very same object in memory:

julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # no zero broadcast
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
julia> z.coeffs[2] == z.coeffs[3]
true
julia> z.coeffs[2] === z.coeffs[3] # check if memory address is the same
true
julia> z.coeffs[2:end] .= zero.(z.coeffs[1]) # now, broadcast zero
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
0.0 + 𝒪(‖x‖²)
julia> z.coeffs[2] == z.coeffs[3]
true
julia> z.coeffs[2] === z.coeffs[3] # now each coefficient points to a different place in memory
false
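The same aliasing can be reproduced with plain nested vectors, independently of TaylorSeries (a sketch, assuming TaylorN behaves like any mutable container here; TaylorN apparently broadcasts as a scalar, so the Ref below is only needed for the Vector analogue):

```julia
# Non-broadcasted zero: the RHS is evaluated once, so every slot stores
# the *same* mutable object.
v = [[1.0], [2.0], [3.0]]
v[2:3] .= Ref(zero(v[1]))
v[2] === v[3]              # -> true (one shared object)
v[2][1] = -4.0
v[3]                       # -> [-4.0]: mutated through the alias

# Broadcasted zero: `zero` runs once per destination slot inside the
# fused broadcast, so each slot gets an independent object.
w = [[1.0], [2.0], [3.0]]
w[2:3] .= zero.(Ref(w[1]))
w[2] === w[3]              # -> false (independent objects)
w[2][1] = -4.0
w[3]                       # -> [0.0]: unaffected
```

The key point is broadcast fusion: in `dest .= zero.(x)` the call to `zero` is part of the fused kernel and is re-evaluated per destination element, while in `dest .= zero(x)` it is evaluated once before the assignment.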
Now, this raises the question: why does this appear in this PR and not in current master? Looking at line 279 of src/arithmetic.jl (at commit f5587a7): the way it currently behaves, it allocates a new place in memory for each of the elements of res.coeffs; but these lines are precisely the ones that this PR touches, updating coefficients in place while avoiding allocations. Of course, as you mentioned earlier, there are other allocations that are trickier to remove, but we can go step by step.
If downstream tests pass (TI, PE, NEOs) then I would propose to go ahead with this and make the update in TaylorIntegration, unless there are other considerations to be made?
I think that …
Just as a check of what I say, consider (in your example):

julia> z = deepcopy(x)
1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)
julia> z.coeffs[3:end] .= zero(z.coeffs[1]); # changing from the 2nd order onwards
julia> z[3][0] = HomogeneousPolynomial(6.0,0) # changing order 3!
6.0
julia> z # order 2 is also changed, but not order 1
1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 6.0 + 𝒪(‖x‖²)) t² + ( 6.0 + 𝒪(‖x‖²)) t³ + ( 6.0 + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

So indeed that seems to change all coefficients from order 2 onwards. I think this is because …
That's a good point, it does allocate, and even more than before. I would expect this, since in the zero-broadcasting case it allocates a different place in memory for each coefficient of z:

julia> using BenchmarkTools
julia> function zeronobc(z)
           z.coeffs[2:end] .= zero(z.coeffs[1])
           return nothing
       end
zeronobc (generic function with 1 method)

julia> function zerobc(z)
           z.coeffs[2:end] .= zero.(z.coeffs[1])
           return nothing
       end
zerobc (generic function with 1 method)
julia> @btime zeronobc($z)
219.727 ns (10 allocations: 752 bytes)
julia> @btime zerobc($z)
880.123 ns (40 allocations: 2.94 KiB)
src/functions.jl (outdated)

@@ -655,6 +655,75 @@ end
# Mutating functions for Taylor1{TaylorN{T}}
@inline function zero!(a::Taylor1{T}, k::Int) where {T<:NumberNotSeries}
Sorry for this late suggestion: what about redefining zero!(c, a, k) simply to point to zero!(c, k)? That would avoid lots of changes (which can be kept here), in particular those related to @taylorize (in dictmutfunct.jl).
Thanks for the suggestion! I changed the definition of zero!(c, a, k) accordingly.
One more suggestion on this: can you move the code (of the zero! methods introduced in this PR) closer to the other zero! methods?
src/arithmetic.jl (outdated)

function /(a::NumberNotSeries, b::Taylor1{TaylorN{T}}) where {T<:NumberNotSeries}
    iszero(a) && !iszero(b) && return zero(a)
    # order and coefficient of first factorized term
    # In this case, since a[k]=0 for k>0, we can simplify to:
    # ordfact, cdivfact = 0, a/b[0]
    res = Taylor1(a/b[0], b.order)
    for ordT in eachindex(res)
        div!(res, a, b, ordT)
    end
    return res
end
I propose the following here to ensure type stability (this is untested):

function /(a::NumberNotSeries, b::Taylor1{TaylorN{T}}) where {T<:NumberNotSeries}
    R = promote_type(typeof(a), TaylorN{T})
    iszero(a) && !iszero(b) && return convert(Taylor1{R}, zero(b))
    # order and coefficient of first factorized term
    # In this case, since a[k]=0 for k>0, we can simplify to:
    # ordfact, cdivfact = 0, a/b[0]
    res = Taylor1(a/b[0], b.order)
    for ordT in eachindex(res)
        div!(res, a, b, ordT)
    end
    return res
end
Note that a::NumberNotSeries is perhaps too strong, because it leaves out a::TaylorN{T}, which is a method worth having too.
Thanks for the suggestion! I added the fix for type stability and widened the type for a to allow for TaylorN{T}.
src/arithmetic.jl (outdated)

@inline function div!(c::Taylor1, a::NumberNotSeries,
                      b::Taylor1, k::Int)
Perhaps we should impose c::Taylor1{T} as well as b::Taylor1{T}...
I agree, although I think we can make an extra tweak to avoid some allocations for T<:TaylorN{<:NumberNotSeries}, so I added an extra method for the latter case (will fix it in the next few commits).
Tests in TI.jl are passing now. I'll try to complete my review in the next few days. Thanks a lot!
Sure! There are still a couple of things I'd like to try here to save some further allocs, but indeed this is essentially ready for review.
I think there are indeed many things that can still be done to avoid allocations in mutating functions, but the changes introduced here are, in a sense, self-contained, so I'd rather leave further changes to another PR. Therefore, as it stands now, this PR is ready for review.
(maybe I'm repeating myself, but I just wanted to be clear that, modulo reviews and some fixes, I don't intend a priori to add more things to this PR)
I've been running some benchmarks, and using this PR for JT integrations in NEOs.jl we get 30% less memory used and about 20% shorter runtime in long (>100 timesteps) integrations. Short integrations, where "time to first time-step" is dominant, will not benefit as much from this PR.
I'll have a look later; the results you just quoted are very encouraging, so we should think about merging and tagging a new version!
…mas in IntervalArithmetic compat entry
Alright! Just bumped the minor version; let me know if there is anything else you would like to improve around here.
Thanks a lot @PerezHz, this is a nice improvement! I have left one single comment (with code), which is a suggestion that may improve performance. In any case, I'm approving this PR and merging it, so we can exploit it in TaylorIntegration!
Thanks again a lot!
@inbounds mul!(c[k], c[0], b[k])
@inbounds for i = 1:k-1
    c[k] += c[i] * b[k-i]
This comment is related to the TODO above. We do have methods that indeed accumulate, as mentioned above, for TaylorN. I wonder if replacing this line with those methods can provide further improvements. The suggested code is:

# c[k] += c[i] * b[k-i] # current code
for ord in eachindex(c[k])
    mul!(c[k], c[i], b[k-i], ord)
end
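The gain of such a suggestion comes from avoiding the temporary that an expression like `c[k] += c[i] * b[k-i]` creates; with plain vectors the same pattern can be sketched as follows (hypothetical helper acc_mul!, standing in for the per-order in-place accumulation):

```julia
# `c = c + a .* b` would first materialize the product a .* b as a
# temporary array; the in-place loop accumulates directly into c.
function acc_mul!(c::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    @inbounds for i in eachindex(c, a, b)
        c[i] += a[i] * b[i]    # accumulate in place, no temporary
    end
    return c
end

c = [1.0, 1.0]
acc_mul!(c, [2.0, 3.0], [4.0, 5.0])
c                              # -> [9.0, 16.0]
```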
Thank you for the suggestion, I'll give it a try and see if we can get rid of some more allocations 😄 !
Thanks for merging! I'll go ahead now and update things accordingly in TaylorIntegration.
In current master, some mutating methods allocate, especially for TaylorN and mixtures. This PR is a proof of concept which avoids allocations in key places, such as zero and mul! (similar things can be tried in div!, etc.), just by defining suitable in-place zero! methods. Using these zero! methods we can then do things like the above without a single allocation.
This is still very much WIP, but do you think this is an idea worth pursuing @lbenet?
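As a rough illustration of the in-place zero! idea (a sketch with a made-up Poly type, not the actual TaylorSeries implementation): instead of building a fresh zero object every time, the coefficient storage the caller already owns is overwritten.

```julia
struct Poly
    coeffs::Vector{Float64}
end

# Allocating version: builds a brand-new zero polynomial on every call.
zero_alloc(p::Poly) = Poly(zeros(length(p.coeffs)))

# In-place version: reuses the storage already owned by `p`.
function zero!(p::Poly)
    fill!(p.coeffs, 0.0)
    return p
end

p = Poly([1.0, 2.0, 3.0])
zero!(p)
p.coeffs                   # -> [0.0, 0.0, 0.0]
```

In a tight loop (e.g. the coefficient recursions of a Taylor integration), replacing the allocating form with the mutating one removes one heap allocation per call.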