
Avoid some allocations in mutating methods #347

Merged: 23 commits merged into JuliaDiff:master from PerezHz:jp/mul-mixtures on Feb 24, 2024

Conversation

@PerezHz (Contributor) commented Jan 31, 2024

In current master, some mutating methods allocate, especially for TaylorN and mixtures:

julia> using TaylorSeries, BenchmarkTools

julia> dq = set_variables("dq", order=2, numvars=2)
2-element Vector{TaylorN{Float64}}:
  1.0 dq₁ + 𝒪(‖x‖³)
  1.0 dq₂ + 𝒪(‖x‖³)

julia> x1N = Taylor1([dq[1],dq[2],dq[1]*dq[2]])
  1.0 dq₁ + 𝒪(‖x‖³) + ( 1.0 dq₂ + 𝒪(‖x‖³)) t + ( 1.0 dq₁ dq₂ + 𝒪(‖x‖³)) t² + 𝒪(t³)

julia> y1N = atan(x1N)
  1.0 dq₁ + 𝒪(‖x‖³) + ( 1.0 dq₂ + 𝒪(‖x‖³)) t + ( 1.0 dq₁ dq₂ + 𝒪(‖x‖³)) t² + 𝒪(t³)

julia> z1N = zero(x1N)
  0.0 + 𝒪(‖x‖³) + 𝒪(t³)

julia> @btime TaylorSeries.mul!($z1N,$x1N,$y1N,0)
  590.044 ns (14 allocations: 1.06 KiB)

This PR is a proof of concept which avoids allocations in key places, such as zero and mul! (similar things can be tried in div!, etc.), just by defining suitable inplace zero! methods:

julia> using TaylorSeries, BenchmarkTools

julia> dq = set_variables("dq", order=2, numvars=2)
2-element Vector{TaylorN{Float64}}:
  1.0 dq₁ + 𝒪(‖x‖³)
  1.0 dq₂ + 𝒪(‖x‖³)

julia> x1N = Taylor1([dq[1],dq[2],dq[1]*dq[2]])
  1.0 dq₁ + 𝒪(‖x‖³) + ( 1.0 dq₂ + 𝒪(‖x‖³)) t + ( 1.0 dq₁ dq₂ + 𝒪(‖x‖³)) t² + 𝒪(t³)

julia> y1N = atan(x1N)
  1.0 dq₁ + 𝒪(‖x‖³) + ( 1.0 dq₂ + 𝒪(‖x‖³)) t + ( 1.0 dq₁ dq₂ + 𝒪(‖x‖³)) t² + 𝒪(t³)

julia> z1N = zero(x1N)
  0.0 + 𝒪(‖x‖³) + 𝒪(t³)

julia> @btime TaylorSeries.mul!($z1N,$x1N,$y1N,0)
  83.679 ns (0 allocations: 0 bytes)

Using these zero! methods then we can do things like:

julia> x1 = Taylor1(rand(3))
 0.6727385330826134 + 0.023575091119572567 t + 0.408205561614307 t² + 𝒪(t³)

julia> xN = sin(dq[1]+dq[2])
 1.0 dq₁ + 1.0 dq₂ + 𝒪(‖x‖³)

julia> @btime TaylorSeries.zero!($x1)
  8.049 ns (0 allocations: 0 bytes)

julia> @btime TaylorSeries.zero!($xN)
  21.815 ns (0 allocations: 0 bytes)

julia> @btime TaylorSeries.zero!($x1N)
  63.053 ns (0 allocations: 0 bytes)

without a single allocation.
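
For reference, here is a minimal sketch of what such in-place zero! methods could look like. This is only an illustration of the idea (not the exact code added in this PR), and it assumes the usual coeffs field layout of HomogeneousPolynomial, TaylorN and Taylor1:

@inline function zero!(a::HomogeneousPolynomial{T}) where {T<:Number}
    fill!(a.coeffs, zero(T))   # reuse the existing coefficient vector
    return nothing
end

@inline function zero!(a::TaylorN{T}) where {T<:Number}
    for hp in a.coeffs
        zero!(hp)              # zero each homogeneous polynomial in place
    end
    return nothing
end

@inline function zero!(a::Taylor1{TaylorN{T}}) where {T<:Number}
    for c in a.coeffs
        zero!(c)               # zero each TaylorN coefficient in place
    end
    return nothing
end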

This is still very much WIP, but do you think this is an idea worth pursuing @lbenet?

@coveralls commented Jan 31, 2024

Coverage Status

coverage: 97.649% (-0.4%) from 98.097%
when pulling 40b5267 on PerezHz:jp/mul-mixtures
into f5587a7 on JuliaDiff:master.

@lbenet (Member) commented Jan 31, 2024

I think it is worth pursuing indeed, since the results are really good!

That said, let me point to this line, which defines zero!(res, a, ord) and supposedly avoids allocations; I'm not sure why we are incurring allocations there...

@lbenet (Member) commented Jan 31, 2024

We should indeed keep the methods, e.g. for initialization of the aux quantities created by @taylorize in TaylorIntegration!

@PerezHz (Contributor, Author) commented Jan 31, 2024

That said, let me point to this line, which defines zero!(res, a, ord) and supposedly avoids allocations; I'm not sure why we are incurring allocations there...

AFAIU, that line eventually calls zero(::HomogeneousPolynomial), creating a new variable of that type, which allocates memory for its coefficients.
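
A quick way to see where the allocation comes from (a small check, assuming two variables have been set as above; exact timings will vary):

using TaylorSeries, BenchmarkTools

set_variables("dq", order=2, numvars=2)
hp = HomogeneousPolynomial([1.0, 2.0])   # degree-1 homogeneous polynomial in 2 variables

@btime zero($hp)                  # allocates: a fresh coefficient vector is created
@btime fill!($(hp.coeffs), 0.0)   # in-place alternative: should report 0 allocations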

@lbenet (Member) commented Feb 1, 2024

( I guess a better place for those functions would be in src/functions.jl...) The idea is truly nice! Thanks a lot!

src/functions.jl (outdated review thread, resolved)
@PerezHz (Contributor, Author) commented Feb 2, 2024

I'm trying to also avoid allocations in TaylorSeries.div!, and around this line I found there's an auxiliary TaylorN being declared there, which allocates once for each order of expansion:

aux = TaylorN(zero(a[ordT][0][1]), a[ordT].order)

I'm thinking that a possible workaround is to add it as an argument to div!, so that the auxiliary can be declared somewhere else, but that change would perhaps be a bit breaking. There are other mutating functions with similar auxiliaries defined in the function body that could benefit from a similar treatment; what are your thoughts on this, @lbenet?
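
For what it's worth, the pattern being proposed here, stripped of TaylorSeries internals, is simply to hoist the scratch buffer out of the kernel and let the caller own it. A plain-vector illustration with hypothetical names:

# Hypothetical sketch of the "caller-provided scratch buffer" pattern:
# the buffer is allocated once by the caller and reused on every call.
function kernel!(out::Vector{Float64}, a::Vector{Float64}, scratch::Vector{Float64})
    @. scratch = 2a          # intermediate result lives in the preallocated buffer
    @. out = a + scratch     # no temporaries are created inside the kernel
    return nothing
end

a = rand(5); out = similar(a); scratch = similar(a)   # allocate once, outside the hot loop
kernel!(out, a, scratch)                              # repeated calls are allocation-free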

@PerezHz (Contributor, Author) commented Feb 2, 2024

In other words, whereas this PR makes mul! allocation-free for mixtures, div! is not:

julia> @btime TaylorSeries.mul!($z1N,$x1N,$y1N,0)
  55.880 ns (0 allocations: 0 bytes)

julia> @btime TaylorSeries.div!($z1N,$x1N,$y1N,0)
  360.201 ns (15 allocations: 1.09 KiB)

@lbenet (Member) commented Feb 2, 2024

I'm trying to also avoid allocations in TaylorSeries.div!, and around this line I found there's an auxiliary TaylorN being declared there, which allocates once for each order of expansion:

aux = TaylorN(zero(a[ordT][0][1]), a[ordT].order)

I don't quite remember the actual reason why that aux is needed there, but I think it is related to the sum in the recurrence relation... I recall the dirty tricks needed when defining sums and using @taylorize...

I'm thinking that a possible workaround is to add it as an argument to div!, so that the auxiliary can be declared somewhere else, but that change would perhaps be a bit breaking. There are other mutating functions with similar auxiliaries defined in the function body that could benefit from a similar treatment; what are your thoughts on this, @lbenet?

I don't think we should care about how breaking that change would be to the code; we simply release a new patch version 😄. That said, my guess is that such a change impacts the infrastructure needed for @taylorize to work nicely. I suggest adding a couple of comments (on improving div!) and leaving those changes to another PR. My guess is that such auxiliaries occur beyond div!, so solving div! will tell us the way to proceed.

@lbenet (Member) commented Feb 8, 2024

@PerezHz I've checked that aux (in div!, and actually other functions!) is indeed an auxiliary ::TaylorN. I've been also thinking about exploiting the kind of tricks we use in the mutating functions, but it is not too clear to me that we can actually exploit them in TaylorIntegration. My point is that @taylorize declares all those quantities as appropriate Taylor1s, but in this case aux is a TaylorN...

@PerezHz (Contributor, Author) commented Feb 8, 2024

I don't quite remember the actual reason why that aux is needed there, but I think it is related to the sum in the recurrence relation... I recall the dirty tricks needed when defining sums and using @taylorize...

Even with the dirty tricks the specialized methods for mixtures allocate a bit less than the generic ones, so it's definitely a step forward I would say!

I don't think we should care about how breaking that change would be to the code; we simply release a new patch version 😄.

Agreed!

That said, my guess is that such a change impacts the infrastructure needed for @taylorize to work nicely.

A bit yes, but not so much, as far as I've managed to test! (more details on this below...)

I suggest adding a couple of comments (on improving div!) and leaving those changes to another PR.

So far the mutating methods have allowed very good speedups when using @taylorize, and yet some, or even most, of these mutating methods (pow!, for example) explicitly allocate Taylor1s and/or TaylorNs, which somewhat "defeats" the purpose of the mutating methods, namely avoiding allocations. My point is that there is still performance to be gained. I will add these comments and we can try to tackle them in another PR.

@PerezHz I've checked that aux (in div!, and actually other functions!) is indeed an auxiliary ::TaylorN. I've been also thinking about exploiting the kind of tricks we use in the mutating functions, but it is not too clear to me that we can actually exploit them in TaylorIntegration. My point is that @taylorize declares all those quantities as appropriate Taylor1s, but in this case aux is a TaylorN...

That's a good point, and finding a good solution can take more than one attempt, yet I think it's worth pursuing. In a nutshell, I guess what I'm trying to say is that we should aim to make TaylorIntegration.jetcoeffs! allocation-free.

Now, I ran into an issue with pow! for which I have a workaround, but I still haven't managed to find the reason behind it. Using this PR's TaylorSeries branch, the following happens:

julia> using TaylorSeries

julia> dq = set_variables("δq", order=1, numvars=2)
2-element Vector{TaylorN{Float64}}:
  1.0 δq₁ + 𝒪(‖x‖²)
  1.0 δq₂ + 𝒪(‖x‖²)

julia> x = Taylor1([1.0 - 1.8dq[1],-0.007 + 0.0194dq[1], -1e-5 -0.00027dq[2], 3e-7 - 1.8e-5dq[1]-dq[2],dq[1]])
  1.0 - 1.8 δq₁ + 𝒪(‖x‖²) + ( - 0.007 + 0.0194 δq₁ + 𝒪(‖x‖²)) t + ( - 1.0e-5 - 0.00027 δq₂ + 𝒪(‖x‖²)) t² + ( 3.0e-7 - 1.8e-5 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> y = Taylor1([-2.0 + dq[1],-1dq[2],2dq[2], dq[1]+dq[2],dq[2]])
  - 2.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₂ + 𝒪(‖x‖²)) t + ( 2.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ + 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₂ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> z = deepcopy(x)
  1.0 - 1.8 δq₁ + 𝒪(‖x‖²) + ( - 0.007 + 0.0194 δq₁ + 𝒪(‖x‖²)) t + ( - 1.0e-5 - 0.00027 δq₂ + 𝒪(‖x‖²)) t² + ( 3.0e-7 - 1.8e-5 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> z.coeffs[1] = x.coeffs[1]^1.5 # this line and the one below are similar to how variables are initialized in parsed jetcoeffs! methods in TaylorIntegration
 1.0 - 2.7 δq₁ + 𝒪(‖x‖²)

julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # similar to what happens in parsed jetcoeffs! methods
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)

julia> TaylorSeries.pow!(z, x, 1.5, 1) # compute 1st-order coefficients

julia> z # the 1st-order coefficients are copied into higher-order coefficients!!
  1.0 - 2.7 δq₁ + 𝒪(‖x‖²) + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t² + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t³ + ( - 0.0105 + 0.03855000000000001 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

any hints about what might be happening here?

@PerezHz (Contributor, Author) commented Feb 8, 2024

A workaround is to simply broadcast zero:

z.coeffs[2:end] .= zero.(z.coeffs[1]) # <-- broadcasted zero

@lbenet (Member) commented Feb 8, 2024

I can't reproduce the problem you report here... I'm using Julia v1.10, and TaylorSeries from master...

@PerezHz (Contributor, Author) commented Feb 8, 2024

I can't reproduce the problem you report #347 (comment)... I'm using Julia v1.10, and TaylorSeries from master...

Sorry for the confusion, the issue happens with this PR's branch

@PerezHz (Contributor, Author) commented Feb 9, 2024

I can't reproduce the problem you report #347 (comment)... I'm using Julia v1.10, and TaylorSeries from master...

... now, if you meant that the issue pops up only using this PR then yes, although I actually would say the issue is hidden in TaylorIntegration. I'll explain in the next comment.

@PerezHz (Contributor, Author) commented Feb 9, 2024

Using current TS master:

julia> using TaylorSeries

julia> dq = set_variables("δq", order=1, numvars=2)
2-element Vector{TaylorN{Float64}}:
  1.0 δq₁ + 𝒪(‖x‖²)
  1.0 δq₂ + 𝒪(‖x‖²)

julia> x = Taylor1([1.0+dq[1],-dq[1], dq[2], dq[1]-dq[2],dq[1]])
  1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> z = deepcopy(x)
  1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> z.coeffs[1] = x.coeffs[1]^1.5
 1.0 + 1.5 δq₁ + 𝒪(‖x‖²)

julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # no zero broadcasting here
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)

julia> z[1][0] = HomogeneousPolynomial(-4.0,0) # only one z coefficient is updated
 - 4.0

julia> z # but the change is copied everywhere!
  1.0 + 1.5 δq₁ + 𝒪(‖x‖²) + ( - 4.0 + 𝒪(‖x‖²)) t + ( - 4.0 + 𝒪(‖x‖²)) t² + ( - 4.0 + 𝒪(‖x‖²)) t³ + ( - 4.0 + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

but this can be easily solved if we broadcast zero while initially filling the coefficients of z:

julia> z.coeffs[2:end] .= zero.(z.coeffs[1]) # zero is broadcasted here
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)

julia> z[1][0] = HomogeneousPolynomial(-4.0,0)
 - 4.0

julia> z
  1.0 + 1.5 δq₁ + 𝒪(‖x‖²) + ( - 4.0 + 𝒪(‖x‖²)) t + 𝒪(t⁵)

AFAIU, the issue is that if zero is not broadcast, then each of the elements of z.coeffs[2:end] points to the same place in memory:

julia> z.coeffs[2:end] .= zero(z.coeffs[1]) # no zero broadcast
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)

julia> z.coeffs[2] == z.coeffs[3]
true

julia> z.coeffs[2] === z.coeffs[3] # check if memory address is the same
true

julia> z.coeffs[2:end] .= zero.(z.coeffs[1]) # now, broadcast zero
4-element view(::Vector{TaylorN{Float64}}, 2:5) with eltype TaylorN{Float64}:
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)
  0.0 + 𝒪(‖x‖²)

julia> z.coeffs[2] == z.coeffs[3]
true

julia> z.coeffs[2] === z.coeffs[3] # now each coefficient points to a different place in memory
false
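
The same aliasing can be reproduced with plain nested arrays, with no TaylorSeries involved: broadcasting a single object assigns that same object to every slot, whereas a per-slot constructor call creates independent objects. For example:

julia> v = [rand(2) for _ in 1:3];

julia> v .= Ref(zeros(2));   # one object, shared by every slot

julia> v[1] === v[2]         # all slots alias the same memory
true

julia> v .= [zeros(2) for _ in 1:3];   # an independent object per slot

julia> v[1] === v[2]
false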

@PerezHz (Contributor, Author) commented Feb 9, 2024

Now, this raises the question: why does this appear in this PR and not in current master? The line that does the trick is this:

zero!(res, a, ordT)

The way it currently behaves, it allocates a new place in memory for each of the elements of res.coeffs, but these lines are precisely the ones that this PR touches, updating coefficients in place while avoiding allocations. Of course, as you mentioned earlier, there are other allocations that are trickier to remove, but we can go step by step.

If downstream tests pass (TI, PE, NEOs) then I would propose to go ahead with this and make the update in TaylorIntegration, unless there are other considerations to be made?

@lbenet (Member) commented Feb 9, 2024

I think that z.coeffs[2:end] .= zero(z.coeffs[1]) is (in this PR) somehow behaving as a view, while before (e.g. on master) it was behaving as a copy; hence the allocations. Did you check whether the workaround, i.e., applying broadcasting to zero too, does allocate?

@lbenet (Member) commented Feb 9, 2024

Just as a check of what I say, consider (in your example):

julia> z = deepcopy(x)
  1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 1.0 δq₂ + 𝒪(‖x‖²)) t² + ( 1.0 δq₁ - 1.0 δq₂ + 𝒪(‖x‖²)) t³ + ( 1.0 δq₁ + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

julia> z.coeffs[3:end] .= zero(z.coeffs[1]); # changing from the 2nd order onwards

julia> z[3][0] = HomogeneousPolynomial(6.0,0) # changing order 3!
 6.0

julia> z # order 2 is also changed, but not order 1
  1.0 + 1.0 δq₁ + 𝒪(‖x‖²) + ( - 1.0 δq₁ + 𝒪(‖x‖²)) t + ( 6.0 + 𝒪(‖x‖²)) t² + ( 6.0 + 𝒪(‖x‖²)) t³ + ( 6.0 + 𝒪(‖x‖²)) t⁴ + 𝒪(t⁵)

So indeed that seems to change all coefficients from order 2 onwards. I think this is because z.coeffs[3:end] is being treated as a view.

@PerezHz (Contributor, Author) commented Feb 9, 2024

I think that z.coeffs[2:end] .= zero(z.coeffs[1]) is (in this PR) somehow behaving as a view, while before (e.g. on master) it was behaving as a copy; hence the allocations. Did you check whether the workaround, i.e., applying broadcasting to zero too, does allocate?

That's a good point: it does allocate, and even more. I would expect this, since in the zero-broadcasting case it allocates a different place in memory for each coefficient of z:

julia> using BenchmarkTools

julia> function zeronobc(z)
           z.coeffs[2:end] .= zero(z.coeffs[1])
           return nothing
       end
zeronobc (generic function with 1 method)

julia> function zerobc(z)
           z.coeffs[2:end] .= zero.(z.coeffs[1])
           return nothing
       end
zerobc (generic function with 1 method)

julia> @btime zeronobc($z)
  219.727 ns (10 allocations: 752 bytes)

julia> @btime zerobc($z)
  880.123 ns (40 allocations: 2.94 KiB)

src/functions.jl Outdated
@@ -655,6 +655,75 @@ end


# Mutating functions for Taylor1{TaylorN{T}}
@inline function zero!(a::Taylor1{T}, k::Int) where {T<:NumberNotSeries}
Member:

Sorry for this late suggestion: what about redefining zero!(c, a, k) simply to point to zero!(c, k)? That would avoid lots of changes (which can be kept here), in particular those related to @taylorize (in dictmutfunct.jl).
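
If I understand the suggestion correctly, a minimal sketch of that forwarding method (assuming the two-argument zero!(c, k) introduced in this PR) would be something like:

# Hypothetical sketch: keep the three-argument form as a thin wrapper,
# so existing callers (e.g. code generated by @taylorize) need no changes.
@inline zero!(c::Taylor1, a::Taylor1, k::Int) = zero!(c, k)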

Contributor (author):

Thanks for the suggestion! I changed the definition of zero!(c, a, k)

Member:

One more suggestion on this: can you move the code (of the zero! methods introduced in this PR) closer to the other zero! methods?

Comment on lines 668 to 680
function /(a::NumberNotSeries, b::Taylor1{TaylorN{T}}) where {T<:NumberNotSeries}
    iszero(a) && !iszero(b) && return zero(a)

    # order and coefficient of first factorized term
    # In this case, since a[k]=0 for k>0, we can simplify to:
    # ordfact, cdivfact = 0, a/b[0]

    res = Taylor1(a/b[0], b.order)
    for ordT in eachindex(res)
        div!(res, a, b, ordT)
    end
    return res
end
Member:

I propose the following here to ensure type stability (this is untested)

function /(a::NumberNotSeries, b::Taylor1{TaylorN{T}}) where {T<:NumberNotSeries}
    R = promote_type(typeof(a), TaylorN{T})
    iszero(a) && !iszero(b) && return convert(Taylor1{R}, zero(b))

    # order and coefficient of first factorized term
    # In this case, since a[k]=0 for k>0, we can simplify to:
    # ordfact, cdivfact = 0, a/b[0]

    res = Taylor1(a/b[0], b.order)
    for ordT in eachindex(res)
        div!(res, a, b, ordT)
    end
    return res
end

Member:

Note that a::NumberNotSeries is perhaps too strong, because it leaves out a::TaylorN{T}, which is a method worth having too.
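
One possible (untested) way to widen the signature so that a::TaylorN{T} is also accepted, reusing the body proposed above and assuming a matching div! method for a TaylorN numerator exists, could be:

function /(a::Union{T,TaylorN{T}}, b::Taylor1{TaylorN{T}}) where {T<:NumberNotSeries}
    R = promote_type(typeof(a), TaylorN{T})
    iszero(a) && !iszero(b) && return convert(Taylor1{R}, zero(b))

    res = Taylor1(a/b[0], b.order)
    for ordT in eachindex(res)
        div!(res, a, b, ordT)
    end
    return res
end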

Contributor (author):

Thanks for the suggestion! I added the fix for type-stability and widened the type for a to allow for TaylorN{T}

Comment on lines 747 to 748
@inline function div!(c::Taylor1, a::NumberNotSeries,
        b::Taylor1, k::Int)
Member:

Perhaps we should impose c::Taylor1{T} as well as b::Taylor1{T}...

Contributor (author):

I agree, although I think we can make an extra tweak to avoid some allocations for T<:TaylorN{<:NumberNotSeries}, so I added an extra method for the latter case (will fix it in the next few commits)

src/arithmetic.jl (outdated review thread, resolved)
@lbenet (Member) commented Feb 14, 2024

Tests in TI.jl are passing now. I'll try to complete my review in the next few days. Thanks a lot!

Project.toml (outdated review thread, resolved)
@PerezHz (Contributor, Author) commented Feb 16, 2024

Tests in TI.jl are passing now. I'll try to complete my review in the next few days. Thanks a lot!

Sure! There's still a couple of things I'd like to try here to save some further allocs, but indeed this is essentially ready for review.

@PerezHz changed the title from "WIP: avoid allocations in inplace methods (proof of concept)" to "Avoid some allocations in mutating methods" on Feb 16, 2024
@PerezHz (Contributor, Author) commented Feb 18, 2024

I think there are indeed many things that can still be done to avoid allocations in mutating functions, but the changes introduced here are, I think, self-contained, so I'd rather leave further changes to another PR. Therefore, as it stands now, this PR is ready for review.

@PerezHz (Contributor, Author) commented Feb 18, 2024

(Maybe I'm repeating myself, but I just wanted to be clear that, modulo reviews and some fixes, I don't intend a priori to add more things to this PR.)

@PerezHz (Contributor, Author) commented Feb 23, 2024

I've been running some benchmarks and, using this PR for jet transport (JT) integrations in NEOs.jl, we get 30% less memory usage and about 20% shorter runtimes in long (>100 timesteps) integrations. Short integrations, where "time to first time-step" is dominant, will not benefit that much from this PR.

@lbenet (Member) commented Feb 23, 2024

I'll have a look later; the results you just quoted are very encouraging, so we should think about merging and tagging a new version!

@PerezHz (Contributor, Author) commented Feb 23, 2024

I'll have a look later; the results you just quoted are very encouraging, so we should think about merging and tagging a new version!

Alright! Just bumped the minor version, let me know if there is anything else you would like to improve around here.

@lbenet (Member) left a comment:

Thanks a lot @PerezHz, this is a nice improvement! I have left one single comment (with code), which is a suggestion that may improve performance. In any case, I'm approving this PR and merging it, so we can exploit it in TaylorIntegration!

Thanks again a lot!


@inbounds mul!(c[k], c[0], b[k])
@inbounds for i = 1:k-1
    c[k] += c[i] * b[k-i]
Member:

This comment is related to the TODO above. We do have methods that indeed accumulate as mentioned above for TaylorN. I wonder if replacing this line with those methods can provide further improvements. The suggested code is:

        # c[k] += c[i] * b[k-i] # current code
        for ord in eachindex(c[k])
            mul!(c[k], c[i], b[k-i], ord)
        end

Contributor (author):

Thank you for the suggestion, I'll give it a try and see if we can get rid of some more allocations 😄 !

@lbenet merged commit bc1735c into JuliaDiff:master on Feb 24, 2024 (12 checks passed)
@PerezHz deleted the jp/mul-mixtures branch on February 24, 2024, 19:29
@PerezHz (Contributor, Author) commented Feb 24, 2024

I'm approving this PR and merging it, so we can exploit it in TaylorIntegration!

Thanks for merging! I'll go ahead now and update things accordingly in TaylorIntegration
