Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further allocation reduction in power computations #361

Merged
merged 26 commits into from
Jul 17, 2024

Conversation

PerezHz
Copy link
Contributor

@PerezHz PerezHz commented Jul 9, 2024

This PR continues in the spirit of #347 and #349, introducing an in-place form of power-by-squaring. It manages to avoid almost all allocations, except for one that I haven't managed to take care of so far, here. One way to get rid of it would be to simply add an argument to in-place methods of power, i.e, pow!, such that instead of calling such methods as pow!(res, a, p, k), they would now be called as pow!(res, a, p, aux, k). Or maybe there is another way to get rid of this auxiliary variable? Essentially the problem is that power_by_squaring uses some intermediate variables which don't allocate for non-composite numeric types (structs <: Number), but do allocate otherwise. What is your opinion on this @lbenet?

@lbenet
Copy link
Member

lbenet commented Jul 10, 2024

Thanks @PerezHz for addressing this!

Regarding the question on adding pow!(res, a, p, aux, k), I also thought about this possibility, I agree this is a way to go. The sole inconvenient I see is that we will have to adapt @taylorize macro in TaylorIntegration, which is always a subtle change, yet worth it! Among other, we will have to adapt/extend the internal dictionaries for this 5-argument function. Once said this, I propose to continue with this PR, and check the performance wrt master. If the improvements are worth, we then adapt what is needed here and in TaylorIntegration.

src/arithmetic.jl Outdated Show resolved Hide resolved
# in-place product: `a` <- `a*b`
# this method computes the product `a*b` and saves it back into `a`
# assumes `a` and `b` are of same order
function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An equivalent method mul!(a::Taylor1{T}, b::Taylor1{T}) where {T<:Number}) could be useful for the extension of pow! for Taylor1{TaylorN{T}}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed by useful, I've added the corresponding method, thanks!

Comment on lines 581 to 590
function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number}
for k in reverse(eachindex(a))
mul!(a, a, b[0][1], k)
for l in 1:k
mul!(a[k], a[k-l], b[l])
end
end
return nothing
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a good proposal, but here it goes: the code for the next method of mul! could be simplified using the new one proposed in this PR, e.g., by copying a into res, and then using mul!(res, b).

src/power.jl Outdated
@@ -58,37 +58,95 @@ end

^(a::Taylor1{TaylorN{T}}, r::Rational) where {T<:NumberNotSeries} = a^(r.num/r.den)

# in-place form of power_by_squaring
# this method assumes `y`, `x` and `aux1` are of same order
function power_by_squaring!(y::TaylorN{T}, x::TaylorN{T}, aux1::TaylorN{T}, p::Integer) where {T<:NumberNotSeries}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could an equivalent method be added for Taylor1 inputs? Or are they not necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful, yes, I've added it, thanks!

src/power.jl Outdated
@eval power_by_squaring(x::$T, p::Integer) = power_by_squaring(x, Val(p))
@eval power_by_squaring(x::$T, ::Val{0}) = one(x)
@eval power_by_squaring(x::$T, ::Val{1}) = copy(x)
@eval power_by_squaring(x::$T, ::Val{2}) = square(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps adding power_by_squaring(x::$T, ::Val{3}) could be useful too, since it includes the "odd" powers part of the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean adding something along the lines of power_by_squaring(x::$T, ::Val{3}) = x*square(x)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I guess this is the kind of question that we should define in terms of performance. My naive guess is that x*square(x) will allocate more than the direct call to power_by_squaring, but we should definetily check this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked this, and it turns out doing x*square(x) allocates less than power_by_squaring(x) (!), do you think it's worth then to add the ::Val{3} method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you also teste with intervals? I would choose the case that behaves better wrt allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With intervals power_by_squaring(x, 3) on Taylor1{Interval{Float64}} allocates around half as x*square(x), while for TaylorN{Interval{Float64}} allocations are essentially the same

src/power.jl Outdated Show resolved Hide resolved
@lbenet
Copy link
Member

lbenet commented Jul 10, 2024

I'v left few comments, which are simple suggestions

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 10, 2024

Regarding the question on adding pow!(res, a, p, aux, k), I also thought about this possibility, I agree this is a way to go. The sole inconvenient I see is that we will have to adapt @taylorize macro in TaylorIntegration, which is always a subtle change, yet worth it! Among other, we will have to adapt/extend the internal dictionaries for this 5-argument function. Once said this, I propose to continue with this PR, and check the performance wrt master. If the improvements are worth, we then adapt what is needed here and in TaylorIntegration.

Alright, sounds good to me! I'll carry on forward then and report back after comparing performance wrt master.

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 11, 2024

I'v left few comments, which are simple suggestions

Many thanks for your suggestions! I've added some of the methods you suggested (the modification to pow! call signature are still pending, I'm on it); things are working fine except for intervals 😅. Not sure what's happening there, since all the other tests are passing otherwise. Any hints about what might be happening here?

@lbenet
Copy link
Member

lbenet commented Jul 11, 2024

Many thanks for your suggestions! I've added some of the methods you suggested (the modification to pow! call signature are still pending, I'm on it); things are working fine except for intervals 😅. Not sure what's happening there, since all the other tests are passing otherwise. Any hints about what might be happening here?

I'll have a look on what's going on there, and comment on that (later).

@lbenet
Copy link
Member

lbenet commented Jul 11, 2024

I'm not sure where the problem is, yet, but it lies in power_by_squaring itself. Using this PR:

julia> t = Taylor1(10)
 1.0 t + 𝒪(t¹¹)

julia> @which Base.power_by_squaring(t-1,4)
power_by_squaring(x::Taylor1, p::Integer)
     @ TaylorSeries ~/.julia/dev/TaylorSeries/src/power.jl:102

julia> Base.power_by_squaring(t-1,4) # WRONG!
 1.0 - 4.0 t + 4.0- 4.0+ 1.0 t⁴ + 𝒪(t¹¹)

The last result is wrong(the second-order term which should be 6). You get similar problems using higher values of the power.

@lbenet
Copy link
Member

lbenet commented Jul 11, 2024

Tests are broken since 2e4e60a, so we should begin there.

src/power.jl Show resolved Hide resolved
src/power.jl Outdated
c[k] += c[i] * c[k-i]
end
@inbounds c[k] = 2 * c[k]
(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following solves the problem...

Suggested change
(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 )
(kodd == 0) && ( @inbounds c[k] += c[k >> 1]^2 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! It does the trick, many thanks!

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 11, 2024

Many thanks for the fix @lbenet! I've updated here the call signature for pow! and am in the process of doing the corresponding changes in TaylorIntegration; I'll report back as soon as this is done.

PerezHz added a commit to PerezHz/TaylorIntegration.jl that referenced this pull request Jul 12, 2024
Copy link
Member

@lbenet lbenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PerezHz again for addressing this! I've left few comments which I think are useful. Also, while I was reviewing this PR, I noticed you pushed some new commits. I therefore send this now, and will review back again later.

src/auxiliary.jl Outdated
@simd for ord in lencoef+1:order+1
@inbounds coeffs[ord] = z
@inbounds coeffs[ord] = zero(coeffs[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this change helps avoiding the allocation of z. Yet, I think the idea is to use z (stored in the stack) instead of looking for the value of coeffs[1] and zeroing it inside the for, helps wrt performance. I may be wrong, just want to make sure the idea behind the code is clear.

src/auxiliary.jl Outdated
@simd for ord in lencoef+1:num_coeffs
@inbounds coeffs[ord] = z
@inbounds coeffs[ord] = zero(coeffs[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

src/auxiliary.jl Outdated
@@ -84,7 +82,7 @@ getindex(a::Taylor1{T}, u::StepRange{Int,Int}) where {T<:Number} =
view(a.coeffs, u[:] .+ 1)

setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:Number} = a.coeffs[n+1] = x
setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, deepcopy(x), n+1)
setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, x, n+1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the deepcopy is not necessary here. It was included to ensure independence of x and a.coeffs[n+1] (both are <:AbstractSeries). So, I simply suggest to add a test which ensures this. (Probably there is already one, in which case, there is no need to do anything.)

@@ -156,7 +156,7 @@ struct TaylorN{T<:Number} <: AbstractSeries{T}
order :: Int

function TaylorN{T}(v::Array{HomogeneousPolynomial{T},1}, order::Int) where T<:Number
coeffs = zeros(HomogeneousPolynomial{T}, order)
coeffs = isempty(v) ? zeros(HomogeneousPolynomial{T}, order) : zeros(v[1], order)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we would need similar fixes in Taylor1 and HomogeneousPolynomial...

src/power.jl Outdated
Comment on lines 107 to 114
if $T != HomogeneousPolynomial
function power_by_squaring(x::$T, ::Val{P}) where P
p = P # copy static parameter `P` into local variable `p`
y = zero(x)
aux = zero(x)
power_by_squaring!(y, x, aux, p)
return y
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this if is overwritten for HomogeneousPolynomial by the method defined in line 118...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, I think this code in nicer than the one in line 118...

src/power.jl Outdated
Comment on lines 138 to 140
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{0}) where {T<:NumberNotSeries} = one(x)
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{1}) where {T<:NumberNotSeries} = copy(x)
power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{2}) where {T<:NumberNotSeries} = square(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these methods are defined above in lines 104-106...

src/power.jl Outdated
Comment on lines 258 to 259
@inline pow!(c::Taylor1{T}, a::Taylor1{T}, b::Taylor1{T}, r::S, k::Int) where
{T<:Number, S <: Real} = pow!(c, a, b, Val(r), k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since r is not necessarily an Int, I fear that this method could overpopulate methods to dispatch, making slower the dispatch stage.

If we add a concrete method for Int and for other numeric types (whose value is not an integer and will not use Val), I think this would be nice.

@lbenet
Copy link
Member

lbenet commented Jul 16, 2024

I added power_by_squaring(x, ::Val{3}) methods, and some extra tests which check consistency between products and powers of Taylor1 and TaylorN variables. Also bumped the patch version.

Tests are passing, which is always nice!

I fully agree. I was checking where is power_by_squaring used, and is used for integer powers (when the parametric type is Int or Interval, with TaylorNs), obviously, and in evaluate of HomogeneousPolynomials. I think the case of evaluating HomogeneousPolynomials is where this PR could be quite important. And pow! should be interesting in combination with @taylorize.

On this point, together with @LuEdRaMo at JuliaCon we were figuring out a new evaluate! method for arrays of TaylorN evaluated at another TaylorN. I have a prototype for this and will open a PR in the next few days.

Regarding the patch version, I think in this case (including the future PR you mention on evaluate!) we should release a minor version, since pow! changes a lot, and it is not a simple performance improvement or bug release. In any case, I think we should only bump that version once everything works nicely in TaylorIntegration, in particular wrt @taylorize.

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 16, 2024

Regarding the patch version, I think in this case (including the future PR you mention on evaluate!) we should release a minor version, since pow! changes a lot, and it is not a simple performance improvement or bug release. In any case, I think we should only bump that version once everything works nicely in TaylorIntegration, in particular wrt @taylorize.

That is actually a good point. I agree that if we bump a patch release for this PR this can break code relying on pow!, so should we skip tagging a release for this PR? Or maybe tagging a new patch version with deprecation warnings and then bumping the minor version without deprecation warnings?

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

Just pushed a commit switching back from the dispatches-by-value to if/else in pow!

@lbenet
Copy link
Member

lbenet commented Jul 17, 2024

That is actually a good point. I agree that if we bump a patch release for this PR this can break code relying on pow!, so should we skip tagging a release for this PR? Or maybe tagging a new patch version with deprecation warnings and then bumping the minor version without deprecation warnings?

I think this PR merits a new minor version release; once said this, we may merge it to master without tagging a new release, in order to include the ideas you have on evaluate!, or tagging it and then have a new patch version for the other PR (as long as it doesn't change too much the internals). In my opinion, we should wait to have both PRs merged and then tag the minor version, but I'm open to your suggestions.

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

I think this PR merits a new minor version release; once said this, we may merge it to master without tagging a new release, in order to include the ideas you have on evaluate!, or tagging it and then have a new patch version for the other PR (as long as it doesn't change too much the internals). In my opinion, we should wait to have both PRs merged and then tag the minor version, but I'm open to your suggestions.

Okay, let's merge this on master and then merge the corresponding PR on TaylorIntegration without tagging, and tag after we're sure everything is looking fine. The evaluate! PR I'll try to open in the next few days, in the end I think it won't change the internals too much.

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

I reverted the change in the setindex! method for mixtures with Taylor1 and added corresponding tests. Even though removing the deepcopy did not affect tests neither here nor in TaylorIntegration, it's perhaps better to look at it in more detail afterwards. The now-reverted change in setindex! also does not affect this PR directly.

@lbenet
Copy link
Member

lbenet commented Jul 17, 2024

Okay, let's merge this on master and then merge the corresponding PR on TaylorIntegration without tagging, and tag after we're sure everything is looking fine. The evaluate! PR I'll try to open in the next few days, in the end I think it won't change the internals too much.

Fine with me. Naive question: have all suggestions of the review been addressed? Shall I make another (final) review?

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

Fine with me. Naive question: have all suggestions of the review been addressed? Shall I make another (final) review?

As far as I've looked I think I managed to address everything, but perhaps another quick round of review can help just in case I missed something. One thing to note is that I reverted the patch version bumping, so this PR does not change version (it will still be v0.17.8). Otherwise, from my perspective this is ready for merging.

@lbenet
Copy link
Member

lbenet commented Jul 17, 2024

I'll have a look today; regarding the version, v0.17.8 will be the tag of the proper branch, and this PR will be available from master. In other words, dev version and the last tagged version will be different.

src/power.jl Show resolved Hide resolved
Copy link
Member

@lbenet lbenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. I'll go ahead and merge

@lbenet lbenet merged commit 34d0fea into JuliaDiff:master Jul 17, 2024
13 checks passed
@lbenet
Copy link
Member

lbenet commented Jul 17, 2024

Mrged!

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

@lbenet sorry I think I pushed a commit on my fork just before you merged which I think better reflects where the mutating power_by_squaring! method is used, I can try to open a new PR with the remaining commit

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

... and thanks for mergning by the way!

@lbenet
Copy link
Member

lbenet commented Jul 17, 2024

Sorry... I was simply too fast... (I thought over my suggestions and they may not be really crutial at this stage.) So a new PR is ok

@PerezHz
Copy link
Contributor Author

PerezHz commented Jul 17, 2024

Sorry... I was simply too fast... (I thought over my suggestions and they may not be really crutial at this stage.) So a new PR is ok

On the contrary thank you for looking into all this! Just opened a new PR (see #363)

PerezHz added a commit to PerezHz/NEOs.jl that referenced this pull request Jul 22, 2024
PerezHz added a commit to PerezHz/TaylorIntegration.jl that referenced this pull request Jul 22, 2024
* Updates due to JuliaDiff/TaylorSeries.jl#361

* Remove comments and small fix

* Update CI (use TaylorSeries branch from fork); use `diffeq!` in non-parsed jetcoeffs! methods

* Update CI

* Use diffeq! in preamble

* Retain only `local` declarations in preamble

* Trigger CI

* Update CI

* Update Project.toml

* Update CI

* Rename diffeq! -> ode!

* Update ci.yml

* Trigger CI

* Update README.md

* Rename ode! -> solcoeff!

* Bump patch version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants