Further allocation reduction in power computations #361

PerezHz · 2024-07-09T14:31:56Z

This PR continues in the spirit of #347 and #349, introducing an in-place form of power-by-squaring. It manages to avoid almost all allocations, except for one that I haven't managed to take care of so far, here. One way to get rid of it would be to simply add an argument to in-place methods of power, i.e, pow!, such that instead of calling such methods as pow!(res, a, p, k), they would now be called as pow!(res, a, p, aux, k). Or maybe there is another way to get rid of this auxiliary variable? Essentially the problem is that power_by_squaring uses some intermediate variables which don't allocate for non-composite numeric types (structs <: Number), but do allocate otherwise. What is your opinion on this @lbenet?

…_by_squaring!; add in-place sqr! method

…y Aqua)

lbenet · 2024-07-10T15:57:49Z

Thanks @PerezHz for addressing this!

Regarding the question on adding pow!(res, a, p, aux, k), I also thought about this possibility, I agree this is a way to go. The sole inconvenient I see is that we will have to adapt @taylorize macro in TaylorIntegration, which is always a subtle change, yet worth it! Among other, we will have to adapt/extend the internal dictionaries for this 5-argument function. Once said this, I propose to continue with this PR, and check the performance wrt master. If the improvements are worth, we then adapt what is needed here and in TaylorIntegration.

src/arithmetic.jl

lbenet · 2024-07-10T15:58:16Z

src/arithmetic.jl

+# in-place product: `a` <- `a*b`
+# this method computes the product `a*b` and saves it back into `a`
+# assumes `a` and `b` are of same order
+function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number}


An equivalent method mul!(a::Taylor1{T}, b::Taylor1{T}) where {T<:Number}) could be useful for the extension of pow! for Taylor1{TaylorN{T}}?

It would indeed by useful, I've added the corresponding method, thanks!

lbenet · 2024-07-10T15:58:28Z

src/arithmetic.jl

+function mul!(a::TaylorN{T}, b::TaylorN{T}) where {T<:Number}
+    for k in reverse(eachindex(a))
+        mul!(a, a, b[0][1], k)
+        for l in 1:k
+            mul!(a[k], a[k-l], b[l])
+        end
+    end
+    return nothing
+end
+


Not sure if this is a good proposal, but here it goes: the code for the next method of mul! could be simplified using the new one proposed in this PR, e.g., by copying a into res, and then using mul!(res, b).

lbenet · 2024-07-10T15:58:38Z

src/power.jl

@@ -58,37 +58,95 @@ end

 ^(a::Taylor1{TaylorN{T}}, r::Rational) where {T<:NumberNotSeries} = a^(r.num/r.den)

+# in-place form of power_by_squaring
+# this method assumes `y`, `x` and `aux1` are of same order
+function power_by_squaring!(y::TaylorN{T}, x::TaylorN{T}, aux1::TaylorN{T}, p::Integer) where {T<:NumberNotSeries}


Could an equivalent method be added for Taylor1 inputs? Or are they not necessary?

I think it would be useful, yes, I've added it, thanks!

lbenet · 2024-07-10T15:58:42Z

src/power.jl

+    @eval power_by_squaring(x::$T, p::Integer) = power_by_squaring(x, Val(p))
+    @eval power_by_squaring(x::$T, ::Val{0}) = one(x)
+    @eval power_by_squaring(x::$T, ::Val{1}) = copy(x)
+    @eval power_by_squaring(x::$T, ::Val{2}) = square(x)


Perhaps adding power_by_squaring(x::$T, ::Val{3}) could be useful too, since it includes the "odd" powers part of the function.

You mean adding something along the lines of power_by_squaring(x::$T, ::Val{3}) = x*square(x)?

Good point... I guess this is the kind of question that we should define in terms of performance. My naive guess is that x*square(x) will allocate more than the direct call to power_by_squaring, but we should definetily check this.

I just checked this, and it turns out doing x*square(x) allocates less than power_by_squaring(x) (!), do you think it's worth then to add the ::Val{3} method?

Have you also teste with intervals? I would choose the case that behaves better wrt allocations.

With intervals power_by_squaring(x, 3) on Taylor1{Interval{Float64}} allocates around half as x*square(x), while for TaylorN{Interval{Float64}} allocations are essentially the same

src/power.jl

lbenet · 2024-07-10T16:00:25Z

I'v left few comments, which are simple suggestions

PerezHz · 2024-07-10T20:31:29Z

Regarding the question on adding pow!(res, a, p, aux, k), I also thought about this possibility, I agree this is a way to go. The sole inconvenient I see is that we will have to adapt @taylorize macro in TaylorIntegration, which is always a subtle change, yet worth it! Among other, we will have to adapt/extend the internal dictionaries for this 5-argument function. Once said this, I propose to continue with this PR, and check the performance wrt master. If the improvements are worth, we then adapt what is needed here and in TaylorIntegration.

Alright, sounds good to me! I'll carry on forward then and report back after comparing performance wrt master.

PerezHz · 2024-07-11T09:07:59Z

I'v left few comments, which are simple suggestions

Many thanks for your suggestions! I've added some of the methods you suggested (the modification to pow! call signature are still pending, I'm on it); things are working fine except for intervals 😅. Not sure what's happening there, since all the other tests are passing otherwise. Any hints about what might be happening here?

lbenet · 2024-07-11T09:20:48Z

Many thanks for your suggestions! I've added some of the methods you suggested (the modification to pow! call signature are still pending, I'm on it); things are working fine except for intervals 😅. Not sure what's happening there, since all the other tests are passing otherwise. Any hints about what might be happening here?

I'll have a look on what's going on there, and comment on that (later).

lbenet · 2024-07-11T14:14:08Z

I'm not sure where the problem is, yet, but it lies in power_by_squaring itself. Using this PR:

julia> t = Taylor1(10)
 1.0 t + 𝒪(t¹¹)

julia> @which Base.power_by_squaring(t-1,4)
power_by_squaring(x::Taylor1, p::Integer)
     @ TaylorSeries ~/.julia/dev/TaylorSeries/src/power.jl:102

julia> Base.power_by_squaring(t-1,4) # WRONG!
 1.0 - 4.0 t + 4.0 t² - 4.0 t³ + 1.0 t⁴ + 𝒪(t¹¹)

The last result is wrong(the second-order term which should be 6). You get similar problems using higher values of the power.

lbenet · 2024-07-11T16:30:30Z

Tests are broken since 2e4e60a, so we should begin there.

src/power.jl

lbenet · 2024-07-11T18:44:02Z

src/power.jl

+                    c[k] += c[i] * c[k-i]
+                end
+                @inbounds c[k] = 2 * c[k]
+                (kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 )


I think the following solves the problem...

Suggested change

(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 )

(kodd == 0) && ( @inbounds c[k] += c[k >> 1]^2 )

Great catch! It does the trick, many thanks!

PerezHz · 2024-07-11T20:06:31Z

Many thanks for the fix @lbenet! I've updated here the call signature for pow! and am in the process of doing the corresponding changes in TaylorIntegration; I'll report back as soon as this is done.

lbenet

Thanks @PerezHz again for addressing this! I've left few comments which I think are useful. Also, while I was reviewing this PR, I noticed you pushed some new commits. I therefore send this now, and will review back again later.

lbenet · 2024-07-12T19:33:40Z

src/auxiliary.jl

        @simd for ord in lencoef+1:order+1
-            @inbounds coeffs[ord] = z
+            @inbounds coeffs[ord] = zero(coeffs[1])


I understand that this change helps avoiding the allocation of z. Yet, I think the idea is to use z (stored in the stack) instead of looking for the value of coeffs[1] and zeroing it inside the for, helps wrt performance. I may be wrong, just want to make sure the idea behind the code is clear.

lbenet · 2024-07-12T20:27:58Z

src/auxiliary.jl

    @simd for ord in lencoef+1:num_coeffs
-        @inbounds coeffs[ord] = z
+        @inbounds coeffs[ord] = zero(coeffs[1])


Same comment as above

lbenet · 2024-07-12T20:31:10Z

src/auxiliary.jl

@@ -84,7 +82,7 @@ getindex(a::Taylor1{T}, u::StepRange{Int,Int}) where {T<:Number} =
    view(a.coeffs, u[:] .+ 1)

 setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:Number} = a.coeffs[n+1] = x
-setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, deepcopy(x), n+1)
+setindex!(a::Taylor1{T}, x::T, n::Int) where {T<:AbstractSeries} = setindex!(a.coeffs, x, n+1)


Maybe the deepcopy is not necessary here. It was included to ensure independence of x and a.coeffs[n+1] (both are <:AbstractSeries). So, I simply suggest to add a test which ensures this. (Probably there is already one, in which case, there is no need to do anything.)

lbenet · 2024-07-12T20:37:43Z

src/constructors.jl

@@ -156,7 +156,7 @@ struct TaylorN{T<:Number} <: AbstractSeries{T}
    order   :: Int

    function TaylorN{T}(v::Array{HomogeneousPolynomial{T},1}, order::Int) where T<:Number
-        coeffs = zeros(HomogeneousPolynomial{T}, order)
+        coeffs = isempty(v) ? zeros(HomogeneousPolynomial{T}, order) : zeros(v[1], order)


I guess we would need similar fixes in Taylor1 and HomogeneousPolynomial...

lbenet · 2024-07-16T10:45:06Z

src/power.jl

+        if $T != HomogeneousPolynomial
+            function power_by_squaring(x::$T, ::Val{P}) where P
+                p = P # copy static parameter `P` into local variable `p`
+                y = zero(x)
+                aux = zero(x)
+                power_by_squaring!(y, x, aux, p)
+                return y
+            end


I think this if is overwritten for HomogeneousPolynomial by the method defined in line 118...

Incidentally, I think this code in nicer than the one in line 118...

lbenet · 2024-07-16T10:46:31Z

src/power.jl

+power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{0}) where {T<:NumberNotSeries} = one(x)
+power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{1}) where {T<:NumberNotSeries} = copy(x)
+power_by_squaring(x::TaylorN{Taylor1{T}}, ::Val{2}) where {T<:NumberNotSeries} = square(x)


I think these methods are defined above in lines 104-106...

lbenet · 2024-07-16T10:49:26Z

src/power.jl

+@inline pow!(c::Taylor1{T}, a::Taylor1{T}, b::Taylor1{T}, r::S, k::Int) where
+        {T<:Number, S <: Real} = pow!(c, a, b, Val(r), k)


Since r is not necessarily an Int, I fear that this method could overpopulate methods to dispatch, making slower the dispatch stage.

If we add a concrete method for Int and for other numeric types (whose value is not an integer and will not use Val), I think this would be nice.

lbenet · 2024-07-16T11:03:35Z

I added power_by_squaring(x, ::Val{3}) methods, and some extra tests which check consistency between products and powers of Taylor1 and TaylorN variables. Also bumped the patch version.

Tests are passing, which is always nice!

I fully agree. I was checking where is power_by_squaring used, and is used for integer powers (when the parametric type is Int or Interval, with TaylorNs), obviously, and in evaluate of HomogeneousPolynomials. I think the case of evaluating HomogeneousPolynomials is where this PR could be quite important. And pow! should be interesting in combination with @taylorize.

On this point, together with @LuEdRaMo at JuliaCon we were figuring out a new evaluate! method for arrays of TaylorN evaluated at another TaylorN. I have a prototype for this and will open a PR in the next few days.

Regarding the patch version, I think in this case (including the future PR you mention on evaluate!) we should release a minor version, since pow! changes a lot, and it is not a simple performance improvement or bug release. In any case, I think we should only bump that version once everything works nicely in TaylorIntegration, in particular wrt @taylorize.

PerezHz · 2024-07-16T19:56:32Z

Regarding the patch version, I think in this case (including the future PR you mention on evaluate!) we should release a minor version, since pow! changes a lot, and it is not a simple performance improvement or bug release. In any case, I think we should only bump that version once everything works nicely in TaylorIntegration, in particular wrt @taylorize.

That is actually a good point. I agree that if we bump a patch release for this PR this can break code relying on pow!, so should we skip tagging a release for this PR? Or maybe tagging a new patch version with deprecation warnings and then bumping the minor version without deprecation warnings?

PerezHz · 2024-07-17T14:03:37Z

Just pushed a commit switching back from the dispatches-by-value to if/else in pow!

lbenet · 2024-07-17T15:58:54Z

That is actually a good point. I agree that if we bump a patch release for this PR this can break code relying on pow!, so should we skip tagging a release for this PR? Or maybe tagging a new patch version with deprecation warnings and then bumping the minor version without deprecation warnings?

I think this PR merits a new minor version release; once said this, we may merge it to master without tagging a new release, in order to include the ideas you have on evaluate!, or tagging it and then have a new patch version for the other PR (as long as it doesn't change too much the internals). In my opinion, we should wait to have both PRs merged and then tag the minor version, but I'm open to your suggestions.

PerezHz · 2024-07-17T18:53:53Z

I think this PR merits a new minor version release; once said this, we may merge it to master without tagging a new release, in order to include the ideas you have on evaluate!, or tagging it and then have a new patch version for the other PR (as long as it doesn't change too much the internals). In my opinion, we should wait to have both PRs merged and then tag the minor version, but I'm open to your suggestions.

Okay, let's merge this on master and then merge the corresponding PR on TaylorIntegration without tagging, and tag after we're sure everything is looking fine. The evaluate! PR I'll try to open in the next few days, in the end I think it won't change the internals too much.

PerezHz · 2024-07-17T18:58:32Z

I reverted the change in the setindex! method for mixtures with Taylor1 and added corresponding tests. Even though removing the deepcopy did not affect tests neither here nor in TaylorIntegration, it's perhaps better to look at it in more detail afterwards. The now-reverted change in setindex! also does not affect this PR directly.

lbenet · 2024-07-17T19:03:46Z

Okay, let's merge this on master and then merge the corresponding PR on TaylorIntegration without tagging, and tag after we're sure everything is looking fine. The evaluate! PR I'll try to open in the next few days, in the end I think it won't change the internals too much.

Fine with me. Naive question: have all suggestions of the review been addressed? Shall I make another (final) review?

PerezHz · 2024-07-17T19:14:54Z

Fine with me. Naive question: have all suggestions of the review been addressed? Shall I make another (final) review?

As far as I've looked I think I managed to address everything, but perhaps another quick round of review can help just in case I missed something. One thing to note is that I reverted the patch version bumping, so this PR does not change version (it will still be v0.17.8). Otherwise, from my perspective this is ready for merging.

lbenet · 2024-07-17T19:19:46Z

I'll have a look today; regarding the version, v0.17.8 will be the tag of the proper branch, and this PR will be available from master. In other words, dev version and the last tagged version will be different.

src/power.jl

lbenet

Thanks a lot. I'll go ahead and merge

lbenet · 2024-07-17T20:12:27Z

Mrged!

PerezHz · 2024-07-17T20:20:11Z

@lbenet sorry I think I pushed a commit on my fork just before you merged which I think better reflects where the mutating power_by_squaring! method is used, I can try to open a new PR with the remaining commit

PerezHz · 2024-07-17T20:20:48Z

... and thanks for mergning by the way!

lbenet · 2024-07-17T20:22:06Z

Sorry... I was simply too fast... (I thought over my suggestions and they may not be really crutial at this stage.) So a new PR is ok

PerezHz · 2024-07-17T20:40:16Z

Sorry... I was simply too fast... (I thought over my suggestions and they may not be really crutial at this stage.) So a new PR is ok

On the contrary thank you for looking into all this! Just opened a new PR (see #363)

* Updates due to JuliaDiff/TaylorSeries.jl#361 * Remove comments and small fix * Update CI (use TaylorSeries branch from fork); use `diffeq!` in non-parsed jetcoeffs! methods * Update CI * Use diffeq! in preamble * Retain only `local` declarations in preamble * Trigger CI * Update CI * Update Project.toml * Update CI * Rename diffeq! -> ode! * Update ci.yml * Trigger CI * Update README.md * Rename ode! -> solcoeff! * Bump patch version

PerezHz added 7 commits July 8, 2024 20:17

working version with oop sqr!

dbdbcc1

Update Project.toml

a34ce7b

Update src/TaylorSeries.jl

c04d9f8

Update zero(x); add in-place mul! for TaylorN

5d3d57d

Get rid of most allocations in pow! for Taylor1{TaylorN{T}; add power…

a9c770b

…_by_squaring!; add in-place sqr! method

Remove extra empty line

5bcccf4

Add power_by_squaring methods to avoid method ambiguities (detected b…

6cce90a

…y Aqua)

lbenet reviewed Jul 10, 2024

View reviewed changes

src/arithmetic.jl Outdated Show resolved Hide resolved

lbenet reviewed Jul 10, 2024

View reviewed changes

src/power.jl Outdated Show resolved Hide resolved

Remove duplicated return

a205dcd

PerezHz added 2 commits July 10, 2024 22:42

Add inbounds

e44811c

Address review by @lbenet

2e4e60a

PerezHz added 2 commits July 11, 2024 12:28

Add extra arg to pow!

f581b21

Update TaylorSeries IA extension

27c3d0e

lbenet reviewed Jul 11, 2024

View reviewed changes

src/power.jl Show resolved Hide resolved

lbenet reviewed Jul 11, 2024

View reviewed changes

Update pow! and add fix suggested by @lbenet

dc6a68c

PerezHz added a commit to PerezHz/TaylorIntegration.jl that referenced this pull request Jul 12, 2024

Some temporary updates related to JuliaDiff/TaylorSeries.jl#361

7b4100d

lbenet reviewed Jul 16, 2024

View reviewed changes

Switch back from dispatch-by-value to if/else for pow!

ba30e7a

Revert change in setindex! overload for Taylor1

ea15252

Add test for nested Taylor1s setindex! method

21c876b

De-bump patch version

702fbac

Another approach to suggestion by @lbenet

6264eae

PerezHz added 2 commits July 17, 2024 21:26

Fix typo

4f613d6

Fix another typo

ad18057

lbenet reviewed Jul 17, 2024

View reviewed changes

src/power.jl Show resolved Hide resolved

lbenet approved these changes Jul 17, 2024

View reviewed changes

lbenet merged commit 34d0fea into JuliaDiff:master Jul 17, 2024
13 checks passed

PerezHz mentioned this pull request Jul 17, 2024

Refactor power_by_squaring code (meant to be put into #361) #363

Merged

This was referenced Jul 20, 2024

Use @allocated/@allocations to test for allocations in mutating methods #365

Open

Add evaluation method for arrays of TaylorN #364

Merged

PerezHz added a commit to PerezHz/NEOs.jl that referenced this pull request Jul 22, 2024

Test JuliaDiff/TaylorSeries.jl#361

1b4f7ab

	(kodd == 0) && ( @inbounds c[k] = c[k >> 1]^2 )
	(kodd == 0) && ( @inbounds c[k] += c[k >> 1]^2 )

		@inline pow!(c::Taylor1{T}, a::Taylor1{T}, b::Taylor1{T}, r::S, k::Int) where
		{T<:Number, S <: Real} = pow!(c, a, b, Val(r), k)

Further allocation reduction in power computations #361

Further allocation reduction in power computations #361

Conversation

PerezHz commented Jul 9, 2024

lbenet commented Jul 10, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbenet commented Jul 10, 2024

PerezHz commented Jul 10, 2024

PerezHz commented Jul 11, 2024

lbenet commented Jul 11, 2024

lbenet commented Jul 11, 2024

lbenet commented Jul 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PerezHz commented Jul 11, 2024

lbenet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lbenet commented Jul 16, 2024

PerezHz commented Jul 16, 2024

PerezHz commented Jul 17, 2024

lbenet commented Jul 17, 2024

PerezHz commented Jul 17, 2024

PerezHz commented Jul 17, 2024

lbenet commented Jul 17, 2024

PerezHz commented Jul 17, 2024

lbenet commented Jul 17, 2024

lbenet left a comment

Choose a reason for hiding this comment

lbenet commented Jul 17, 2024

PerezHz commented Jul 17, 2024

PerezHz commented Jul 17, 2024

lbenet commented Jul 17, 2024

PerezHz commented Jul 17, 2024