
Different results for two consecutive calls to fit! on the same data #22

Closed

gdalle opened this issue Nov 10, 2024 · 6 comments


gdalle commented Nov 10, 2024

Hi! Could you please explain the following behavior? Why don't we get the same PCA from two consecutive runs of fit! on the same data? It seems that the initial state of poisson_epca matters.

julia> using ExpFamilyPCA, BenchmarkTools, StableRNGs

julia> rng = StableRNG(0)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000001)

julia> indim, outdim = 5, 3
(5, 3)

julia> X = rand(rng, 1:100, (10, indim))
10×5 Matrix{Int64}:
 60  27  52  44  56
 39  94  66  68  12
 74   7  63  94  33
 82  40   5  47  35
 90  58  67  22  31
 78  51  78   2  78
 97  53  15  53  85
  9  71  11  86  32
 81  36  53  31  49
 49  46  29  16  62

julia> poisson_epca = PoissonEPCA(indim, outdim)
ExpFamilyPCA.EPCA4(ExpFamilyPCA.var"#Bg#33"(), exp, [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], Options
  metaprogramming: Bool true
  μ: Int64 1
  ϵ: Float64 2.220446049250313e-16
  A_init_value: Float64 1.0
  A_lower: Nothing nothing
  A_upper: Nothing nothing
  A_use_sobol: Bool false
  V_init_value: Float64 1.0
  V_lower: Nothing nothing
  V_upper: Nothing nothing
  V_use_sobol: Bool false
  low: Float64 -1.0e10
  high: Float64 1.0e10
  tol: Float64 1.0e-10
  maxiter: Float64 1.0e6
)

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1514.2407405796748
Iteration: 10/200 | Loss: -1525.3645655713317
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  -2.07062    1.66072     4.87845
  -1.70701    3.51808     1.7321
 -17.7317     6.40644    22.8658
  -2.88374    2.96522     4.19361
   7.60088   -2.07545    -5.06924
  35.0775   -14.8369    -30.9173
   0.1589     1.69354     1.55519
  -8.49425    7.53622     6.17168
   2.40463   -0.199623    0.465979
   9.82005   -2.66434    -8.06655

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1519.6725012076904
Iteration: 10/200 | Loss: -1525.3645628475804
Iteration: 20/200 | Loss: -1525.36456958524
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  0.734307  -0.181266    2.66634
  1.93529    0.730838    0.249316
  0.520143  -5.5078      9.13376
  1.36121   -0.0379275   1.66749
  0.378386   2.78656    -0.366517
 -1.95705   10.3417     -6.93949
  1.20294    0.866997    1.09728
  3.01305   -0.809169    0.307276
  0.489966   1.1395      1.42076
  0.448176   3.56558    -1.63305
@gdalle gdalle mentioned this issue Nov 10, 2024

gdalle commented Nov 10, 2024

After reading the code more carefully, it seems that nondeterminism in your code is caused by the following two lines:

A_new = similar(A)

V_new = similar(V)

Creating an array with similar leaves its contents uninitialized, so it holds whatever happened to be in that memory before, which can vary from one execution to the next. In this case that is problematic, because you use the columns of A / V to initialize the optimization algorithm, so a local optimizer may converge to different results. In addition, this makes your functions _initialize_A / _initialize_V useless.
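For anyone reading along, here is a standalone illustration of the difference between similar and copy (not code from the package):

```julia
A = ones(3, 4)

# `similar` allocates an array with the same eltype and dimensions,
# but leaves its contents uninitialized: whatever bytes happened to
# be in that memory show up as the initial values.
B = similar(A)
println(size(B) == size(A))  # true, but the values of B are arbitrary

# `copy` allocates a new array and also copies the contents.
C = copy(A)
println(C == A)   # true
println(C === A)  # false: independent storage
```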

Presumably what you want is to copy the existing A / V instead. I have done this in #23, but I still don't understand why you modify epca.V in-place?

epca.V[:] = V

@gdalle gdalle changed the title Different results for two fittings in a row on the same data Different results for two consecutive calls to fit! on the same data Nov 10, 2024
@FlyingWorkshop
Collaborator

Thanks for the catch @gdalle! There are two reasons we update V in place: 1) to keep the EPCA structs immutable, and 2) so that we update our "weights" $V$ after fitting. If we removed line 124, we wouldn't be able to "save our progress" each time we fit. This is useful because some people may want to fit! multiple times, or call compress on data not used in training. What are your thoughts on this?
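To illustrate the pattern being described, here is a toy sketch (MiniEPCA is a made-up stand-in, not the package's actual struct):

```julia
# An immutable struct can still hold a mutable array: the binding
# epca.V cannot be reassigned, but the array's contents can change.
struct MiniEPCA
    V::Matrix{Float64}
end

epca = MiniEPCA(zeros(2, 2))
# epca.V = ones(2, 2)    # would error: immutable struct cannot be changed
epca.V[:] = ones(2, 2)   # OK: writes into the existing array in place
println(epca.V)          # now all ones
```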


gdalle commented Nov 20, 2024

Yeah, you definitely want to store the results one way or another. And the name fit! clearly indicates that some of the arguments will be modified. I guess it's just slightly surprising to the user, because I initialize the model with

poisson_epca = PoissonEPCA(indim, outdim)

so I don't explicitly see the buffers being created that will later be modified. A little more documentation for fit! is probably the best way to improve this.
An alternative solution would be to choose a syntax like

new_epca = fit(epca, ...)

but again there is nothing wrong with the in-place fit! (and it can unlock better performance if you fully exploit it).
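An out-of-place variant could be layered on top of the existing in-place function; a toy sketch of the idea (ToyModel and its fit! are hypothetical stand-ins, not the package's API):

```julia
mutable struct ToyModel
    w::Vector{Float64}
end

# Stand-in for real training: overwrites the model's weights in place.
function fit!(m::ToyModel, X::Matrix{Float64})
    m.w .= vec(sum(X; dims=1))
    return m
end

# Out-of-place wrapper: trains a deep copy, so the caller's model is
# left untouched and the fitted copy is returned.
fit(m::ToyModel, X::Matrix{Float64}) = fit!(deepcopy(m), X)
```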

@FlyingWorkshop
Collaborator

I'll add some more documentation for fit! then. Thanks again :)

@FlyingWorkshop
Collaborator

I've updated the documentation to be clearer about how fit! works.


gdalle commented Dec 11, 2024

The fix looks good to me

@gdalle gdalle closed this as completed Dec 11, 2024