
Different results for two consecutive calls to fit! on the same data #22

Closed

gdalle opened this issue Nov 10, 2024 · 6 comments


gdalle commented Nov 10, 2024

Hi! Could you please explain the following behavior? Why don't we get the same PCA from two consecutive runs of fit! on the same data? It seems that the initial state of poisson_epca matters.

julia> using ExpFamilyPCA, BenchmarkTools, StableRNGs

julia> rng = StableRNG(0)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000001)

julia> indim, outdim = 5, 3
(5, 3)

julia> X = rand(rng, 1:100, (10, indim))
10×5 Matrix{Int64}:
 60  27  52  44  56
 39  94  66  68  12
 74   7  63  94  33
 82  40   5  47  35
 90  58  67  22  31
 78  51  78   2  78
 97  53  15  53  85
  9  71  11  86  32
 81  36  53  31  49
 49  46  29  16  62

julia> poisson_epca = PoissonEPCA(indim, outdim)
ExpFamilyPCA.EPCA4(ExpFamilyPCA.var"#Bg#33"(), exp, [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], Options
  metaprogramming: Bool true
  μ: Int64 1
  ϵ: Float64 2.220446049250313e-16
  A_init_value: Float64 1.0
  A_lower: Nothing nothing
  A_upper: Nothing nothing
  A_use_sobol: Bool false
  V_init_value: Float64 1.0
  V_lower: Nothing nothing
  V_upper: Nothing nothing
  V_use_sobol: Bool false
  low: Float64 -1.0e10
  high: Float64 1.0e10
  tol: Float64 1.0e-10
  maxiter: Float64 1.0e6
)

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1514.2407405796748
Iteration: 10/200 | Loss: -1525.3645655713317
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  -2.07062    1.66072     4.87845
  -1.70701    3.51808     1.7321
 -17.7317     6.40644    22.8658
  -2.88374    2.96522     4.19361
   7.60088   -2.07545    -5.06924
  35.0775   -14.8369    -30.9173
   0.1589     1.69354     1.55519
  -8.49425    7.53622     6.17168
   2.40463   -0.199623    0.465979
   9.82005   -2.66434    -8.06655

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1519.6725012076904
Iteration: 10/200 | Loss: -1525.3645628475804
Iteration: 20/200 | Loss: -1525.36456958524
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  0.734307  -0.181266    2.66634
  1.93529    0.730838    0.249316
  0.520143  -5.5078      9.13376
  1.36121   -0.0379275   1.66749
  0.378386   2.78656    -0.366517
 -1.95705   10.3417     -6.93949
  1.20294    0.866997    1.09728
  3.01305   -0.809169    0.307276
  0.489966   1.1395      1.42076
  0.448176   3.56558    -1.63305
@gdalle gdalle mentioned this issue Nov 10, 2024

gdalle commented Nov 10, 2024

After reading the code more carefully, it seems that nondeterminism in your code is caused by the following two lines:

A_new = similar(A)

V_new = similar(V)

Creating an array with similar leaves its contents uninitialized, so it holds whatever happened to be in that memory before, which can vary from one execution to the next. In this case that is problematic, because you use the columns of A / V to initialize the optimization algorithm, so a local optimizer may converge to different results. In addition, this makes your functions _initialize_A / _initialize_V useless.
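For anyone reading along, here is a standalone illustration of the difference between similar and copy (not code from the package):

```julia
A = ones(3, 4)

# `similar` allocates an array with the same eltype and dimensions,
# but leaves its contents uninitialized: whatever bytes happened to
# be in that memory show up as the initial values.
B = similar(A)
println(size(B) == size(A))  # true, but the values of B are arbitrary

# `copy` allocates a new array and also copies the contents.
C = copy(A)
println(C == A)   # true
println(C === A)  # false: independent storage
```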

Presumably what you want is to copy the existing A / V instead. I have done this in #23, but I still don't understand why you modify epca.V in-place?

epca.V[:] = V

@gdalle gdalle changed the title Different results for two fittings in a row on the same data Different results for two consecutive calls to fit! on the same data Nov 10, 2024
@FlyingWorkshop
Collaborator

Thanks for the catch @gdalle! There are two reasons we update V in place: 1) to keep the EPCA structs immutable, and 2) so that we update our "weights" $V$ after fitting. If we removed line 124, we wouldn't be able to "save our progress" each time we fit. This is useful because some people may want to fit! multiple times, or call compress on data not used in training. What are your thoughts on this?
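To illustrate the pattern being described, here is a toy sketch (MiniEPCA is a made-up stand-in, not the package's actual struct):

```julia
# An immutable struct can still hold a mutable array: the binding
# epca.V cannot be reassigned, but the array's contents can change.
struct MiniEPCA
    V::Matrix{Float64}
end

epca = MiniEPCA(zeros(2, 2))
# epca.V = ones(2, 2)    # would error: immutable struct cannot be changed
epca.V[:] = ones(2, 2)   # OK: writes into the existing array in place
println(epca.V)          # now all ones
```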


gdalle commented Nov 20, 2024

Yeah, you definitely want to store the results one way or another. And the name fit! clearly indicates that some of the arguments will be modified. I guess it's just slightly surprising to the user, because I initialize the model with

poisson_epca = PoissonEPCA(indim, outdim)

so I don't explicitly see the buffers being created that will later be modified. A little more documentation for fit! is probably the best way to improve this.
An alternative solution would be to choose a syntax like

new_epca = fit(epca, ...)

but again there is nothing wrong with the in-place fit! (and it can unlock better performance if you fully exploit it).
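An out-of-place variant could be layered on top of the existing in-place function; a toy sketch of the idea (ToyModel and its fit! are hypothetical stand-ins, not the package's API):

```julia
mutable struct ToyModel
    w::Vector{Float64}
end

# Stand-in for real training: overwrites the model's weights in place.
function fit!(m::ToyModel, X::Matrix{Float64})
    m.w .= vec(sum(X; dims=1))
    return m
end

# Out-of-place wrapper: trains a deep copy, so the caller's model is
# left untouched and the fitted copy is returned.
fit(m::ToyModel, X::Matrix{Float64}) = fit!(deepcopy(m), X)
```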

@FlyingWorkshop
Collaborator

I'll add some more documentation for fit! then. Thanks again :)

@FlyingWorkshop
Collaborator

I've updated the documentation to be clearer about how fit! works.


gdalle commented Dec 11, 2024

The fix looks good to me

@gdalle gdalle closed this as completed Dec 11, 2024