How to calculate likelihoods? (sklearn reproduction) #1661
-
I'm sure I must be missing something obvious, but I'm having a heck of a time computing likelihoods that reproduce the values I get from sklearn. The notebook available here shows a model implemented in sklearn and gpytorch that produces predictions in near-perfect agreement, but I'm not able to compute a log likelihood from gpytorch that matches what I get from sklearn.

To verify that I'm not totally off base, I went back to Rasmussen and Williams Eq. 5.8 to compute the log likelihood long hand, and using the kernel matrix accessible from sklearn, I get the same value as sklearn's built-in method for computing it. However, when I try to do the same calculation using the gpytorch kernel matrix accessed from my model via `model.covar_module(X_torch)`, I get a different value. (I'm sure there must be a way to calculate the log likelihood idiomatically without resorting to the textbook formula, but I haven't been able to figure that out, and in any case, I'd love to be able to reproduce the calculation via the textbook formula just to make sure I understand how gpytorch works.)

I'm also puzzled by the difference between the kernel matrices produced by gpytorch and sklearn. They are strikingly similar (differences are close to zero for the vast majority of entries), and they seem to produce nearly identical predictions, but there is a surprising amount of structure in the differences (sharp banding). Any ideas what might be going on there?

Apologies if I've made some basic mistake or have misunderstood something fundamental. Any help and/or insight would be greatly appreciated! Thanks!
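For reference, here's roughly what the long-hand calculation looks like on the sklearn side, with synthetic data standing in for my actual setup (my real notebook differs, but this is the shape of it):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for my actual data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

# R&W Eq. 5.8: log p(y|X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi),
# where K is the fitted kernel evaluated on X (the WhiteKernel term
# supplies the sigma^2 I part).
K = gpr.kernel_(X)
c, low = cho_factor(K, lower=True)
alpha = cho_solve((c, low), y)
n = len(y)
log_lik = -0.5 * y @ alpha - np.log(np.diag(c)).sum() - 0.5 * n * np.log(2 * np.pi)

print(log_lik, gpr.log_marginal_likelihood_value_)  # these two agree for me
```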
-
Looking at your notebook, the first thing I notice is that you're computing the log likelihood using

```python
K = model.covar_module(X_torch)
```

as the covariance matrix. Note that this doesn't include the likelihood noise -- e.g., it's not $K + \sigma^2 I$. The $+ \sigma^2 I$ is actually extremely important, even when $\sigma$ is small. This is because kernel matrices can easily have smallest eigenvalues on the order of like 1e-30, so even when $\sigma$ is like 0.01, adding $\sigma^2 I$ can increase the smallest eigenvalue of $K$ by like 25 orders of magnitude.
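As a quick toy illustration of that eigenvalue shift (a synthetic RBF kernel, not your actual data):

```python
import numpy as np

# RBF kernel on densely spaced 1-D inputs -- numerically near-singular.
x = np.linspace(0, 1, 100)[:, None]
K = np.exp(-0.5 * (x - x.T) ** 2 / 0.2 ** 2)

eigs = np.linalg.eigvalsh(K)
print(eigs.min())             # tiny (can even come out slightly negative in float64)
print(eigs.min() + 0.01**2)   # adding sigma^2 I shifts every eigenvalue up by sigma^2 = 1e-4
```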
Given how both the log determinant and the linear solve $K^{-1} y$ relate to the eigenvalues of the underlying matrix, this would be my first guess to explain the discrepancy you're seeing.
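If it helps, here's a rough sketch of both routes -- the textbook formula with the noise included, and the idiomatic `ExactMarginalLogLikelihood` route. I'm guessing at variable names from your description (`model`, `X_torch`, `y_torch`) and assuming a zero prior mean, so adjust to your actual setup:

```python
import math
import torch
import gpytorch

# (1) Textbook route (R&W Eq. 5.8), with the noise term included.
#     .evaluate() densifies the lazy kernel matrix (newer gpytorch
#     versions use .to_dense()). If your mean isn't zero, subtract
#     model.mean_module(X_torch) from y_torch first.
with torch.no_grad():
    K = model.covar_module(X_torch).evaluate()
    n = K.shape[0]
    K_noisy = K + model.likelihood.noise * torch.eye(n)
    L = torch.linalg.cholesky(K_noisy)
    alpha = torch.cholesky_solve(y_torch.unsqueeze(-1), L).squeeze(-1)
    log_lik = (
        -0.5 * y_torch.dot(alpha)
        - L.diagonal().log().sum()          # = -0.5 * log|K + sigma^2 I|
        - 0.5 * n * math.log(2 * math.pi)
    )

# (2) Idiomatic route. The model must be in train mode so that
#     model(X_torch) returns the prior at the training inputs; gpytorch
#     also divides the MLL by n, hence the factor of n to compare with (1).
model.train()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
with torch.no_grad():
    log_lik_idiomatic = mll(model(X_torch), y_torch) * n
```

One more thing to watch for when comparing against sklearn: if its noise lives in the kernel as a `WhiteKernel` term, then `gpr.kernel_(X)` already contains the $\sigma^2 I$ on its diagonal, so make sure you're not counting the noise twice on one side of the comparison.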