How to calculate likelihoods? (sklearn reproduction) #1661
-
I'm sure I must be missing something obvious, but I'm having a heck of a time computing likelihoods that reproduce the values I get from sklearn. The notebook available here shows a model implemented in sklearn and gpytorch that produces predictions in near-perfect agreement, but I'm not able to compute a log likelihood from gpytorch that matches what I get from sklearn.

To verify that I'm not totally off base, I went back to Rasmussen and Williams Eq. 5.8 to compute the log likelihood long hand, and using the kernel matrix accessible from sklearn, I get the same value as sklearn's built-in method for computing it. However, when I try to do the same calculation using the gpytorch kernel matrix accessed from my model via `model.covar_module(X_torch)`, I get a different value. (I'm sure there must be a way to calculate the log likelihood idiomatically without resorting to the textbook formula, but I haven't been able to figure that out, and in any case, I'd love to be able to reproduce the calculation via the textbook formula just to make sure I understand how gpytorch works.)

I'm also puzzled by the difference between the kernel matrices produced by gpytorch and sklearn. They are strikingly similar (differences are close to zero for the vast majority of entries), and they seem to produce nearly identical predictions, but there is a surprising amount of structure in the differences (sharp banding). Any ideas what might be going on there?

Apologies if I've made some basic mistake or have misunderstood something fundamental. Any help and/or insight would be greatly appreciated! Thanks!
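For reference, here's roughly what the long-hand calculation looks like on the sklearn side, with synthetic data standing in for my actual setup (my real notebook differs, but this is the shape of it):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in for my actual data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

# R&W Eq. 5.8: log p(y|X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi),
# where K is the fitted kernel evaluated on X (the WhiteKernel term
# supplies the sigma^2 I part).
K = gpr.kernel_(X)
c, low = cho_factor(K, lower=True)
alpha = cho_solve((c, low), y)
n = len(y)
log_lik = -0.5 * y @ alpha - np.log(np.diag(c)).sum() - 0.5 * n * np.log(2 * np.pi)

print(log_lik, gpr.log_marginal_likelihood_value_)  # these two agree for me
```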
-
Looking at your notebook, the first thing I notice is that you're computing the log likelihood using

```python
K = model.covar_module(X_torch)
```

as the covariance matrix. Note that this doesn't include the likelihood noise -- e.g., it's not $K + \sigma^2 I$. The $+ \sigma^2 I$ is actually extremely important, even when $\sigma$ is small. This is because kernel matrices can easily have smallest eigenvalues on the order of like 1e-30, so even when $\sigma$ is like 0.01, adding $\sigma^2 I$ can increase the smallest eigenvalue of $K$ by like 25 orders of magnitude.
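As a quick toy illustration of that eigenvalue shift (a synthetic RBF kernel, not your actual data):

```python
import numpy as np

# RBF kernel on densely spaced 1-D inputs -- numerically near-singular.
x = np.linspace(0, 1, 100)[:, None]
K = np.exp(-0.5 * (x - x.T) ** 2 / 0.2 ** 2)

eigs = np.linalg.eigvalsh(K)
print(eigs.min())             # tiny (can even come out slightly negative in float64)
print(eigs.min() + 0.01**2)   # adding sigma^2 I shifts every eigenvalue up by sigma^2 = 1e-4
```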
Given how both the log determinant and the linear solve $K^{-1} y$ relate to the eigenvalues of the underlying matrix, this would be my first guess to explain the discrepancy you're seeing.
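If it helps, here's a rough sketch of both routes -- the textbook formula with the noise included, and the idiomatic `ExactMarginalLogLikelihood` route. I'm guessing at variable names from your description (`model`, `X_torch`, `y_torch`) and assuming a zero prior mean, so adjust to your actual setup:

```python
import math
import torch
import gpytorch

# (1) Textbook route (R&W Eq. 5.8), with the noise term included.
#     .evaluate() densifies the lazy kernel matrix (newer gpytorch
#     versions use .to_dense()). If your mean isn't zero, subtract
#     model.mean_module(X_torch) from y_torch first.
with torch.no_grad():
    K = model.covar_module(X_torch).evaluate()
    n = K.shape[0]
    K_noisy = K + model.likelihood.noise * torch.eye(n)
    L = torch.linalg.cholesky(K_noisy)
    alpha = torch.cholesky_solve(y_torch.unsqueeze(-1), L).squeeze(-1)
    log_lik = (
        -0.5 * y_torch.dot(alpha)
        - L.diagonal().log().sum()          # = -0.5 * log|K + sigma^2 I|
        - 0.5 * n * math.log(2 * math.pi)
    )

# (2) Idiomatic route. The model must be in train mode so that
#     model(X_torch) returns the prior at the training inputs; gpytorch
#     also divides the MLL by n, hence the factor of n to compare with (1).
model.train()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
with torch.no_grad():
    log_lik_idiomatic = mll(model(X_torch), y_torch) * n
```

One more thing to watch for when comparing against sklearn: if its noise lives in the kernel as a `WhiteKernel` term, then `gpr.kernel_(X)` already contains the $\sigma^2 I$ on its diagonal, so make sure you're not counting the noise twice on one side of the comparison.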