Revise compare_models() for Bayesian models #716

DominiqueMakowski · 2024-04-23T16:18:00Z

The current output would benefit from some streamlining.

> performance::compare_performance(model, model2, model3)
# Comparison of Model Performance Indices

Name   |   Model |     ELPD | ELPD_SE | LOOIC (weights) | LOOIC_SE | WAIC (weights) |    R2 | R2 (adj.) |  RMSE | Sigma
-----------------------------------------------------------------------------------------------------------------------
model  | brmsfit | -104.387 |   9.532 |   208.8 (0.092) |   19.063 |  208.8 (<.001) | 0.927 |     0.926 | 0.475 | 0.487
model2 | brmsfit |  -89.212 |  10.543 |   178.4 (<.001) |   21.085 |  178.4 (<.001) | 0.941 |     0.940 | 0.425 | 0.467
model3 | brmsfit |  -64.580 |  11.569 |   129.2 (0.908) |   23.137 |  129.1 (>.999) | 0.958 |     0.957 | 0.353 | 0.363

Suggestions:

Remove RMSE and Sigma by default (not exactly typically useful or what you would expect when comparing models)
Move R2 and R2 adj. first, after the model column
My current understanding is that LOO and WAIC are both methods to estimate ELPD, which can be used to compare models (in particular, by looking at the ELPD-diff (see [Feature] Add report.compare.loo report#419 for the report support recently added). As such, in the interest of computation, I would display directly the ELPD_DIFF + (SE), by default computed using LOO (although we could add the option for WAIC for faster computation), and that's it, drop the LOOIC, WAIC and the raw ELPD which are redundant indices and simply add noise.

What do you think?

bwiernik · 2024-04-23T16:24:54Z

I agree other than that I think we should keep LOOIC+SE (rather than ELPD). LOOIC is -2*ELPD, which puts it in the deviance metric that is also used in frequentist models with AIC, etc, and this allows us to show the stacking weights with our existing formatting functions

DominiqueMakowski · 2024-04-23T16:31:27Z

Fair
So we'd go with ELPD_DIFF (SE); and LOOIC (weights) +SE? Or just the latter? is the SE for LOOIC super necessary since we have the weights?

Should we also add some details in the documentation about how to interpret the weights, because I've been asked by students and I wasn't too clear in my answer... any rules of thumb?

DominiqueMakowski · 2024-04-23T16:51:08Z

Moreover, in easystats/report#419, we compute the z-transformed ELPD diff (i.e., normalized by SE). But I think we might as well directly convert it to a p-value given that its asymptotic distribution, it would be the most easy and "intuitive" to read (rather than the z-value). In order to avoid duplicated code, I suggest we add a parameters::parameters() method for brms::loo_compare() outputs that would simply format and compute the columns we need, we could then use that in report as well as here. I'll open a PR

EDIT: forget what I said about the duplicated code it wouldn't work

DominiqueMakowski · 2024-04-23T20:54:35Z

Here is a skeleton, maybe it would make more sense to add it as a method of test_performance() rather than compare_performance() (?)

maybe test_performance() for Bayesian models could be a watered-down version (mostly focusing on the diff) of compare_performance (which would return the IC values + R2 etc. etc.)

.test_performance_bayesian <- function(objects, criterion="loo") {

  # Compute differences ----
  if (criterion %in% c("loo", "looic")) {
    diff <- brms::loo_compare(lapply(objects, loo::loo))
    col <- "looic"
  } else if (criterion == "waic") {
    diff <- brms::loo_compare(lapply(objects, loo::waic))
    col <- "waic"
  } else {
    stop("Criterion not supported")
  }

  # Format ----
  out <- as.data.frame(diff)
  out$Name <- rownames(out)
  # Select columns
  out <- out[, c("Name", col, "elpd_diff", "se_diff")]

  # Compute p-value ----
  # ELPD difference is asympatotically normal, hence the z-score is computed
  z <- out$elpd_diff[-1] / out$se_diff[-1]
  out$p <- c(NA, 2 * pnorm(-abs(z)))

  # Return
  out
}


m1 <- brms::brm(wt ~ mpg, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
m2 <- brms::brm(wt ~ mpg + hp, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
m3 <- brms::brm(wt ~ mpg + hp + qsec, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
objects <- insight::ellipsis_info(m1, m2, m3, only_models = TRUE)

.test_performance_bayesian(objects)
#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.

#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#>    Name    looic elpd_diff  se_diff          p
#> m3   m3 45.99388  0.000000 0.000000         NA
#> m1   m1 50.78136 -2.393738 2.318327 0.30182476
#> m2   m2 53.35469 -3.680406 2.208913 0.09568128

^{Created on 2024-04-23 with reprex v2.0.2}

bwiernik · 2024-04-23T22:43:57Z

I suggest we do LOOIC (with weights) and SE. If folks want ELPD or WAIC, they can request those instead.

The SE is useful in the same way as for ELPD, whereas the weights are more like a transformed point estimate.

DominiqueMakowski changed the title ~~Reviser compare_models() for Bayesian models~~ Revise compare_models() for Bayesian models Apr 23, 2024

DominiqueMakowski mentioned this issue Apr 23, 2024

ellipisis_info(): detect if all models are Bayesian easystats/insight#866

Merged

DominiqueMakowski mentioned this issue May 1, 2024

report.compare.loo: minor edits easystats/report#427

Merged

strengejacke added Enhancement 💥 Implemented features can be improved or revised Consistency 🍏 🍎 Expected output across functions could be more similar labels May 20, 2024

roaldarbol mentioned this issue Jul 1, 2024

Comparing frequentist and Bayesian models #738

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revise compare_models() for Bayesian models #716

Revise compare_models() for Bayesian models #716

DominiqueMakowski commented Apr 23, 2024

bwiernik commented Apr 23, 2024

DominiqueMakowski commented Apr 23, 2024

DominiqueMakowski commented Apr 23, 2024 •

edited

Loading

DominiqueMakowski commented Apr 23, 2024

bwiernik commented Apr 23, 2024

Revise compare_models() for Bayesian models #716

Revise compare_models() for Bayesian models #716

Comments

DominiqueMakowski commented Apr 23, 2024

bwiernik commented Apr 23, 2024

DominiqueMakowski commented Apr 23, 2024

DominiqueMakowski commented Apr 23, 2024 • edited Loading

DominiqueMakowski commented Apr 23, 2024

bwiernik commented Apr 23, 2024

DominiqueMakowski commented Apr 23, 2024 •

edited

Loading