Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predict auxiliary parameters: estimate_parameters() / "what" argument? #287

Merged
merged 33 commits into from
Jan 7, 2025

Conversation

strengejacke
Copy link
Member

Fixes #286

@DominiqueMakowski
Copy link
Member

But isn't it overkill to add a function for that? Isn't it better to add a dpar-like argument to estimate_predictions?

@bwiernik
Copy link
Contributor

bwiernik commented Jan 2, 2025

Where do we currently have prediction of a group-level random slope (if anywhere)?

@DominiqueMakowski
Copy link
Member

@strengejacke
Copy link
Member Author

But isn't it overkill to add a function for that?

I have no strong opinion here, I thought it was what you had in mind (based on your issue title). We can just do nothing, because dpar is passed via dots anyway.

@DominiqueMakowski
Copy link
Member

  • Adding a whole function might increase maintenance burden for quite a small feature

  • On the other hand, relying on dpar might be a bit too invisible

    1. either we document dpar in estimate_prediction's kwargs (specifying that it only works when backend is marginaleffects ((which we should discuss whether to make the default)))
    1. we add an argument to get_predicted

I'm leaning for option 2, but not sure about the best name. We have trend, contrast in estimate_slope and estimate_contrasts. So maybe predict="outcome" for estimate_predictions?

Thoughts @bwiernik @mattansb ?

@bwiernik
Copy link
Contributor

bwiernik commented Jan 2, 2025

How about we merge this function and grouplevel into one? I kind of like that estimate_prediction() and estimate_expectation() are specifically for outcome values vs underlying model parameters

@mattansb
Copy link
Member

mattansb commented Jan 2, 2025

dpar= is documented in brms support functions for emmeans as well as in the Bayesian docs in marginaleffects so I think it would be fine here as well - better to stick to the convention.

@DominiqueMakowski
Copy link
Member

How about we merge this function and grouplevel into one?

estimate_grouplevel() stays the same, it is for extraction directly the random effects. Here it's mostly about estimating predicted parameter values (including over random levels) at a datagrid (so not directly the "effects" per se).

I think it would be fine here as well - better to stick to the convention

Not a big fan of dpar (I suppose it stands for Dependent PARameter?) but yeah if it's well established no probs. Then the question is whether to expose it or simply document it in the kwargs

@mattansb
Copy link
Member

mattansb commented Jan 2, 2025

It stands for Distributional PARrameter (:

@strengejacke
Copy link
Member Author

strengejacke commented Jan 2, 2025

Isn't it better to add a dpar-like argument to estimate_predictions?

Yes, we could add it to estimate_predictions() and estimate_means() - this would be my preferred choice (instead of just documenting dpar). And what about estimate_relation()? Or do we always need to calculate prediction intervals?

we add an argument to get_predicted

I think it's better to have an argument here in modelbased, and then pass dpar to get_predicted() (or marginaleffects or emmeans via estimate_means()). We need the same for estimate_contrasts() (and probably estimate_slopes()?).

So I would vote for:

  • a dedicated argument (other name than dpar)
  • add this argument to estimate_predictions() (predictions with prediction intervals), estimate_means(), estimate_slopes(), estimate_relation() (predictions with confidence intervals), and estimate_contrasts()
  • all these function use the new argument to set dpar, and pass dpar to the underlying predict-functions (predict/emmeans/marginaleffects).

@strengejacke
Copy link
Member Author

strengejacke commented Jan 2, 2025

  1. which we should discuss whether to make the default

For contrasts, we can use the by argument to "filter" contrasts by a predictor's levels. To do so with the marginaleffects backend, we need to reshape the parameter-column, see also #280 (comment)

This is quite a tedious work to implement, I guess. But this is an important feature, else the returned output can become very long. Thus, I would switch to marginaleffects as default when we have implemented the functionality for the by argument in estimate_contrasts().

@bwiernik
Copy link
Contributor

bwiernik commented Jan 5, 2025

Yes, we could add it to estimate_predictions() and estimate_means() - this would be my preferred choice (instead of just documenting dpar). And what about estimate_relation()? Or do we always need to calculate prediction intervals?

This is why I think it would be better to put the parameter estimation in a separate function. Invoking dpar inherently means we are asking for confidence intervals on the distributional parameter. It's a different estimand from relation/expectation/prediction (which are all about the outcome variable).

@DominiqueMakowski
Copy link
Member

It's a different estimand from relation/expectation/prediction (which are all about the outcome variable).

Although I do agree that it's an important conceptual difference, I think from a user-perspective it works to get these using the other functions. Like estimate_means(predict="sigma") it still makes "sense" to do that to obtain the "average/mean" of sigma at the different levels

I guess what I'm saying is that keeping with the currently existing API and introducing a new argument instead of a new function would retain its intuitiveness (but yes, we should then document that if predict is not "default" then CIs etc are about the distributional parameters.

@strengejacke
Copy link
Member Author

strengejacke commented Jan 5, 2025

We don't have a predict argument right now. It's used internally, depending on the function that is called. Introducing a predict argument would mean we don't need estimate_expected(), estimate_prediction() etc. anymore. That's why I would a) choose a different name for that argument, b) do nothing and document that dpar is passed to insight::get_predicted() (though that's the least user-friendly / visible option), or c) introduce a new function (like suggested in this PR).

Option a) allows us to estimate distributional parameters for estimate_means() as well (i.e. we pass dpar to emmeans or marginaleffects). For option c), we could add a backend argument again, to either call insight::get_predicted(), marginaleffects or emmeans.

@DominiqueMakowski
Copy link
Member

Mmh would it conceptually make sense to add the option directly to insight::get_predicted for the predict argument to be able to take values like sigma etc.?

Because this argument already changes both the transformation + the nature of CIs etc. so it would make sense to use that to predict other distributional parameters?

@strengejacke
Copy link
Member Author

Mmh would it conceptually make sense to add the option directly to insight::get_predicted for the argument to be able to take values like sigma etc.?

But then we need to include all possible options for dpar. Are those documented somewhere? Because we check the predict argument for valid options: https://github.com/easystats/insight/blob/e94ec60f0205ebfdbeca117d621ea1fcdca7c69b/R/get_predicted_args.R#L79-L117

@DominiqueMakowski
Copy link
Member

But then we need to include all possible options for dpar

Yeah that's an issue. We could dispatch any value that is not in our main list to dpar, but the risk is that it might then leads to some obscure errors for users

One option would be to run for instance get_family() or something like that and based on this get a list of possible dpar (but it seems a bit overkill)

This would currently mostly apply to brms anyway afaik, so we could support the ones that are in brmsfamily ?

image

@strengejacke
Copy link
Member Author

Ok, added to insight: easystats/insight#988

@DominiqueMakowski
Copy link
Member

Now that easystats/insight#988, it should work for estimate_prediction etc., we just need to add & dispatch predict to marginaleffects' dpar in estimate_means() right?

@strengejacke
Copy link
Member Author

Yes, we need to decide a) if we want an additional argument (and if so, which name), or, like Brenton suggested and implemented in this PR, b) have an extra function.

@DominiqueMakowski
Copy link
Member

since it's now handled by predict in get_prediction, I'd stick with that. I don't t think we need a new function since this argument could exist in all existing functions (with the caveat that it would probably render the distinction estimate_relation and estimate_expectation / estimate_expectation / estimate_predicted less relevant (aside from their difference in the default newdata)

@strengejacke
Copy link
Member Author

One counter argument would be that estimate_relation(), estimate_expectation(), estimate_expectation() and estimate_predicted() are all the same anyway, they just differ in the default value for their data argument, and internally just use different options for the predict argument. So why not adding another function (like estimate_parameters()) in order to keep the "relevance" of the existing functions?

I have no strong preference here, but if the all the existing function only differ in their default values for arguments, why not just have one integrated function?

On the other hand, to make it work with marginaleffects / emmeans, we need to pass predict to estimate_means() anyway - or we also add a "backend" argument to estimate_parameter()?

hm...

@DominiqueMakowski
Copy link
Member

estimate_relation(), estimate_expectation() are indeed "aliases" of specific parameters of estimate_predicted(), but they are still relevant when one is interested in predicting other dpars.

Basically, predicting other dpars is not orthogonal to the goal of estimate_relation() / estimate_predicted() etc, you could be interested in predicting other dpars for the original data, for a datagrid, for marginal means etc. etc.

Hence conceptually I think varying what dpar is predicted fits more as an argument to existing functions rather than a separate function

@strengejacke
Copy link
Member Author

That sounds reasonable.

@strengejacke
Copy link
Member Author

strengejacke commented Jan 6, 2025

Basically, predicting other dpars is not orthogonal to the goal of estimate_relation() / estimate_predicted() etc, you could be interested in predicting other dpars for the original data, for a datagrid, for marginal means etc. etc.

But... the estimate_() functions set the value of predict ("expectation", "link" or "prediction"). How would we combine this with predict = "sigma"? If predict = "sigma", we would pass one of the predict-options, based on the called function, and then pass dpar to insight::get_predicted()? (which means, we don't need the functionality from easystats/insight#988 , since we have to handle it in modelbased anyway).

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Jan 6, 2025

  • estimate_prediction(predict="sigma") & estimate_expectation(predict="sigma"): data is original data
.estimate_predicted(
    model,
    data = data,
    ci = ci,
    keep_iterations = keep_iterations,
    predict = predict,
    ...
  )
  • estimate_link(predict="sigma") & estimate_relation(predict="sigma"): = data is grid
.estimate_predicted(
    model,
    data = "grid",
    ci = ci,
    keep_iterations = keep_iterations,
    predict = predict,
    ...
  )

@strengejacke
Copy link
Member Author

We have to set the default for predict to NULL in those methods that rely on emmeans or marginaleffects, because we need different default values for each package, and for marginaleffects also based on the model type. The correct predict option is defined in .get_emmeans_type_argument() and .get_type_argument()

@mattansb
Copy link
Member

mattansb commented Jan 7, 2025

The column name should be a mix of the parameter's name and the point estimate we're using - "nu_median", "sigma_mean", "phi_MAP", etc...

@strengejacke
Copy link
Member Author

The column name should be a mix of the parameter's name and the point estimate we're using - "nu_median", "sigma_mean", "phi_MAP", etc...

Can we set the centrality somewhere in marginaleffects? Or do we have any information on whether Median, Mean etc. are returned?

@strengejacke
Copy link
Member Author

Just realized that adding a new function would have been less effort. :-)
Should be working now, but we still have to polish some things (docs, add more tests, make sure we have proper outputs/column names)...

When tests pass, I would merge this PR and then open a new one for the remaining details.

@strengejacke
Copy link
Member Author

btw, @DominiqueMakowski, should we remove the aliases model_*()? Else, we have three different names for one thing:

estimate_means()
get_marginalmeans()
model_marginalmeans()

Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 65.38462% with 45 lines in your changes missing coverage. Please review.

Project coverage is 42.15%. Comparing base (7c4f4d4) to head (ace4545).
Report is 49 commits behind head on main.

Files with missing lines Patch % Lines
R/visualisation_recipe.estimate_grouplevel.R 0.00% 14 Missing ⚠️
R/get_marginaleffects_type.R 56.00% 11 Missing ⚠️
R/clean_names.R 37.50% 5 Missing ⚠️
R/get_emmeans.R 81.81% 4 Missing ⚠️
R/estimate_contrasts.R 80.00% 2 Missing ⚠️
R/estimate_means.R 87.50% 2 Missing ⚠️
R/get_emcontrasts.R 66.66% 2 Missing ⚠️
R/get_marginalmeans.R 88.88% 2 Missing ⚠️
R/estimate_slopes.R 0.00% 1 Missing ⚠️
R/smoothing.R 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #287      +/-   ##
==========================================
+ Coverage   37.22%   42.15%   +4.93%     
==========================================
  Files          27       27              
  Lines        1209     1383     +174     
==========================================
+ Hits          450      583     +133     
- Misses        759      800      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mattansb
Copy link
Member

mattansb commented Jan 7, 2025

Can we set the centrality somewhere in marginaleffects? Or do we have any information on whether Median, Mean etc. are returned?

Not sure about marginaleffects, but bayestestR fully supports outputs now, so we can control that ourselves...

@strengejacke
Copy link
Member Author

So I would pass the result from avg_predictions() to describe_posterior()?

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Jan 7, 2025

WDYT, should the column be named "Mean" (formerly) or like the dpar, in this case "Sigma"?

"Sigma"

Not sure about adding the point estimate, not a fan of it aesthetically and we don't do it elsewhere (i.E., we still write "Mean" fro marginal means when the point estimate is median)

@DominiqueMakowski
Copy link
Member

should we remove the aliases model_*()?

The get_* ones do something else, they are the wrappers around marginaleffects/emmeans that the estimate_* functions use. But yes, we should remove the model_* for sure

@mattansb
Copy link
Member

mattansb commented Jan 7, 2025

So I would pass the result from avg_predictions() to describe_posterior()?

Yes, I think so.

R/clean_names.R Outdated Show resolved Hide resolved
@strengejacke
Copy link
Member Author

@DominiqueMakowski I think for version 1.0.0, most/all of these should be fixed, if possible. I see most issues with plotting, would be nice if we can fix plotting, as this seems to be an essential feature to me:
https://github.com/easystats/modelbased/issues?q=is%3Aopen+is%3Aissue+milestone%3A1.0.0

And we should definitely fix the printing for contrasts using marginaleffects, else we cannot filter using by:

.format_marginaleffects_contrasts <- function(model, estimated, p_adjust, method, ...) {
predict <- attributes(estimated)$predict
groups <- attributes(estimated)$by
contrast <- attributes(estimated)$contrast
focal_terms <- attributes(estimated)$focal_terms
estimated <- .p_adjust(model, estimated, p_adjust, ...)
valid_methods <- c(
"pairwise", "reference", "sequential", "meandev", "meanotherdev",
"revpairwise", "revreference", "revsequential"
)
if (!is.null(method) && is.character(method) && method %in% valid_methods) {
## TODO: split Parameter column into levels indicated in "contrast", and filter by "by"
# These are examples of what {marginaleffects} returns, a single parmater
# column that includes all levels, comma- and dash-separated, or with /
# see also https://github.com/easystats/modelbased/pull/280
#
# estimate_contrasts(m, c("time", "coffee"), backend = "marginaleffects", p_adjust = "none")
# #> Marginal Contrasts Analysis
# #>
# #> Parameter | Difference | 95% CI | p
# #> ------------------------------------------------------------------------------
# #> morning, coffee - morning, control | 5.78 | [ 1.83, 9.73] | 0.004
# #> morning, coffee - noon, coffee | 1.93 | [ -2.02, 5.88] | 0.336
#
# estimate_contrasts(
# m,
# c("time", "coffee"),
# backend = "marginaleffects",
# p_adjust = "none",
# method = ratio ~ reference | coffee
# )
# #> Marginal Contrasts Analysis
# #>
# #> coffee | hypothesis | Difference | 95% CI | p
# #> ----------------------------------------------------------------------
# #> coffee | (noon) / (morning) | 0.89 | [0.67, 1.11] | < .001
# #> coffee | (afternoon) / (morning) | 1.11 | [0.87, 1.36] | < .001
#
# We need to split the "Parameter" or "hypothesis" columns into one column
# per level, as we do with the emmeans-backend. Else, we cannot use the "by"
# argument, which is used for filtering by levels of given focal terms.
}
estimated
}

@strengejacke strengejacke merged commit 7d18b59 into main Jan 7, 2025
18 of 22 checks passed
@strengejacke strengejacke deleted the strengejacke/issue286 branch January 7, 2025 10:53
@strengejacke strengejacke mentioned this pull request Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Predict auxiliary parameters: estimate_parameters() / "what" argument?
4 participants