Predict auxiliary parameters: estimate_parameters() / "what" argument? #287

strengejacke · 2025-01-02T16:15:19Z

Fixes #286

DominiqueMakowski · 2025-01-02T16:19:55Z

But isn't it overkill to add a function for that? Isn't it better to add a dpar-like argument to estimate_predictions?

bwiernik · 2025-01-02T17:44:50Z

Where do we currently have prediction of a group-level random slope (if anywhere)?

DominiqueMakowski · 2025-01-02T17:54:16Z

estimate_grouplevel()? https://github.com/easystats/modelbased/blob/main/R/estimate_grouplevel.R

strengejacke · 2025-01-02T18:09:52Z

But isn't it overkill to add a function for that?

I have no strong opinion here, I thought it was what you had in mind (based on your issue title). We can just do nothing, because dpar is passed via dots anyway.

DominiqueMakowski · 2025-01-02T20:42:40Z

Adding a whole function might increase maintenance burden for quite a small feature
On the other hand, relying on dpar might be a bit too invisible
1. either we document dpar in estimate_prediction's kwargs (specifying that it only works when backend is marginaleffects ((which we should discuss whether to make the default)))
1. we add an argument to get_predicted

I'm leaning for option 2, but not sure about the best name. We have trend, contrast in estimate_slope and estimate_contrasts. So maybe predict="outcome" for estimate_predictions?

Thoughts @bwiernik @mattansb ?

bwiernik · 2025-01-02T20:48:52Z

How about we merge this function and grouplevel into one? I kind of like that estimate_prediction() and estimate_expectation() are specifically for outcome values vs underlying model parameters

mattansb · 2025-01-02T20:49:50Z

dpar= is documented in brms support functions for emmeans as well as in the Bayesian docs in marginaleffects so I think it would be fine here as well - better to stick to the convention.

DominiqueMakowski · 2025-01-02T21:01:23Z

How about we merge this function and grouplevel into one?

estimate_grouplevel() stays the same, it is for extraction directly the random effects. Here it's mostly about estimating predicted parameter values (including over random levels) at a datagrid (so not directly the "effects" per se).

I think it would be fine here as well - better to stick to the convention

Not a big fan of dpar (I suppose it stands for Dependent PARameter?) but yeah if it's well established no probs. Then the question is whether to expose it or simply document it in the kwargs

mattansb · 2025-01-02T21:46:55Z

It stands for Distributional PARrameter (:

strengejacke · 2025-01-02T22:51:03Z

Isn't it better to add a dpar-like argument to estimate_predictions?

Yes, we could add it to estimate_predictions() and estimate_means() - this would be my preferred choice (instead of just documenting dpar). And what about estimate_relation()? Or do we always need to calculate prediction intervals?

we add an argument to get_predicted

I think it's better to have an argument here in modelbased, and then pass dpar to get_predicted() (or marginaleffects or emmeans via estimate_means()). We need the same for estimate_contrasts() (and probably estimate_slopes()?).

So I would vote for:

a dedicated argument (other name than dpar)
add this argument to estimate_predictions() (predictions with prediction intervals), estimate_means(), estimate_slopes(), estimate_relation() (predictions with confidence intervals), and estimate_contrasts()
all these function use the new argument to set dpar, and pass dpar to the underlying predict-functions (predict/emmeans/marginaleffects).

strengejacke · 2025-01-02T23:07:17Z

which we should discuss whether to make the default

For contrasts, we can use the by argument to "filter" contrasts by a predictor's levels. To do so with the marginaleffects backend, we need to reshape the parameter-column, see also #280 (comment)

This is quite a tedious work to implement, I guess. But this is an important feature, else the returned output can become very long. Thus, I would switch to marginaleffects as default when we have implemented the functionality for the by argument in estimate_contrasts().

bwiernik · 2025-01-05T08:50:28Z

Yes, we could add it to estimate_predictions() and estimate_means() - this would be my preferred choice (instead of just documenting dpar). And what about estimate_relation()? Or do we always need to calculate prediction intervals?

This is why I think it would be better to put the parameter estimation in a separate function. Invoking dpar inherently means we are asking for confidence intervals on the distributional parameter. It's a different estimand from relation/expectation/prediction (which are all about the outcome variable).

DominiqueMakowski · 2025-01-05T10:34:41Z

It's a different estimand from relation/expectation/prediction (which are all about the outcome variable).

Although I do agree that it's an important conceptual difference, I think from a user-perspective it works to get these using the other functions. Like estimate_means(predict="sigma") it still makes "sense" to do that to obtain the "average/mean" of sigma at the different levels

I guess what I'm saying is that keeping with the currently existing API and introducing a new argument instead of a new function would retain its intuitiveness (but yes, we should then document that if predict is not "default" then CIs etc are about the distributional parameters.

strengejacke · 2025-01-05T11:08:33Z

We don't have a predict argument right now. It's used internally, depending on the function that is called. Introducing a predict argument would mean we don't need estimate_expected(), estimate_prediction() etc. anymore. That's why I would a) choose a different name for that argument, b) do nothing and document that dpar is passed to insight::get_predicted() (though that's the least user-friendly / visible option), or c) introduce a new function (like suggested in this PR).

Option a) allows us to estimate distributional parameters for estimate_means() as well (i.e. we pass dpar to emmeans or marginaleffects). For option c), we could add a backend argument again, to either call insight::get_predicted(), marginaleffects or emmeans.

DominiqueMakowski · 2025-01-05T11:19:19Z

Mmh would it conceptually make sense to add the option directly to insight::get_predicted for the predict argument to be able to take values like sigma etc.?

Because this argument already changes both the transformation + the nature of CIs etc. so it would make sense to use that to predict other distributional parameters?

strengejacke · 2025-01-05T12:14:25Z

Mmh would it conceptually make sense to add the option directly to insight::get_predicted for the argument to be able to take values like sigma etc.?

But then we need to include all possible options for dpar. Are those documented somewhere? Because we check the predict argument for valid options: https://github.com/easystats/insight/blob/e94ec60f0205ebfdbeca117d621ea1fcdca7c69b/R/get_predicted_args.R#L79-L117

DominiqueMakowski · 2025-01-05T12:50:02Z

But then we need to include all possible options for dpar

Yeah that's an issue. We could dispatch any value that is not in our main list to dpar, but the risk is that it might then leads to some obscure errors for users

One option would be to run for instance get_family() or something like that and based on this get a list of possible dpar (but it seems a bit overkill)

This would currently mostly apply to brms anyway afaik, so we could support the ones that are in brmsfamily ?

strengejacke · 2025-01-05T14:29:18Z

Ok, added to insight: easystats/insight#988

DominiqueMakowski · 2025-01-06T09:30:45Z

Now that easystats/insight#988, it should work for estimate_prediction etc., we just need to add & dispatch predict to marginaleffects' dpar in estimate_means() right?

strengejacke · 2025-01-06T10:14:26Z

Yes, we need to decide a) if we want an additional argument (and if so, which name), or, like Brenton suggested and implemented in this PR, b) have an extra function.

DominiqueMakowski · 2025-01-06T10:40:24Z

since it's now handled by predict in get_prediction, I'd stick with that. I don't t think we need a new function since this argument could exist in all existing functions (with the caveat that it would probably render the distinction estimate_relation and estimate_expectation / estimate_expectation / estimate_predicted less relevant (aside from their difference in the default newdata)

strengejacke · 2025-01-06T10:49:03Z

One counter argument would be that estimate_relation(), estimate_expectation(), estimate_expectation() and estimate_predicted() are all the same anyway, they just differ in the default value for their data argument, and internally just use different options for the predict argument. So why not adding another function (like estimate_parameters()) in order to keep the "relevance" of the existing functions?

I have no strong preference here, but if the all the existing function only differ in their default values for arguments, why not just have one integrated function?

On the other hand, to make it work with marginaleffects / emmeans, we need to pass predict to estimate_means() anyway - or we also add a "backend" argument to estimate_parameter()?

hm...

DominiqueMakowski · 2025-01-06T11:05:46Z

estimate_relation(), estimate_expectation() are indeed "aliases" of specific parameters of estimate_predicted(), but they are still relevant when one is interested in predicting other dpars.

Basically, predicting other dpars is not orthogonal to the goal of estimate_relation() / estimate_predicted() etc, you could be interested in predicting other dpars for the original data, for a datagrid, for marginal means etc. etc.

Hence conceptually I think varying what dpar is predicted fits more as an argument to existing functions rather than a separate function

strengejacke · 2025-01-06T12:57:00Z

That sounds reasonable.

strengejacke · 2025-01-06T12:59:37Z

Basically, predicting other dpars is not orthogonal to the goal of estimate_relation() / estimate_predicted() etc, you could be interested in predicting other dpars for the original data, for a datagrid, for marginal means etc. etc.

But... the estimate_() functions set the value of predict ("expectation", "link" or "prediction"). How would we combine this with predict = "sigma"? If predict = "sigma", we would pass one of the predict-options, based on the called function, and then pass dpar to insight::get_predicted()? (which means, we don't need the functionality from easystats/insight#988 , since we have to handle it in modelbased anyway).

DominiqueMakowski · 2025-01-06T13:11:40Z

estimate_prediction(predict="sigma") & estimate_expectation(predict="sigma"): data is original data

.estimate_predicted(
    model,
    data = data,
    ci = ci,
    keep_iterations = keep_iterations,
    predict = predict,
    ...
  )

estimate_link(predict="sigma") & estimate_relation(predict="sigma"): = data is grid

.estimate_predicted(
    model,
    data = "grid",
    ci = ci,
    keep_iterations = keep_iterations,
    predict = predict,
    ...
  )

strengejacke · 2025-01-06T22:41:33Z

We have to set the default for predict to NULL in those methods that rely on emmeans or marginaleffects, because we need different default values for each package, and for marginaleffects also based on the model type. The correct predict option is defined in .get_emmeans_type_argument() and .get_type_argument()

mattansb · 2025-01-07T06:29:00Z

The column name should be a mix of the parameter's name and the point estimate we're using - "nu_median", "sigma_mean", "phi_MAP", etc...

strengejacke · 2025-01-07T07:41:21Z

The column name should be a mix of the parameter's name and the point estimate we're using - "nu_median", "sigma_mean", "phi_MAP", etc...

Can we set the centrality somewhere in marginaleffects? Or do we have any information on whether Median, Mean etc. are returned?

strengejacke · 2025-01-07T08:00:36Z

Just realized that adding a new function would have been less effort. :-)
Should be working now, but we still have to polish some things (docs, add more tests, make sure we have proper outputs/column names)...

When tests pass, I would merge this PR and then open a new one for the remaining details.

strengejacke · 2025-01-07T08:02:50Z

btw, @DominiqueMakowski, should we remove the aliases model_*()? Else, we have three different names for one thing:

estimate_means()
get_marginalmeans()
model_marginalmeans()

codecov · 2025-01-07T08:03:33Z

Codecov Report

Attention: Patch coverage is 65.38462% with 45 lines in your changes missing coverage. Please review.

Project coverage is 42.15%. Comparing base (7c4f4d4) to head (ace4545).
Report is 49 commits behind head on main.

Files with missing lines	Patch %	Lines
R/visualisation_recipe.estimate_grouplevel.R	0.00%	14 Missing ⚠️
R/get_marginaleffects_type.R	56.00%	11 Missing ⚠️
R/clean_names.R	37.50%	5 Missing ⚠️
R/get_emmeans.R	81.81%	4 Missing ⚠️
R/estimate_contrasts.R	80.00%	2 Missing ⚠️
R/estimate_means.R	87.50%	2 Missing ⚠️
R/get_emcontrasts.R	66.66%	2 Missing ⚠️
R/get_marginalmeans.R	88.88%	2 Missing ⚠️
R/estimate_slopes.R	0.00%	1 Missing ⚠️
R/smoothing.R	0.00%	1 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #287      +/-   ##
==========================================
+ Coverage   37.22%   42.15%   +4.93%     
==========================================
  Files          27       27              
  Lines        1209     1383     +174     
==========================================
+ Hits          450      583     +133     
- Misses        759      800      +41

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mattansb · 2025-01-07T08:04:21Z

Can we set the centrality somewhere in marginaleffects? Or do we have any information on whether Median, Mean etc. are returned?

Not sure about marginaleffects, but bayestestR fully supports outputs now, so we can control that ourselves...

strengejacke · 2025-01-07T08:17:12Z

So I would pass the result from avg_predictions() to describe_posterior()?

DominiqueMakowski · 2025-01-07T08:19:41Z

WDYT, should the column be named "Mean" (formerly) or like the dpar, in this case "Sigma"?

"Sigma"

Not sure about adding the point estimate, not a fan of it aesthetically and we don't do it elsewhere (i.E., we still write "Mean" fro marginal means when the point estimate is median)

DominiqueMakowski · 2025-01-07T08:21:38Z

should we remove the aliases model_*()?

The get_* ones do something else, they are the wrappers around marginaleffects/emmeans that the estimate_* functions use. But yes, we should remove the model_* for sure

mattansb · 2025-01-07T08:29:31Z

So I would pass the result from avg_predictions() to describe_posterior()?

Yes, I think so.

R/clean_names.R

strengejacke · 2025-01-07T09:30:00Z

@DominiqueMakowski I think for version 1.0.0, most/all of these should be fixed, if possible. I see most issues with plotting, would be nice if we can fix plotting, as this seems to be an essential feature to me:
https://github.com/easystats/modelbased/issues?q=is%3Aopen+is%3Aissue+milestone%3A1.0.0

And we should definitely fix the printing for contrasts using marginaleffects, else we cannot filter using by:

modelbased/R/estimate_contrasts.R

Lines 188 to 236 in 26a4620

    
           .format_marginaleffects_contrasts <- function(model, estimated, p_adjust, method, ...) { 
        
             predict <- attributes(estimated)$predict 
        
             groups <- attributes(estimated)$by 
        
             contrast <- attributes(estimated)$contrast 
        
             focal_terms <- attributes(estimated)$focal_terms 
        
             estimated <- .p_adjust(model, estimated, p_adjust, ...) 
        
             valid_methods <- c( 
        
               "pairwise", "reference", "sequential", "meandev", "meanotherdev", 
        
               "revpairwise", "revreference", "revsequential" 
        
             ) 
        
             if (!is.null(method) && is.character(method) && method %in% valid_methods) { 
        
               ## TODO: split Parameter column into levels indicated in "contrast", and filter by "by" 
        
               # These are examples of what {marginaleffects} returns, a single parmater 
        
               # column that includes all levels, comma- and dash-separated, or with / 
        
               # see also https://github.com/easystats/modelbased/pull/280 
        
               # 
        
               #   estimate_contrasts(m, c("time", "coffee"), backend = "marginaleffects", p_adjust = "none") 
        
               # #> Marginal Contrasts Analysis 
        
               # #> 
        
               # #> Parameter                              | Difference |          95% CI |      p 
        
               # #> ------------------------------------------------------------------------------ 
        
               # #> morning, coffee - morning, control     |       5.78 | [  1.83,  9.73] | 0.004 
        
               # #> morning, coffee - noon, coffee         |       1.93 | [ -2.02,  5.88] | 0.336 
        
               # 
        
               # estimate_contrasts( 
        
               #   m, 
        
               #   c("time", "coffee"), 
        
               #   backend = "marginaleffects", 
        
               #   p_adjust = "none", 
        
               #   method = ratio ~ reference | coffee 
        
               # ) 
        
               # #> Marginal Contrasts Analysis 
        
               # #> 
        
               # #> coffee  |              hypothesis | Difference |       95% CI |      p 
        
               # #> ---------------------------------------------------------------------- 
        
               # #> coffee  |      (noon) / (morning) |       0.89 | [0.67, 1.11] | < .001 
        
               # #> coffee  | (afternoon) / (morning) |       1.11 | [0.87, 1.36] | < .001 
        
               # 
        
               # We need to split the "Parameter" or "hypothesis" columns into one column 
        
               # per level, as we do with the emmeans-backend. Else, we cannot use the "by" 
        
               # argument, which is used for filtering by levels of given focal terms. 
        
             } 
        
             estimated 
        
           }

Predict auxiliary parameters: estimate_parameters() / "what" argument?

d813e72

Fixes #286

strengejacke mentioned this pull request Jan 2, 2025

Predict auxiliary parameters: estimate_parameters() / "what" argument? #286

Closed

desc

b79bb60

strengejacke mentioned this pull request Jan 5, 2025

get_predicted() supports dpar easystats/insight#988

Merged

fix

4837959

strengejacke added 3 commits January 6, 2025 23:56

fix

dc88c2b

fix

c5cdb28

please lintr

d3c0999

strengejacke added 2 commits January 7, 2025 08:31

fix, docs

6d51bf5

docs

27ebfa7

docs

ace4545

add tests

f43119b

remove aliases, add news

493afad

DominiqueMakowski reviewed Jan 7, 2025

View reviewed changes

R/clean_names.R Outdated Show resolved Hide resolved

strengejacke added 5 commits January 7, 2025 09:44

use toTitleCase

0194dea

add tools

069dfb6

fix

4e2c7ae

fix

b6fe5eb

fix

26a4620

strengejacke merged commit 7d18b59 into main Jan 7, 2025
18 of 22 checks passed

strengejacke deleted the strengejacke/issue286 branch January 7, 2025 10:53

strengejacke mentioned this pull request Jan 7, 2025

Release 1.0.0 #289

Open

Predict auxiliary parameters: estimate_parameters() / "what" argument? #287

Predict auxiliary parameters: estimate_parameters() / "what" argument? #287

Conversation

strengejacke commented Jan 2, 2025

DominiqueMakowski commented Jan 2, 2025

bwiernik commented Jan 2, 2025

DominiqueMakowski commented Jan 2, 2025

strengejacke commented Jan 2, 2025

DominiqueMakowski commented Jan 2, 2025

bwiernik commented Jan 2, 2025

mattansb commented Jan 2, 2025 • edited Loading

DominiqueMakowski commented Jan 2, 2025

mattansb commented Jan 2, 2025

strengejacke commented Jan 2, 2025 • edited Loading

strengejacke commented Jan 2, 2025 • edited Loading

bwiernik commented Jan 5, 2025

DominiqueMakowski commented Jan 5, 2025

strengejacke commented Jan 5, 2025 • edited Loading

DominiqueMakowski commented Jan 5, 2025

strengejacke commented Jan 5, 2025

DominiqueMakowski commented Jan 5, 2025

strengejacke commented Jan 5, 2025

DominiqueMakowski commented Jan 6, 2025

strengejacke commented Jan 6, 2025

DominiqueMakowski commented Jan 6, 2025

strengejacke commented Jan 6, 2025

DominiqueMakowski commented Jan 6, 2025

strengejacke commented Jan 6, 2025

strengejacke commented Jan 6, 2025 • edited Loading

DominiqueMakowski commented Jan 6, 2025 • edited Loading

strengejacke commented Jan 6, 2025

mattansb commented Jan 7, 2025

strengejacke commented Jan 7, 2025

strengejacke commented Jan 7, 2025

strengejacke commented Jan 7, 2025

codecov bot commented Jan 7, 2025

Codecov Report

mattansb commented Jan 7, 2025

strengejacke commented Jan 7, 2025

DominiqueMakowski commented Jan 7, 2025 • edited Loading

DominiqueMakowski commented Jan 7, 2025

mattansb commented Jan 7, 2025

strengejacke commented Jan 7, 2025

mattansb commented Jan 2, 2025 •

edited

Loading

strengejacke commented Jan 2, 2025 •

edited

Loading

strengejacke commented Jan 2, 2025 •

edited

Loading

strengejacke commented Jan 5, 2025 •

edited

Loading

strengejacke commented Jan 6, 2025 •

edited

Loading

DominiqueMakowski commented Jan 6, 2025 •

edited

Loading

DominiqueMakowski commented Jan 7, 2025 •

edited

Loading