Cook's distance or DFFIT, DFBETA, leverage #368
Comments
Thanks @Generalized, nice idea! To get started, could you post a minimal example of DFBETA and Cook's distance using gls? Then we can look into this later.
Could {DHARMa} be used to calculate all these metrics? IIRC there was a plan to generate methods for {mmrm} with {DHARMa}.
@yonicd good idea, not sure if it is possible with
For reference, here is the vignette for {DHARMa}.
Another possible suite of packages with built-in model diagnostics is easystats. It is a very comprehensive and user-friendly suite of modeling packages that includes diagnostics and post-processing, and they have already added an mmrm method for ingesting the model output into their infrastructure.
I see this proposal was already made :)
Data:
Thanks a lot @Generalized for the prototypes! That is very helpful. I think we can definitely look into adding the diagnostic measures to the package. Note that we'll also soon publish
I understand the reasons and, yes, I agree. People often prefer their own formatting. I've practically never used any defaults and always tweak them to my preferences (even when that contradicted journals' requirements :D). But please at least consider providing the numbers alone (ACF, Cook's distance, DFBETA (which complements Cook's distance well), and maybe anything else you consider important), either as slots in the fitted model object or as dedicated stand-alone functions, so everyone can plot them on their own. That alone would cover 90% of the work, already done and validated.
Regarding an additional package: I already use over 270 packages at work, so I'm very reluctant to add more to my tool stack. This is especially so because I use my own engine for rendering tables (via the Officeverse) in the formats our sponsors prefer (it differs a bit from what's typically offered: decimal tabulation, specific formatting) and for logging (via R Markdown), tailored to our workflow. For some exploratory studies I used 50+ packages in a single trial: multiple endpoints, numerous analyses of different kinds, sensitivity analyses employing different models/estimation methods for comparison, and imputations. I'll write a blog post about that.
But anyway, your idea to move the extras to an auxiliary package makes a lot of sense. On one hand, it keeps the core package lightweight and free from unnecessary dependencies, which facilitates further development and maintenance. On the other hand, organizing things as a family of related packages ensures internal consistency and leaves room to experiment. I already see this elsewhere, e.g. in the Officeverse, Tidyverse and easystats ecosystems. Just please make it easily installable from CRAN: as far as I remember, in the past it required installing from GitHub with a GitHub token, which is problematic in automatically generated environments.
Hi!
Would you consider adding some diagnostics to mmrm? Sure, they can be implemented manually and residuals can be QQ-plotted, but it would be nice to have ready-to-go tools for assessing the impact of high-residual, high-leverage and combined (influential) observations.
Currently I reproduce the MMRM with nlme::gls() and then use predictmeans::CookD(model):
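A minimal sketch of that workaround, assuming the usual way of emulating an unstructured-covariance MMRM in nlme: a general within-subject correlation via corSymm() plus visit-specific variances via varIdent(). The simulated data and the column names (id, visit, time, arm, y) are hypothetical; an influence function such as predictmeans::CookD() would then be applied to `fit`.

```r
library(nlme)

set.seed(1)
# Hypothetical long-format data: 40 subjects x 4 visits, two arms
n_subj <- 40
d <- data.frame(
  id    = factor(rep(seq_len(n_subj), each = 4)),
  visit = factor(rep(1:4, times = n_subj)),
  arm   = factor(rep(c("A", "B"), each = 4 * n_subj / 2))
)
d$time <- as.integer(d$visit)  # corSymm() needs an integer-valued position
d$y <- rnorm(nrow(d), mean = d$time + (d$arm == "B"))

# Unstructured covariance emulated as general correlation (corSymm)
# plus a separate variance per visit (varIdent), fitted by REML
fit <- gls(
  y ~ visit * arm,
  data        = d,
  correlation = corSymm(form = ~ time | id),
  weights     = varIdent(form = ~ 1 | visit),
  method      = "REML"
)
summary(fit)
```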
https://rdrr.io/cran/predictmeans/
It's also possible to fit OLS, ignoring the dependency in the data, and then check the residuals with the olsrr package or with the low-level base R functions (dfbetas, dffits, cooks.distance). In many scenarios this may suffice, but it complicates the whole process. And, because the clusters are ignored, the Cook's distance threshold (4/n) differs between GLS and OLS.
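A minimal sketch of the OLS route with the stats functions named above, on hypothetical simulated data (30 subjects x 3 visits) where lm() deliberately ignores the subject clustering. In this sketch the 4/n rule of thumb uses n = 90 rows rather than 30 subjects, which is presumably the threshold shift described above.

```r
set.seed(2)
# Hypothetical long-format data: 30 subjects x 3 visits
d <- data.frame(
  id    = factor(rep(1:30, each = 3)),
  visit = factor(rep(1:3, times = 30))
)
d$y <- rnorm(nrow(d), mean = as.integer(d$visit))

# OLS fit that ignores the within-subject dependence (id is not used)
fit_ols <- lm(y ~ visit, data = d)

cd  <- cooks.distance(fit_ols)  # Cook's distance, one value per row
dfb <- dfbetas(fit_ols)         # scaled change in each coefficient when a row is dropped
dff <- dffits(fit_ols)          # scaled change in that row's own fitted value
h   <- hatvalues(fit_ols)       # leverage

# Common rule of thumb: flag D_i > 4/n, where n counts rows (90), not subjects (30)
n <- nobs(fit_ols)
flagged <- which(cd > 4 / n)
d[flagged, ]
```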
For the purpose of inference (mostly Wald tests) I'd much prefer DFBETAs, or at least Cook's distance (which combines leverage and residual size).
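For reference, these are the standard OLS identities behind the two measures; they show how Cook's distance combines the residual $e_i$ with the leverage $h_{ii}$. Here $X$ is the $n \times p$ design matrix, $x_i$ its $i$-th row, $s^2$ the residual variance estimate, and $\hat\beta_{(i)}$ the estimate with observation $i$ deleted:

```math
\begin{aligned}
h_{ii} &= x_i^\top (X^\top X)^{-1} x_i \\
\mathrm{DFBETA}_i &= \hat\beta - \hat\beta_{(i)} = \frac{(X^\top X)^{-1} x_i\, e_i}{1 - h_{ii}} \\
D_i &= \frac{(\hat\beta - \hat\beta_{(i)})^\top X^\top X\, (\hat\beta - \hat\beta_{(i)})}{p\, s^2} = \frac{e_i^2}{p\, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
\end{aligned}
```

DFBETAS, as returned by dfbetas(), divides each component of $\mathrm{DFBETA}_i$ by a leave-one-out standard error of the corresponding coefficient.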
Currently I can get them only for the OLS fit. But since the within-subject correlation affects the estimated coefficients through GLS (or GEE) estimation, there may ultimately be a discrepancy between OLS and GLS. Granted, I care about influence rather than the actual estimates, and highly influential observations will "manifest" themselves in both cases, but I'd still prefer to have these diagnostics on board for gls/mmrm fits.
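One way a GLS version could look, as a hedged sketch and not something mmrm or nlme provides: delete one subject (cluster) at a time while holding the within-subject covariance V fixed, and measure the coefficient change in the metric of the GLS information matrix, i.e. DFBETA_k = β̂ − β̂_(k) and D_k = DFBETA_kᵀ (XᵀV⁻¹X) DFBETA_k / p. The helper gls_subject_influence(), the simulated data and the compound-symmetry V are all hypothetical; with an estimated V this is only a one-step approximation, because a full refit would also re-estimate the covariance.

```r
# Hypothetical helper (not part of mmrm or nlme): subject-level DFBETA and
# Cook's distance for a GLS fit with a FIXED, known within-subject covariance V.
# Assumes every subject has all visits, in the same order as the rows of V.
gls_subject_influence <- function(X, y, id, V) {
  Vinv <- solve(V)
  ids  <- unique(id)
  # Per-subject contributions to X' V^-1 X and X' V^-1 y
  A_k <- lapply(ids, function(s) {
    Xi <- X[id == s, , drop = FALSE]
    crossprod(Xi, Vinv %*% Xi)
  })
  b_k <- lapply(ids, function(s) {
    Xi <- X[id == s, , drop = FALSE]
    crossprod(Xi, Vinv %*% y[id == s])
  })
  A <- Reduce(`+`, A_k)  # GLS information matrix X' V^-1 X
  b <- Reduce(`+`, b_k)
  beta <- drop(solve(A, b))
  p <- ncol(X)
  # Exact GLS re-estimate without subject k (V held fixed), one row per subject
  dfbeta <- matrix(
    vapply(seq_along(ids), function(k) {
      beta - drop(solve(A - A_k[[k]], b - b_k[[k]]))
    }, numeric(p)),
    ncol = p, byrow = TRUE, dimnames = list(ids, colnames(X))
  )
  # Cook's distance in the metric of X' V^-1 X
  cook <- apply(dfbeta, 1, function(d) drop(t(d) %*% A %*% d) / p)
  list(beta = beta, dfbeta = dfbeta, cook = cook)
}

# Usage on simulated data: 20 subjects x 3 visits
set.seed(3)
n_subj <- 20
n_vis  <- 3
id     <- rep(seq_len(n_subj), each = n_vis)
visit  <- factor(rep(seq_len(n_vis), times = n_subj))
X      <- model.matrix(~ visit)
V      <- 0.5 + diag(0.5, n_vis)  # assumed compound symmetry: var 1, cov 0.5
L      <- t(chol(V))
y <- drop(X %*% c(1, 0.5, 1)) +
  as.vector(L %*% matrix(rnorm(n_subj * n_vis), nrow = n_vis))
y[id == 7] <- y[id == 7] + 5      # make subject 7 an outlier
infl <- gls_subject_influence(X, y, id, V)
head(sort(infl$cook, decreasing = TRUE), 3)
```

Deleting whole subjects is a deliberate design choice here: under GLS a single row's residual is correlated with the other rows of the same subject, so row-level deletion is less clear-cut than in OLS.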
Sometimes I also replace GLS with GEE and use the influence methods available for GEE, e.g. dfbeta.glmgee from glmtoolbox:
https://github.com/cran/glmtoolbox/blob/master/R/geeglm.R