
Cook's distance or DFFIT, DFBETA, leverage #368

Open · Generalized opened this issue Nov 4, 2023 · 10 comments
Labels: enhancement (New feature or request), SP5

Comments

@Generalized commented Nov 4, 2023

Hi!
Would you consider adding some diagnostics to mmrm? Sure, they can be implemented manually and residuals can be QQ-plotted, but it would be nice to have some ready-to-go tools for assessing the impact of high-residual, high-leverage, and combined observations.

Currently I can reproduce the MMRM with nlme::gls() and use predictmeans::CookD(model)
https://rdrr.io/cran/predictmeans/
[image: Cook's distance plot from predictmeans::CookD()]

It's also possible to run OLS, ignore the dependency in the data, and then check the residuals via the olsrr package or the low-level base R functions (dfbetas, dffits, cooks.distance). In many scenarios this may suffice, but it complicates the whole process. And because the clusters are then ignored, the n in the 4/n threshold for Cook's distance differs between GLS (number of subjects) and OLS (number of observations).
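
For reference, the base R route is a one-liner per measure (a minimal sketch, using the data frame x posted below):

fit_ols <- lm(Diff ~ Timepoint, data = x)  # ignores the within-subject correlation
cd  <- cooks.distance(fit_ols)  # one value per observation
dfb <- dfbetas(fit_ols)         # rows = observations, columns = coefficients
dff <- dffits(fit_ols)
which(cd > 4 / nrow(x))         # here n = observations, not subjects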

For the purpose of inference (Wald's mostly) I'd much prefer DFBETAs, or at least Cook's D (which combines leverage and residual distance).

Currently I can get them for the OLS-fit model. But since the within-subject correlation can affect the estimated coefficients through the GLS (or GEE) estimation, at the end of the day there may be a discrepancy between OLS and GLS. Granted, I don't care about the actual estimates, only about the influence, so highly influential observations will "manifest" themselves in both cases, but I'd still prefer to have it on board for gls/mmrm.

Sometimes I also replace GLS with GEE and use the methods available for GEE, e.g. dfbeta.glmgee:
https://github.com/cran/glmtoolbox/blob/master/R/geeglm.R
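
For instance, a rough sketch with the example data posted below (the exact corstr spelling is my assumption; please check the glmtoolbox documentation):

library(glmtoolbox)
fit_gee <- glmgee(Diff ~ Timepoint, id = PatientId, data = x,
                  corstr = "unstructured")  # corstr value is an assumption
dfbeta(fit_gee)  # cluster-level DFBETAs via dfbeta.glmgee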

@danielinteractive (Collaborator) commented

Thanks @Generalized, nice idea. To get started, could you post here a minimal example of dfbeta and Cook's distance using gls? Then we can look into this later.

@yonicd (Collaborator) commented Nov 6, 2023

Could {DHARMa} be used to calculate all these metrics? IIRC there was a plan to generate methods for {mmrm} with {DHARMa}.

@danielinteractive (Collaborator) commented Nov 6, 2023

@yonicd good idea. Not sure if it is possible with {DHARMa}, but yes, now that interface should work. I will inform the author in florianhartig/DHARMa#358.
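
In case it helps, here's a rough sketch of how the wiring could look via DHARMa::createDHARMa(); the simulate() call on an mmrm fit is an assumption (exactly what the linked issue is about):

library(DHARMa)
sims <- simulate(fit, nsim = 250)  # assumed: one column per simulated response
res <- createDHARMa(
  simulatedResponse       = as.matrix(sims),
  observedResponse        = model.frame(fit)[[1]],  # the response column
  fittedPredictedResponse = fitted(fit),
  integerResponse         = FALSE
)
plot(res)  # QQ plot plus residual-vs-predicted panel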

@yonicd (Collaborator) commented Nov 6, 2023

For reference, here is the vignette for {DHARMa}.

@yonicd (Collaborator) commented Nov 6, 2023

Another possible suite of packages with model diagnostics built in would be easystats. They have a very comprehensive and user-friendly suite of modeling packages that includes diagnostics and post-processing.

They have already added an mmrm method to ingest the model output into their infrastructure.
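
E.g., a sketch (whether every check already supports mmrm objects is to be confirmed):

library(performance)
check_model(fit)                      # panel of standard diagnostic plots
check_outliers(fit, method = "cook")  # Cook's distance based flagging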

@yonicd (Collaborator) commented Nov 6, 2023

I see this proposal was already made :)

easystats/performance#650

@Generalized (Author) commented

# unstructured covariance (requires nlme; car and predictmeans are used below)
m <- gls(Diff ~ Timepoint,
         correlation = corSymm(form=~as.numeric(Timepoint)|PatientId),
         weights = varIdent(form =~1|Timepoint),
         data = x)

> coef(summary(m))
                  Value  Std.Error    t-value      p-value
(Intercept)  0.23970502 0.06075448  3.9454709 0.0001287454
TimepointT2 -0.02976970 0.04171840 -0.7135868 0.4767424769
TimepointT3  0.06988018 0.05378336  1.2992899 0.1961092281
> car::qqPlot(residuals(m, type = "response"))
[1]  49 116

[image: QQ plot of response residuals]

predictmeans::CookD(model = m, group = "PatientId", newwd = FALSE)

[image: Cook's distance plot by PatientId]

> predictmeans::residplot(m, newwd = FALSE)

[image: residual diagnostics from predictmeans::residplot()]

Data:

x <- structure(list(PatientId = c("A1AA2", "A1AA2", "A1AA3", "A1AA3", 
"A1AA3", "A1AA4", "A1AA4", "A1AA4", "A1AA5", "A1AA5", "A1AA5", 
"A1AA6", "A1AA6", "A1AA7", "A1AA7", "A1AA7", "A1AA8", "A1AA9", 
"A1AA9", "A1AA9", "A2AA1", "A2AA1", "A2AA2", "A2AA2", "A2AA2", 
"A2AA4", "A2AA4", "A2AA5", "A2AA6", "A2AA8", "A2AA8", "A2AA9", 
"A2A1A", "A2A11", "A2A11", "A2A12", "A2A12", "A2A12", "A2A13", 
"A2A13", "A2A14", "A2A15", "A2A16", "A2A16", "A2A17", "A2A19", 
"A2A19", "A2A2A", "A2A2A", "A2A22", "A2A22", "A2A23", "A2A23", 
"A2A23", "A2A24", "A2A24", "A2A25", "A2A25", "A2A26", "A2A26", 
"A4AA1", "A4AA1", "A4AA1", "A4AA2", "A4AA2", "A6AA1", "A6AA1", 
"A6AA1", "A6AA2", "A6AA2", "A6AA3", "A6AA3", "A6AA3", "A6AA4", 
"A6AA4", "A6AA4", "A6AA5", "A6AA5", "A6AA6", "A6AA6", "A6AA7", 
"A6AA7", "A6AA8", "A6AA8", "A7AA1", "A7AA1", "A8AA1", "A8AA1", 
"A8AA1", "A8AA2", "A8AA2", "A8AA3", "A8AA3", "A8AA3", "A8AA4", 
"A8AA4", "A8AA4", "A8AA5", "A8AA5", "A8AA5", "A8AA6", "A8AA6", 
"A8AA6", "A8AA7", "A8AA8", "A8AA8", "A8AA8", "A9AA2", "A9AA2", 
"A9AA3", "1AAA1", "1AAA1", "1AAA2", "1AAA2", "1AAA2", "1AAA3", 
"1AAA3", "13AA1", "13AA1", "13AA3", "13AA3", "13AA4", "13AA5", 
"13AA5", "13AA5", "16AA1", "16AA1", "16AA1", "16AA2", "16AA2", 
"16AA2", "17AA1", "17AA1", "18AA1", "18AA1"), Diff = c(0.78, 
0.96, 0.08, 0.05, -0.12, 0.01, -0.05, -0.01, 0.65, 0.87, 1.1, 
0.49, 0.18, -0.62, -0.51, -0.42, 0.4, 0.09, -0.01, -0.08, 0.15, 
-0.1, 0.03, 0.04, 0.17, -0.08, -0.08, -0.04, -0.12, 0.19, 0.16, 
-0.15, 0, 0.38, 0.32, 0.71, 0.56, -0.03, 0.3, 0.38, -0.43, 0.4, 
0.38, 0.3, -0.09, 0.07, -0.03, 0.63, 2.03, 0.18, 0.27, 0.18, 
0.31, 0.37, 0.54, 0.42, 0.75, 0.62, 0.23, 0.3, 0.32, 0.21, 0.43, 
0.2, -0.12, 0.18, -0.1, 0.12, 0.14, -0.12, -0.13, 0.37, 0.26, 
0.18, -0.28, 0.1, 1.28, 1.51, 0.26, 0.47, 0.11, -0.26, 0.12, 
0.11, 0.58, 0.62, -0.08, -0.22, 0.29, -0.03, -0.04, 0.78, 0.6, 
1.12, -0.31, -0.04, -0.12, 0.27, 0.38, 0.45, 0.19, 0.17, 0.08, 
-0.12, 0.95, 0.48, 0.97, -0.02, -0.02, 0.29, 0.12, 0.2, 0.36, 
0.39, 0.29, -1.14, -0.6, 0.22, 0.86, 1.53, 0.9, 1.03, -0.47, 
0.32, -0.11, 0.27, 0.3, 0.49, 0.93, 0.92, 0.84, -0.09, -0.29, 
0.08, 0.05), Timepoint = structure(c(1L, 2L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 3L, 1L, 2L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 1L, 3L, 1L, 2L, 3L, 
2L, 3L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 3L, 2L, 3L, 1L, 2L, 3L, 
1L, 3L, 2L, 3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 
3L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 
2L), levels = c("T1", "T2", "T3"), class = "factor")), class = "data.frame", row.names = c(NA, 
-135L), Version = 5L, Date = structure(1698663319.543, tzone = "UTC", class = c("POSIXct", 
"POSIXt")))

@Generalized (Author) commented Nov 26, 2023

@danielinteractive

Please find a quick-and-dirty, naive translation of the existing function in predictmeans for nlme::gls(). It may serve as a starting point for a better implementation.

The update.mmrm() was missing (or I couldn't find it), so I added it. It's quite common to use update() in R, by the way.

# Naively translated from nlme:::update.gls()

update.mmrm <- function(object, model., ..., evaluate = TRUE) {
  call <- object$call
  if (is.null(call))
    stop("need an object with call component")
  extras <- match.call(expand.dots = FALSE)$...
  # update the model formula if a new one was supplied
  if (!missing(model.))
    call$model <- update.formula(formula(object), model.)
  if (length(extras) > 0L) {
    # partially match the extra arguments against the formals of mmrm::mmrm()
    glsa <- names(as.list(args(mmrm::mmrm)))
    names(extras) <- glsa[pmatch(names(extras), glsa[-length(glsa)])]
    existing <- !is.na(match(names(extras), names(call)))
    # overwrite arguments already present in the call, append the new ones
    for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
    if (any(!existing)) {
      call <- c(as.list(call), extras[!existing])
      call <- as.call(call)
    }
  }
  if (evaluate)
    eval(call, parent.frame())
  else call
}
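
With this in place, the usual idiom works on an mmrm fit, e.g. m_ml <- update(m, reml = FALSE) to refit by ML instead of REML.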

And here is the calculating and printing function. I guess you will split it into something like cookd.mmrm and print.cook.mmrm or so.

# Naively translated from predictmeans::CookD()
# requires: mmrm, tibble, ggplot2 (and update.mmrm from above)

calculate_Cooks_distance <- function(model, plot = TRUE, label_n_max = 3, use_4n_threshold = TRUE) {

  group <- model$formula_parts$subject_var

  # refit by ML before the leave-one-out loop
  model <- update(model, reml = FALSE, data = model$data)

  mdf <- model.frame(model)
  # or mdf <- insight::get_data(model)

  beta0  <- coef(model)
  vcovb  <- vcov(model)
  vb.inv <- solve(vcovb)

  rn <- unique(mdf[, group])

  # leave-one-cluster-out refits
  LOOmp <- lapply(rn, function(x) {
    rind <- mdf[, group] != x
    upd_model <- update(model, data = mdf[rind, ])
    list(coef = coef(upd_model),
         vcov = vcov(upd_model))
  })

  LOObeta <- sapply(LOOmp, function(x) x$coef)

  rK <- t(LOObeta - beta0)
  CookD <- diag(rK %*% tcrossprod(vb.inv, rK) / length(beta0))
  names(CookD) <- rn

  # plotting section

  if (use_4n_threshold) {
    obs_to_print <- CookD >= 4 / length(rn)   # rn, not CookD, because we deal with clusters/patients here
  } else {
    obs_to_print <- CookD >= sort(CookD, decreasing = TRUE)[label_n_max]
  }

  cook_distances <- tibble(obs_id = seq_along(CookD),
                           CookD,
                           obs_label = names(CookD),
                           obs_to_print)

  cook_plot <- ggplot(cook_distances, aes(x = obs_id, y = CookD)) +
    geom_point(aes(col = obs_to_print),
               show.legend = FALSE) +
    geom_segment(aes(x = obs_id, xend = obs_id, y = 0, yend = CookD, col = obs_to_print),
                 linewidth = 0.3,
                 show.legend = FALSE) +
    geom_text(aes(x = obs_id,
                  y = CookD + max(CookD) * 0.025,
                  label = ifelse(obs_to_print, obs_label, ""),
                  col = obs_to_print),
              show.legend = FALSE) +
    geom_hline(yintercept = 4 / length(unique(cook_distances$obs_id)),
               linetype = "dashed",
               col = ifelse(use_4n_threshold, "red", NA),
               linewidth = 0.3) +
    theme_bw() +
    scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue")) +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    labs(x = "Observation ID", y = "Cook's distance",
         title = "Cook's distance")

  if (plot)
    print(cook_plot)

  # invisible() alone suffices; invisible(return(...)) would still print
  invisible(list(plot = cook_plot,
                 cook_distances = cook_distances))
}

And the results:

m <- mmrm(Diff ~ Timepoint + us(Timepoint | PatientId), data = x)
calculate_Cooks_distance(m, use_4n_threshold = FALSE)

[image: Cook's distance plot, top-3 subjects labeled]

or calculate_Cooks_distance(m, use_4n_threshold = TRUE)
[image: Cook's distance plot with the 4/n threshold line]

or calculate_Cooks_distance(m, label_n_max = 2, use_4n_threshold = FALSE)
[image: Cook's distance plot, top-2 subjects labeled]


PS: When you implement any plots in mmrm, please always return the ggplot2 objects. Many people don't do that, and then it's impossible to adjust the plots (from title to theme/colors) later in the code without modifying the source. You can do it the way shown above: by providing both the data frame and the plot object.

If you use patchwork, just return a list of the source ggplots (they can be combined via patchwork::wrap_plots).
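
For instance, with the function proposed above:

res_a <- calculate_Cooks_distance(m, plot = FALSE, use_4n_threshold = TRUE)
res_b <- calculate_Cooks_distance(m, plot = FALSE, use_4n_threshold = FALSE)
patchwork::wrap_plots(list(res_a$plot, res_b$plot), ncol = 2)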

The function I proposed returns:

> calculate_Cooks_distance(m, label_n_max = 2, use_4n_threshold = FALSE)
$plot

$cook_distances
# A tibble: 62 × 4
   obs_id   CookD obs_label obs_to_print
    <int>   <dbl> <chr>     <lgl>       
 1      1 0.00145 1AAA1     FALSE       
 2      2 0.00506 1AAA2     FALSE       
 3      3 0.0581  1AAA3     FALSE       
 4      4 0.0451  13AA1     FALSE       
 5      5 0.0664  13AA3     FALSE       
 6      6 0.0183  13AA4     FALSE       
 7      7 0.0567  13AA5     FALSE       
 8      8 0.00293 16AA1     FALSE       
 9      9 0.0236  16AA2     FALSE       
10     10 0.0120  17AA1     FALSE       
# ℹ 52 more rows
# ℹ Use `print(n = ...)` to see more rows

so I can modify it in my code in any fancy way I need:

calculate_Cooks_distance(m, label_n_max = 5, use_4n_threshold = FALSE)$plot +
    theme_modern() +   # from {see} (easystats)
    labs(title = "My very very very very very very very very very very very very long title that will be automatically wrapped",
         subtitle = "Subgroup: naive treatment",
         caption = "Some caption") +
    theme(plot.title = ggtext::element_textbox_simple(size = 11, margin = margin(0, 0, 5, 0, unit = "mm")),
          plot.caption = element_text(face = "italic", size = 10, color = "red"))

[image: customized Cook's distance plot with wrapped title, subtitle, and caption]


By the way, there's also the HLMdiag package, which handles nlme::lme (but not gls). Its code may also be inspiring to you.

Let's fit an LME close to the MMRM with unstructured (US) covariance:

coef(summary(m_lme <- nlme::lme(Diff ~ Timepoint, random = ~0+Timepoint | PatientId, data = x)))

                  Value  Std.Error DF   t-value      p-value
(Intercept)  0.23970502 0.06075453 71  3.945467 0.0001850027
TimepointT2 -0.02976969 0.04171838 71 -0.713587 0.4778217675
TimepointT3  0.06988020 0.05378337 71  1.299290 0.1980477327

Let's compare LME vs MMRM to check if we're on the same page:

coef(summary(m_mmrm <- mmrm(Diff ~ Timepoint + us(Timepoint | PatientId), data=x, method = "Between-Within")))

               Estimate Std. Error df    t value     Pr(>|t|)
(Intercept)  0.23970557 0.06075461 71  3.9454711 0.0001850003
TimepointT2 -0.02977032 0.04171911 71 -0.7135895 0.4778202149
TimepointT3  0.06988000 0.05378180 71  1.2993245 0.1980359513

# Calculate ratios of all coefficient elements across both methods. Should be close to 1.

> coef(summary(m_mmrm)) / coef(summary(m_lme))
             Estimate Std. Error df  t value  Pr(>|t|)
(Intercept) 1.0000023  1.0000013  1 1.000001 0.9999871
TimepointT2 1.0000209  1.0000174  1 1.000004 0.9999968
TimepointT3 0.9999973  0.9999707  1 1.000027 0.9999405

Good! Being on the same page, we can examine the functions from HLMdiag:

> HLMdiag::hlm_influence(m_lme, level = "PatientId") 
# A tibble: 62 × 6
   PatientId  cooksd mdffits covtrace covratio leverage.overall
   <fct>       <dbl>   <dbl>    <dbl>    <dbl>            <dbl>
 1 A1AA2     0.0249  0.0245    0.0404     1.04            0.877
 2 A1AA3     0.0103  0.0100    0.0734     1.08            0.851
 3 A1AA4     0.00326 0.00319   0.0734     1.08            0.954
 4 A1AA5     0.0331  0.0323    0.0734     1.08            0.839
 5 A1AA6     0.0191  0.0185    0.0504     1.05            0.839
 6 A1AA7     0.0241  0.0236    0.0734     1.08            0.839
 7 A1AA8     0.00137 0.00134   0.0177     1.02            0.851
 8 A1AA9     0.00835 0.00811   0.0734     1.08            0.851
 9 A2AA1     0.00824 0.00806   0.0404     1.04            0.851
10 A2AA2     0.00190 0.00186   0.0734     1.08            0.839
# ℹ 52 more rows
# ℹ Use `print(n = ...)` to see more rows

Let's compare Cook's D from mmrm and lme and calculate their ratio to check their proximity:

# requires dplyr
HLMdiag::hlm_influence(m_lme, level = "PatientId") %>%
    inner_join(
        calculate_Cooks_distance(m_mmrm, plot=FALSE)$cook_distances %>%
            select(obs_label, CookD), 
        by=c("PatientId" = "obs_label")) %>%
    mutate(ratio = cooksd / CookD) %>% 
    print(n=50)

# A tibble: 62 × 8
   PatientId    cooksd   mdffits covtrace covratio leverage.overall     CookD ratio
   <chr>         <dbl>     <dbl>    <dbl>    <dbl>            <dbl>     <dbl> <dbl>
 1 A1AA2     0.0249    0.0245      0.0404     1.04            0.877 0.0261    0.956
 2 A1AA3     0.0103    0.0100      0.0734     1.08            0.851 0.0121    0.852
 3 A1AA4     0.00326   0.00319     0.0734     1.08            0.954 0.00363   0.898
 4 A1AA5     0.0331    0.0323      0.0734     1.08            0.839 0.0320    1.03 
 5 A1AA6     0.0191    0.0185      0.0504     1.05            0.839 0.0187    1.02 
 6 A1AA7     0.0241    0.0236      0.0734     1.08            0.839 0.0269    0.895
 7 A1AA8     0.00137   0.00134     0.0177     1.02            0.851 0.00141   0.972
 8 A1AA9     0.00835   0.00811     0.0734     1.08            0.851 0.0100    0.836
 9 A2AA1     0.00824   0.00806     0.0404     1.04            0.851 0.00826   0.998
10 A2AA2     0.00190   0.00186     0.0734     1.08            0.839 0.00201   0.946
11 A2AA4     0.00328   0.00323     0.0404     1.04            0.851 0.00333   0.985
12 A2AA5     0.00319   0.00313     0.0208     1.02            0.851 0.00305   1.05 
13 A2AA6     0.00368   0.00362     0.0178     1.02            0.839 0.00376   0.979
14 A2AA8     0.00137   0.00133     0.0461     1.05            0.839 0.00148   0.923
15 A2AA9     0.00552   0.00541     0.0208     1.02            0.839 0.00558   0.990
16 A2A1A     0.00163   0.00161     0.0178     1.02            0.877 0.00168   0.973
17 A2A11     0.00252   0.00244     0.0504     1.05            0.839 0.00250   1.01 
18 A2A12     0.0927    0.0898      0.0734     1.08            0.938 0.0845    1.10 
19 A2A13     0.000344  0.000337    0.0461     1.05            0.839 0.000310  1.11 
20 A2A14     0.0155    0.0152      0.0177     1.02            0.877 0.0159    0.972
21 A2A15     0.00137   0.00134     0.0177     1.02            0.839 0.00141   0.972
22 A2A16     0.00319   0.00310     0.0504     1.05            0.891 0.00317   1.01 
23 A2A17     0.00340   0.00334     0.0177     1.02            0.938 0.00345   0.985
24 A2A19     0.00259   0.00254     0.0404     1.04            0.938 0.00246   1.06 
25 A2A2A     0.142     0.138       0.0461     1.05            0.877 0.136     1.04 
26 A2A22     0.0000401 0.0000394   0.0461     1.05            0.938 0.0000669 0.600
27 A2A23     0.00365   0.00356     0.0734     1.08            0.851 0.00349   1.05 
28 A2A24     0.00659   0.00641     0.0504     1.05            0.954 0.00650   1.01 
29 A2A25     0.0158    0.0154      0.0461     1.05            0.891 0.0152    1.04 
30 A2A26     0.0000889 0.0000865   0.0461     1.05            0.839 0.0000930 0.955
31 A4AA1     0.00115   0.00112     0.0734     1.08            0.877 0.00125   0.918
32 A4AA2     0.0122    0.0120      0.0404     1.04            0.891 0.0126    0.967
33 A6AA1     0.00998   0.00976     0.0734     1.08            0.891 0.0112    0.892
34 A6AA2     0.00915   0.00895     0.0404     1.04            0.891 0.00921   0.993
35 A6AA3     0.0333    0.0324      0.0734     1.08            0.851 0.0305    1.09 
36 A6AA4     0.0269    0.0263      0.0734     1.08            0.839 0.0309    0.871
37 A6AA5     0.0689    0.0676      0.0404     1.04            0.851 0.0722    0.954
38 A6AA6     0.00811   0.00793     0.0404     1.04            0.964 0.00855   0.948
39 A6AA7     0.0194    0.0189      0.0404     1.04            0.954 0.0204    0.951
40 A6AA8     0.000423  0.000416    0.0404     1.04            0.891 0.000447  0.946
41 A7AA1     0.00669   0.00657     0.0404     1.04            0.964 0.00711   0.941
42 A8AA1     0.0246    0.0239      0.0734     1.08            0.839 0.0333    0.738
43 A8AA2     0.00241   0.00237     0.0404     1.04            0.851 0.00243   0.992
44 A8AA3     0.0197    0.0192      0.0734     1.08            0.839 0.0178    1.11 
45 A8AA4     0.0121    0.0119      0.0734     1.08            0.851 0.0116    1.04 
46 A8AA5     0.00357   0.00348     0.0734     1.08            0.839 0.00342   1.04 
47 A8AA6     0.00452   0.00438     0.0734     1.08            0.839 0.00495   0.912
48 A8AA7     0.00368   0.00362     0.0178     1.02            0.851 0.00376   0.979
49 A8AA8     0.0231    0.0226      0.0734     1.08            0.851 0.0273    0.847
50 A9AA2     0.00210   0.00207     0.0404     1.04            0.851 0.00214   0.983
# ℹ 12 more rows
# ℹ Use `print(n = ...)` to see more rows

Nice! Let's visualize it:

# requires dplyr, tidyr, ggplot2
HLMdiag::hlm_influence(m_lme, level = "PatientId") %>%
    inner_join(
      calculate_Cooks_distance(m_mmrm, plot=FALSE)$cook_distances %>% 
        select(obs_label, CookD), 
      by=c("PatientId" = "obs_label")) %>% 
    mutate(ratio = cooksd / CookD) %>% 
    pivot_longer(cols = c(CookD, cooksd), 
                 names_to = "Method", 
                 values_to = "Cook") %>%
    mutate(Method = case_match(Method, "CookD" ~ "MMRM", "cooksd" ~ "LME")) %>%
    ggplot(aes(x=PatientId, y=Cook, col=Method)) + 
    geom_point() + 
    coord_flip() +
    theme_bw()

[image: dot plot of Cook's distance by PatientId, MMRM vs. LME]

By implementing an HLMdiag interface you get all these measures ready for MMRM: Cook's distance, MDFFITS, covtrace, covratio, and leverage ('overall' (default), 'fixef', 'ranef', or 'ranef.uc').
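
The other leverage decompositions can be requested through the leverage argument (per the HLMdiag documentation), e.g.:

HLMdiag::hlm_influence(m_lme, level = "PatientId",
                       leverage = c("overall", "fixef", "ranef"))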

@danielinteractive (Collaborator) commented

Thanks a lot @Generalized for the prototypes! That is very helpful.

I think we can definitely look into adding the diagnostic measures to the package.
For plots, I would be more careful: we have made a conscious decision to leave plots to downstream packages or user code, to keep the package scope small and crisp.

Note that we'll also soon publish tern.mmrm on CRAN, which contains plots, tables, etc. Plot function suggestions would fit better in there.

@Generalized (Author) commented Nov 27, 2023

I understand the reasons and yes, I agree. People often prefer their own formatting. I've practically never used any defaults, always tweaking them to my preferences (even if it contradicted journals' needs :D)

But at least please consider providing the numbers alone (ACF, Cook's, DFBETA (which nicely complements Cook's), maybe something else you consider important), either as respective slots in the fitted model or as stand-alone dedicated functions, so everyone can plot them on their own. This would make 90% of the work already done and validated.
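
Something along these lines (a purely hypothetical API sketch; none of these methods exist in mmrm yet):

cooks.distance(fit)  # named numeric vector, one entry per subject
dfbeta(fit)          # matrix: subjects x fixed-effect coefficients
hatvalues(fit)       # leverage, per observation or per subject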

Regarding an additional package: I already use over 270 packages at work, so I'm very resistant to adding more to my tool stack. Especially since I use my own engine for rendering tables (via the Officeverse) in the formats our Sponsors prefer (it differs a bit from what's typically offered: decimal tabulation, specific formatting) and for logging (via R Markdown), all fitting our workflow.

(For some exploratory studies I used 50+ packages in a single trial: multiple endpoints, numerous analyses of different kinds, sensitivity analyses employing different models/estimation methods for comparison, imputations; I will write a blog post about that.)

But anyway, your idea to move extras to an auxiliary package makes lots of sense. On the one hand, it keeps the core package lightweight and free from unnecessary dependencies, which facilitates further development and maintenance. On the other hand, organizing things into a family of related packages ensures internal consistency and allows experimenting with new features. I observe this already elsewhere, e.g. in the Officeverse, Tidyverse, and easystats ecosystems.

Only please make it easily installable from CRAN; in the past, as far as I remember, installation required some GitHub work involving GitHub tokens, which is problematic when building automatically generated environments.
