-
Notifications
You must be signed in to change notification settings - Fork 1
/
autoregressive_time_series.Rmd
547 lines (427 loc) · 31.1 KB
/
autoregressive_time_series.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
---
title: "Repeated Measures in Time"
output:
html_document:
includes:
in_header: header.html
after_body: footer.html
params:
hilang:
- sas
---
```{r, echo=FALSE, message=FALSE, warning=FALSE, purl=FALSE}
# markdown options
options(knitr.kable.NA = '') # show NA as empty cells in output tables
pacman::p_load(kableExtra, formattable, htmltools) # packages for better formatting tables for html output
```
```{r, child="_hilang_setup.Rmd", purl=FALSE}
```
```{r, message = FALSE, error = FALSE, warning = FALSE}
# packages
pacman::p_load(conflicted, # package function conflicts
dplyr, purrr, tibble, tidyr, stringr, # data handling
gganimate, ggplot2, gifski, viridis, # plot
nlme, lme4, glmmTMB, sommer, # mixed modelling
AICcmodavg, mixedup) # mixed model extractions
# package function conflicts
conflict_prefer("filter", "dplyr")
# data
dat <- agriTutorial::sorghum %>%
rename(block = Replicate, plot = factplot) %>%
dplyr::select(y, variety, block, plot, factweek, varweek) %>%
as_tibble()
```
```{r, echo=FALSE, purl=FALSE}
dat %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE) %>%
scroll_box(height = "200px")
```
<br/>
# Motivation
The example in this chapter is taken from [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} (see also the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}). It considers data from a sorghum trial laid out as a randomized complete `block` design (5 blocks) with `variety` (4 sorghum varities) being the only treatment factor. Thus, we have a total of 20 `plot`s. It is important to note that our response variable (`y`), **the leaf area index, was assessed in five consecutive weeks on each plot** starting 2 weeks after emergence. Therefore, the dataset contains a total of 100 values and what we have here is longitudinal data, *a.k.a.* repeated measurements over time, *a.k.a.* a time series analysis.
As [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} put it: *"the week factor is not a treatment factor that can be randomized. Instead, repeated measurements are taken on each plot on five consecutive occasions. Successive measurements on the same plot are likely to be serially correlated, and this means that for a reliable and efficient analysis of repeated-measures data we need to take proper account of the serial correlations between the repeated measures ([Piepho, Büchse & Richter, 2004](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-037X.2004.00097.x){target="_blank"}; [Pinheiro & Bates, 2000](http://library.mpib-berlin.mpg.de/toc/z2008_18.pdf){target="_blank"})."*
```{r, class.source = "fold-hide", purl = FALSE, message = FALSE, warning = FALSE}
gganimate_plot <- ggplot(data = dat, aes(
y = y,
x = varweek,
group = variety,
color = variety
)) +
geom_boxplot(outlier.shape = NA) +
geom_point(alpha = 0.5, size = 3) +
scale_color_viridis(option = "D",
discrete = TRUE,
name = "Variety") +
scale_y_continuous(
name = "Leaf area index",
limits = c(0, 6.5),
expand = c(0, 0),
breaks = c(0:6)
) +
xlab("Week") +
theme_bw() +
theme(legend.position = "bottom") +
transition_time(varweek) +
shadow_mark(exclude_layer = 2)
animate(gganimate_plot, renderer = gifski_renderer()) # render gif
```
# Modelling
Our goal is therefore to build a suitable model taking serial correlation into account. In order to do this, we will initially consider the model for a single time point. Then, we extend this model to account for multiple weeks by allowing for week-speficic effects. Finally, we further allow for serially correlated error terms.
# Single week
When looking at data from a single time point (*e.g.* the first week), we merely have 20 observations from a randomized complete block design with a single treatment factor. It can therefore be analyzed with a simple one-way ANOVA (fixed `variety` effect) for randomized complete block designs (fixed `block` effect):
```{r}
dat.wk1 <- dat %>% filter(factweek == "1") # subset data from first week only
mod.wk1 <- lm(formula = y ~ variety + block,
data = dat.wk1)
```
We could now go on and look at the ANOVA via `anova(mod.wk1)` and it would indeed not be _wrong_ to simply repeat this for each week. Yet, one may not be satisfied with obtaining multiple ANOVA results - namely one per week. This is especially likeliy in case the results contradict each other, because *e.g.* the variety effects are found to be significant in only two out of five weeks. Therefore, one may want to analyze the entire dataset *i.e.* the multiple weeks jointly.
# Multiple weeks - independent errors {.tabset .tabset-fade .tabset-pills}
Going from the single-week-analysis to jointly analyzing the entire dataset is more than just changing the `data =` statement in the model. This is because *"it is realistic to assume that the treatment effects evolve over time and thus are week-specific. Importantly, we must also allow for the block effects to change over time in an individual manner. For example, there could be fertility or soil type differences between blocks and these could have a smooth progressive or cumulative time-based effect on differences between the blocks dependent on factors such as temperature or rainfall"* [(Piepho & Edmondson, 2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"}. We implement this by taking the model in `mod.wk1` and multipyling each effect with `factweek`. Note that this is also true for the general interecept (µ) in `mod.wk1`, meaning that we would like to include one intercept per week, which can be achieved by simply adding `factweek` as a main effect as well. This leaves us with fixed main effects for `factweek`, `variety`, and `block`, as well as the week-specific effects of the latter two `factweek:variety` and `factweek:block`.
Finally, note that we are not doing anything about the model's error term at this point. More specifically this means that its variance structure is still the default **iid** (independent and identically distributed) - see [our summary on correlation/variance strucutres here](variance_structures.html){target="_blank"}.
## nlme
Since the models in this chapter do not contain any random effects, we make use of `gls()` instead of `lme()`. Furthermore, the above named model `factweek + variety + block + factweek:variety + factweek:block` can be written in a shorter syntax as:
```{r}
mod.iid.nlme <- nlme::gls(model = y ~ factweek * (variety + block),
correlation = NULL, # default, i.e. homoscedastic, independent errors
data = dat)
# Extract variance component estimates
mod.iid.nlme.VC <- tibble(varstruct = "iid") %>%
mutate(sigma = mod.iid.nlme$sigma) %>%
mutate(Variance = sigma^2)
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.nlme.VC %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## lme4
Since the models in this chapter do not contain any random effects, we cannot use `lmer()` or any other function of the `lme4` package. However, even if there were random effects in our models, the short answer here is that with `lme4` it is **not possible** to fit any variance structures.
More specifically, we can read in an [`lme4` vigniette](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf){target="_blank"}: *"The main advantage of `nlme` relative to `lme4` is a user interface for fitting models with structure in the residuals (various forms of heteroscedasticity and autocorrelation) and in the random-effects covariance matrices (e.g., compound symmetric models). With some extra effort, the computational machinery of `lme4` can be used to fit structured models that the basic `lmer` function cannot handle (see [Appendix A](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf#%5B%7B%22num%22%3A15%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C81%2C733.028%2Cnull%5D){target="_blank"})"*
Michael Clark [puts it as](https://m-clark.github.io/mixed-models-with-R/extensions.html#heterogeneous-variance){target="_blank"} *"Unfortunately, lme4 does not provide the ability to model the residual covariance structure, at least [not in a straightforward fashion](https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html){target="_blank"}"*
<span style="color:red">Accordingly, there is no info on this package for this chapter beyond this point.</span>
<br/>
## glmmTMB
In `glmmTMB()` it is -to our knowledge- not possible to adjust the variance structure of the error.
>Just like in `nlme`, there is a `weights=` argument in `glmmTMB()`. However, to our understanding, they have different functions:
>
>In `nlme`, it requires *"an optional `varFunc` object or one-sided formula describing the within-group heteroscedasticity structure"* [(nlme RefMan)](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} and we make use of this in the chapter at hand.
>
>In `glmmTMB`, the [RefMan](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} only states *"weights, as in `glm`. Not automatically scaled to have sum 1"*. Following this trail, the [`glm` documentation](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html){target="_blank"} description for the `weights=` argument is *"an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector."*.
> Accordingly, it cannot be used to allow for heterogeneous error variances in this package.
We can, however, "*fix the residual variance to be 0 (actually a small non-zero value)*" and therefore "*force variance into the random effects*" [(glmmTMB RefMan)](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} via adding the `dispformula = ~ 0` argument. Thus, when doing so, we need to make sure to also add a random term to the model with the desired variance structure. By taking both of these actions, we are essentially mimicing the error (variance) as a random effect (variance). We achieve this by first creating a `unit` column in the data with different entries for each data point.
Note that doing this may not be necessary for `mod.iid.glmm` with the default homoscedastic, independent error variance structure, but since it will be necessary for the following model with a more sophisticated variance structure, we will apply it here, too.
Finally, for extracting the variance component estimates from `glmmTMB` objects in this chapter, we use the `extract_vc` function from the
[`mixedup` helper package](https://github.com/m-clark/mixedup){target="_blank"}.
```{r}
dat <- dat %>%
mutate(unit = 1:n() %>% as.factor) # new column with running number
mod.iid.glmm <- glmmTMB(formula = y ~ factweek * (variety + block)
+ (1 | unit), # add random unit term to mimic error variance
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
data = dat)
# Extract variance component estimates
mod.iid.glmm.VC <- mod.iid.glmm %>%
extract_vc(ci_scale = "var")
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.glmm.VC %>%
dplyr::select(-sd, -var_prop) %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## sommer
It is quite easy to extract variance component estimates from `sommer` objects in a table format we are used to:
```{r}
mod.iid.somm <- mmer(fixed = y ~ factweek + variety + block + factweek:variety + factweek:block,
rcov = ~ units, # default = iid
data = dat, verbose=F)
# Extract variance component estimates
mod.iid.somm.VC <- summary(mod.iid.somm)$varcomp
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.somm.VC %>%
as_tibble(rownames="grp") %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## SAS
In SAS, there is an actual `subject=` statement that is used to identify the subjects on which repeated measurements are taken. This may not be necessary for the `iid` with the default homoscedastic, independent error variance structure, but since it will be necessary for the following model with a more sophisticated variance structure, we will apply it here, too. In the end, the entire `repeated` line below could also be left out as it simply explicitly states the default setting.
```{r, eval=FALSE, hilang="sas", purl=FALSE}
proc mixed data=dat;
class variety block factweek plot;
model y=block variety factweek block*factweek variety*factweek /ddfm=kr2;
repeated factweek/sub=plot type=VC rcorr; /* = default iid */
ods output covparms=modiidsasVC FitStatistics=modiidsasAIC;
run;
proc print data=modiidsasVC;
run;
```
<br/>
```{r, echo=FALSE, message=FALSE,}
sasvc <- readr::read_delim("SAS/autoregressive_time_series_results_VC.txt", delim="\t")
sasvciid <- sasvc %>%
filter(mod=="iid")
```
```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasvciid %>%
dplyr::select(mod, everything()) %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
# Multiple weeks - autocorrelated errors {.tabset .tabset-fade .tabset-pills}
Note that at this point of the analysis, the model above with an independent, homogeneous error term above is neither the right, nor the wrong choice. It must be clear that *"measurements on the same plot are **likely** to be serially correlated"*. Thus, it should be investigated whether any covariance structure for the error term (instead of the default independence between errors) is more appropriate to model this dataset. It could theoretically be the case that this `mod.iid` is the best choice here, but we cannot confirm this yet, as we have not looked at any alternatives. This is what we will do in the next step.
> One may ask **at what step of the analysis it is best to compare and find the appropriate covariance structure for the error term** in such a scenario. It is indeed here, at this step. As [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} write: *"**Before modelling the treatment effect**, a variance–covariance model needs to be identified for these correlations. This is best done by using a saturated model for treatments and time, i.e., a model that considers all treatment factors and time as qualitative."*
>
> <br/>
>
> Note that this *saturated model* is what we have in `mod.iid`. So in short: one should compare variance structures for the error term **before** running an ANOVA / conducting model selection steps.
Units on which repeated observations are taken are often referred to as subjects. We would now like to allow measurements taken on the same subjects (plot in this case) to be serially correlated, while observations on different subjects are still considered independent. More specifically, we want the errors of the respective observations to be correlated in our model.
<img src="img\corrvalues.PNG" style="width:60%; margin-right: 10px" align="left">
Take the visualisation on the left depicting a subset of our data. Here, you can see plots 1 and 2 (out of the total of 20 in our dataset) side by side. Furthermore, they are shown for weeks 1-3 (out of the total of 5 in our dataset). Thus, a total of six values are represented, coming from only two subjects/plots, but obtained in three different weeks. The blue arrows represent correlation among errors. For these six measurements, there are six blue arrows, since there are six error pairs that come from the same plot, respectively. The green lines on the other hand represent error pairs that do not come from the same plot and thus are assumed to be independent. Finally, notice that the errors coming from the same plot, but with two instead of just one week between their measurements, are shown in a lighter blue. This is because one may indeed assume a weaker correlation between errors that are further apart in terms of time passed between measurements.
The latter can be achieved by using the maybe most popular correlation structure for repeated measures over time: **first order autoregressive AR(1)**. Please find the section on AR(1) in [our summary on correlation/variance strucutres here.](variance_structures.html){target="_blank"} It should be noted that this correlation structure is useful, if all time points are equally spaced, which is the case here, as there is always exactly one week between consecutive time points.
> There are other possible models (*i.e.* [other variance structures](variance_structures.html){target="_blank"}) for serial correlation of longitudinal data, but for simplicity we will only compare the default `iid` model above with the `ar1` model below. In fact, [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} does investigate other models as well and you can find their `nlme` code in the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}.
## nlme
In `nlme` we make use of the `correlation =` argument and use the `corAR1` [correlation strucutre class](https://cran.r-project.org/web/packages/nlme/nlme.pdf#Rfn.corClasses.1){target="_blank"}. In its syntax, subjects are identified after the `|`.
```{r}
mod.ar1.nlme <- nlme::gls(model = y ~ factweek * (variety + block),
correlation = corAR1(form = ~ varweek | plot),
data = dat)
# Extract variance component estimates
mod.ar1.nlme.VC <- tibble(varstruct = "ar(1)") %>%
mutate(sigma = mod.ar1.nlme$sigma,
rho = coef(mod.ar1.nlme$modelStruct$corStruct, unconstrained = FALSE)) %>%
mutate(Variance = sigma^2,
Corr1wk = rho,
Corr2wks = rho^2,
Corr3wks = rho^3,
Corr4wks = rho^4)
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.nlme.VC %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## lme4
<span style="color:red">not possible - see above.</span>
## glmmTMB
To model the variance structure of our mimiced error term as first order autoregressive, we replace the `(1 | unit)` from the `iid` model with `ar1(factweek + 0 | plot)`. As can be seen, subjects are identified after the `|` and the variance structure in this syntax. Further notice that leaving out the `+ 0` would by default lead to estimating an additional overall variance (see *Details* for [`glmmTMB` topic in the RefMan](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"}).
Note that by adding the `show_cor = TRUE` argument to the `extract_vc` function, we obtain two outputs: The variance component estimates, as seen above for the `iid` model, as well as the correlation matrix / variance structure for a single plot.
```{r}
mod.ar1.glmm <- glmmTMB(formula = y ~ factweek * (variety + block)
+ ar1(factweek + 0 | plot), # add ar1 structure as random term to mimic error variance
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
data = dat)
# Extract variance component estimates
mod.ar1.glmm.VC <- mod.ar1.glmm %>%
extract_vc(ci_scale = "var", show_cor = TRUE)
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.glmm.VC %>%
pluck("Variance Components") %>%
dplyr::select(-sd, -var_prop) %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
mod.ar1.glmm.VC %>%
pluck("Cor") %>% pluck("plot") %>%
as_tibble(rownames = "cor") %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
column_spec(1, bold = T) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## sommer
Trying to use sommer here leads to making two sacrifices and as a result it can be argued whether this analysis with sommer is still what we are aiming for in this specific example. While yes, there is an [AR1 function in sommer](https://cran.r-project.org/web/packages/sommer/sommer.pdf#Rfn.AR1.1){target="_blank"}, the two major limitations with it regarding this scenario are:
* It cannot be used for the error term (*i.e.* in the `rcov=` argument), but must be applied for a random effect.
* A fixed value for rho must actually be provided by the user via the `rho=` argument.
The first point would not be such a big deal, if we could force the error variance to be 0. Then, the random term with the desired ar1-structure would simply mimic the error term, which is exactly what we do with `glmmTMB` in this chapter! However, since `sommer` does not allow us to force the error variance to be 0, we get an iid error variance **and** the desired ar1-structure for plots across weeks. Interestingly, [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} actually fit this model as well in addition to the ar1 model and refer to it as "ar1 + nugget". Strictly speaking, however, it simply is not the variance structure we were aiming for here.
The second point is simply a limitation in terms of finding the optimal parameters. Since rho is fixed to a value that we as the user must provide before the model iterates, the fitted model can only be as good as our provided rho.
```{r}
fixed.rho <- 0.7 # random guess
mod.ar1.somm <- mmer(fixed = y ~ factweek + variety + block + factweek:variety + factweek:block,
random = ~ vs(factweek, Gu = AR1(factweek, rho = fixed.rho)),
rcov = ~ units, # default = iid
data = dat, verbose=F)
# Extract variance component estimates
mod.ar1.somm.VC <- summary(mod.ar1.somm)$varcomp
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.somm.VC %>%
as_tibble(rownames="grp") %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
Note that rho is not reported as part of the variance component estimates - which makes sense, as it was not estimated.
## SAS
Here, we write `type=ar(1)` in the `repeated` statement in order to get the desired variance structure for the error term.
```{r, eval=FALSE, hilang="sas", purl=FALSE}
proc mixed data=dat;
class variety block factweek plot;
model y=block variety factweek block*factweek variety*factweek /ddfm=kr2;
repeated factweek/sub=plot type=ar(1) rcorr; /* ar1 variance structure */
ods output covparms=modar1sasVC FitStatistics=modar1sasAIC;
run;
proc print data=modar1sasVC;
run;
```
<br/>
```{r, echo=FALSE, message=FALSE,}
sasvcar1 <- sasvc %>%
filter(mod=="ar1")
```
```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasvcar1 %>%
dplyr::select(mod, everything()) %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
# Model fit {.tabset .tabset-fade .tabset-pills}
After all, a decision must be made on whether the `ar1` variance structure for the error term was a good choice for modelling this dataset. More broadly speaking, it must be decided which of the models should be used for further investigations such as a test of fixed effects *etc*. As is standard procedure, we can do a model selection based on goodness-of-fit statistics such as the [AIC](https://www.wikiwand.com/en/Akaike_information_criterion#){target="_blank"} (Akaike information criterion).
For `nlme` and `glmmTMB`, we make use of the [`aictab()` function](https://cran.r-project.org/web/packages/AICcmodavg/AICcmodavg.pdf#Rfn.aictab.1){target="_blank"} from the helper package [AICcmodavg](https://CRAN.R-project.org/package=AICcmodavg){target="_blank"}.
## nlme
```{r, warning=FALSE}
AIC.nlme <- aictab(list(mod.iid.nlme, mod.ar1.nlme), modnames = c("iid", "ar1"))
```
```{r, echo=FALSE, purl=FALSE}
AIC.nlme %>%
dplyr::select(Modnames, K, AICc, Delta_AICc, Res.LL) %>%
mutate_at(vars(AICc:Res.LL), ~round(., 1)) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## lme4
<span style="color:red">not possible - see above.</span>
## glmmTMB
```{r, warning=FALSE}
AIC.glmm <- aictab(list(mod.iid.glmm, mod.ar1.glmm), modnames = c("iid", "ar1"))
```
```{r, echo=FALSE, purl=FALSE}
AIC.glmm %>%
dplyr::select(Modnames, K, AICc, Delta_AICc, LL) %>%
mutate_at(vars(AICc:LL), ~round(., 1)) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## sommer
Comparing AIC values from the two models once again shows that `sommer` does not really do what we are aiming for in this example. The AIC values for the two models are identical. The main reason for this is that they do not actually differ in the number of estimated parameters, since *rho* as the potentially second parameter here, was not estimated, but fixed by us.
[ Is this the whole story? ...in progress... ]
```{r}
somm.mods <- list(mod.iid.somm, mod.ar1.somm)
AIC.somm <- tibble(
Modnames = c("iid", "ar1"),
AIC = somm.mods %>% map("AIC") %>% unlist,
LL = somm.mods %>% map("monitor") %>% lapply(. %>% `[`(1, ncol(.))) %>% unlist) %>% # last element in first row of "monitor"
arrange(AIC)
```
```{r, echo=FALSE, purl=FALSE}
AIC.somm %>%
mutate_at(vars(AIC), ~round(., 1)) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## SAS
The model fit statistics have already been created via the `ods output` statement of the respective models and are displayed here jointly:
```{r, echo=FALSE, message=FALSE}
sasaic <- readr::read_delim("SAS/autoregressive_time_series_results_AIC.txt", delim="\t") %>%
mutate(Descr = str_remove(Descr, "\\s*\\([^\\)]+\\)") %>% str_trim) %>%
pivot_wider(names_from=Descr, values_from=Value)
```
```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasaic %>%
dplyr::select(mod, everything()) %>%
mutate_if(is.double, round, 3) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
# Conclusion
The `ar1` model has the smaller AIC value. Thus, we choose the `ar1` over the `iid` model, since the former has the better fit.
Notice that this outcome satisfies the aim of this chapter (*i.e.* having an `ar1` variance structure for the error term) but regarding the analysis of the underlying dataset, more steps can be taken in order to get concluding results. You can find further steps in [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} (see also the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}). On one hand, they investigate more variance structure alternatives to `iid` and on the other hand, they continue selecting a suitable regression model for the time trend after selecting a sutiable variance structure.
# Summary {.tabset .tabset-fade}
## Syntax
package | model syntax
---|---
`nlme` | `correlation = corAR1(form = ~ TIME | SUBJECT)`
`lme4` | <span style="color:red">not possible</span>
`glmmTMB` | `random = ~ ar1(TIME + 0 | SUBJECT), dispformula = ~ 0`
`sommer` | `random = ~ vs(TIME, Gu = AR1(TIME, rho = FIXED.RHO))`
`SAS` | `repeated TIME/sub=SUBJECT type=ar(1);`
## VarComp
**mod.iid**
```{r, echo=FALSE}
tibble(mod="iid", estimate="variance") %>%
cbind(.,
mod.iid.nlme.VC %>% dplyr::select(Variance) %>% rename(nlme=Variance),
mod.iid.nlme.VC %>% mutate(lme4=NA) %>% dplyr::select(lme4),
mod.iid.glmm.VC %>% filter(effect=="Intercept") %>% dplyr::select(variance) %>% rename(glmmTMB=variance),
mod.iid.somm.VC %>% dplyr::select(VarComp) %>% rename(sommer=VarComp),
sasvc %>% filter(mod=="iid") %>% dplyr::select(Estimate) %>% rename(SAS=Estimate)) %>%
magrittr::set_rownames(NULL) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
**mod.ar1**
```{r, echo=FALSE}
tibble(mod=c("ar1", "ar1"), estimate=c("variance", "correlation")) %>%
cbind(.,
mod.ar1.nlme.VC %>% dplyr::select(Variance, rho) %>%
pivot_longer(names_to = "estimate", values_to = "nlme", cols = 1:2) %>%
dplyr::select(nlme),
tibble(lme4=c(NA, NA)),
tibble(glmmTMB=
c(mod.ar1.glmm.VC %>%
pluck("Variance Components") %>%
filter(effect=="factweek1") %>%
dplyr::select(variance),
mod.ar1.glmm.VC %>%
pluck("Cor") %>% pluck("plot") %>%
pluck(2,1))%>% unlist),
tibble(sommer=c(NA, NA)),
sasvc %>%
arrange(desc(CovParm)) %>%
filter(mod=="ar1") %>%
dplyr::select(Estimate) %>%
rename(SAS=Estimate)) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
## AIC
```{r, echo=FALSE}
tibble(mod=c("iid", "ar1"), estimate=c("AIC", "AIC")) %>%
cbind(.,
AIC.nlme %>% dplyr::select(AICc) %>% rename(nlme=AICc),
tibble(lme4=c(NA, NA)),
AIC.glmm %>% dplyr::select(AICc) %>% rename(glmmTMB=AICc),
tibble(sommer=c(AIC.somm %>% filter(Modnames=="iid") %>% pull(AIC), NA)),
sasaic %>% dplyr::select(AIC) %>% rename(SAS=AIC)) %>%
kable(escape = FALSE) %>%
kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
full_width = FALSE)
```
# More on this
[Relevant post by Mixed Model Academy](https://www.linkedin.com/posts/mixed-model-academy_pain-dataanalysis-activity-6684009897169833984-kZdn/){target="_blank"}