autoregressive_time_series.Rmd

---
title: "Repeated Measures in Time"
output: 
  html_document:
    includes:
      in_header: header.html
      after_body: footer.html	
params:
  hilang:
    - sas					 
---

```{r, echo=FALSE, message=FALSE, warning=FALSE, purl=FALSE}
# markdown options
options(knitr.kable.NA = '') # show NA as empty cells in output tables
pacman::p_load(kableExtra, formattable, htmltools) # packages for better formatting tables for html output
```

```{r, child="_hilang_setup.Rmd", purl=FALSE}
```

```{r, message = FALSE, error = FALSE, warning = FALSE}
# packages
pacman::p_load(conflicted, # package function conflicts
               dplyr, purrr, tibble, tidyr, stringr, # data handling
               gganimate, ggplot2, gifski, viridis,  # plot
               nlme, lme4, glmmTMB, sommer, # mixed modelling
               AICcmodavg, mixedup)         # mixed model extractions

# package function conflicts
conflict_prefer("filter", "dplyr")

# data
dat <- agriTutorial::sorghum %>%
  rename(block = Replicate, plot = factplot) %>% 
  dplyr::select(y, variety, block, plot, factweek, varweek) %>% 
  as_tibble()
```
```{r, echo=FALSE, purl=FALSE}
dat %>%
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE) %>% 
  scroll_box(height = "200px")
```

<br/>

# Motivation

The example in this chapter is taken from [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} (see also the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}). It considers data from a sorghum trial laid out as a randomized complete `block` design (5 blocks) with `variety` (4 sorghum varities) being the only treatment factor. Thus, we have a total of 20 `plot`s. It is important to note that our response variable (`y`), **the leaf area index, was assessed in five consecutive weeks on each plot** starting 2 weeks after emergence. Therefore, the dataset contains a total of 100 values and what we have here is longitudinal data, *a.k.a.* repeated measurements over time, *a.k.a.* a time series analysis.

As [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} put it: *"the week factor is not a treatment factor that can be randomized. Instead, repeated measurements are taken on each plot on five consecutive occasions. Successive measurements on the same plot are likely to be serially correlated, and this means that for a reliable and efficient analysis of repeated-measures data we need to take proper account of the serial correlations between the repeated measures ([Piepho, Büchse & Richter, 2004](https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1439-037X.2004.00097.x){target="_blank"}; [Pinheiro & Bates, 2000](http://library.mpib-berlin.mpg.de/toc/z2008_18.pdf){target="_blank"})."*


```{r, class.source = "fold-hide", purl = FALSE, message = FALSE, warning = FALSE}
gganimate_plot <- ggplot(data = dat, aes(
  y = y,
  x = varweek,
  group = variety,
  color = variety
)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(alpha = 0.5, size = 3) +
  scale_color_viridis(option = "D",
                      discrete = TRUE,
                      name = "Variety") +
  scale_y_continuous(
    name = "Leaf area index",
    limits = c(0, 6.5),
    expand = c(0, 0),
    breaks = c(0:6)
  ) +
  xlab("Week") +
  theme_bw() +
  theme(legend.position = "bottom") +
  transition_time(varweek) +
  shadow_mark(exclude_layer = 2)

animate(gganimate_plot, renderer = gifski_renderer()) # render gif
```

# Modelling

Our goal is therefore to build a suitable model taking serial correlation into account. In order to do this, we will initially consider the model for a single time point. Then, we extend this model to account for multiple weeks by allowing for week-speficic effects. Finally, we further allow for serially correlated error terms. 

# Single week

When looking at data from a single time point (*e.g.* the first week), we merely have 20 observations from a randomized complete block design with a single treatment factor. It can therefore be analyzed with a simple one-way ANOVA (fixed `variety` effect) for randomized complete block designs (fixed `block` effect):

```{r}
dat.wk1 <- dat %>% filter(factweek == "1") # subset data from first week only

mod.wk1 <- lm(formula = y ~ variety + block,
              data = dat.wk1)
```

We could now go on and look at the ANOVA via `anova(mod.wk1)` and it would indeed not be _wrong_ to simply repeat this for each week. Yet, one may not be satisfied with obtaining multiple ANOVA results - namely one per week. This is especially likeliy in case the results contradict each other, because *e.g.* the variety effects are found to be significant in only two out of five weeks. Therefore, one may want to analyze the entire dataset *i.e.* the multiple weeks jointly.

# Multiple weeks - independent errors {.tabset .tabset-fade .tabset-pills}

Going from the single-week-analysis to jointly analyzing the entire dataset is more than just changing the `data =` statement in the model. This is because *"it is realistic to assume that the treatment effects evolve over time and thus are week-specific. Importantly, we must also allow for the block effects to change over time in an individual manner. For example, there could be fertility or soil type differences between blocks and these could have a smooth progressive or cumulative time-based effect on differences between the blocks dependent on factors such as temperature or rainfall"* [(Piepho & Edmondson, 2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"}. We implement this by taking the model in `mod.wk1` and multipyling each effect with `factweek`. Note that this is also true for the general interecept (µ) in `mod.wk1`, meaning that we would like to include one intercept per week, which can be achieved by simply adding `factweek` as a main effect as well. This leaves us with fixed main effects for `factweek`, `variety`, and `block`, as well as the week-specific effects of the latter two `factweek:variety` and `factweek:block`.

Finally, note that we are not doing anything about the model's error term at this point. More specifically this means that its variance structure is still the default **iid** (independent and identically distributed) - see [our summary on correlation/variance strucutres here](variance_structures.html){target="_blank"}.

## nlme

Since the models in this chapter do not contain any random effects, we make use of `gls()` instead of `lme()`. Furthermore, the above named model `factweek + variety + block + factweek:variety + factweek:block` can be written in a shorter syntax as:

```{r}
mod.iid.nlme <- nlme::gls(model = y ~ factweek * (variety + block),
                          correlation = NULL, # default, i.e. homoscedastic, independent errors
                          data = dat)

# Extract variance component estimates
mod.iid.nlme.VC <- tibble(varstruct = "iid") %>% 
  mutate(sigma    = mod.iid.nlme$sigma) %>% 
  mutate(Variance = sigma^2)
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.nlme.VC %>%
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## lme4

Since the models in this chapter do not contain any random effects, we cannot use `lmer()` or any other function of the `lme4` package. However, even if there were random effects in our models, the short answer here is that with `lme4` it is **not possible** to fit any variance structures.

More specifically, we can read in an [`lme4` vigniette](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf){target="_blank"}: *"The main advantage of `nlme` relative to `lme4` is a user interface for fitting models with structure in the residuals (various forms of heteroscedasticity and autocorrelation) and in the random-effects covariance matrices (e.g., compound symmetric models).  With some extra effort, the computational machinery of `lme4` can be used to fit structured models that the basic `lmer` function cannot handle (see [Appendix A](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf#%5B%7B%22num%22%3A15%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C81%2C733.028%2Cnull%5D){target="_blank"})"*

Michael Clark [puts it as](https://m-clark.github.io/mixed-models-with-R/extensions.html#heterogeneous-variance){target="_blank"} *"Unfortunately, lme4 does not provide the ability to model the residual covariance structure, at least [not in a straightforward fashion](https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html){target="_blank"}"*

<span style="color:red">Accordingly, there is no info on this package for this chapter beyond this point.</span>

<br/>

## glmmTMB

In `glmmTMB()` it is -to our knowledge- not possible to adjust the variance structure of the error.

>Just like in `nlme`, there is a `weights=` argument in `glmmTMB()`. However, to our understanding, they have different functions: 
> 
>In `nlme`, it requires *"an optional `varFunc` object or one-sided formula describing the within-group heteroscedasticity structure"* [(nlme RefMan)](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} and we make use of this in the chapter at hand.   
>
>In `glmmTMB`, the [RefMan](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} only states *"weights, as in `glm`. Not automatically scaled to have sum 1"*. Following this trail, the [`glm` documentation](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html){target="_blank"} description for the `weights=` argument is *"an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector."*.
> Accordingly, it cannot be used to allow for heterogeneous error variances in this package.

We can, however, "*fix the residual variance  to  be  0  (actually  a  small  non-zero  value)*" and therefore "*force variance into the random effects*" [(glmmTMB RefMan)](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"} via adding the `dispformula = ~ 0` argument. Thus, when doing so, we need to make sure to also add a random term to the model with the desired variance structure. By taking both of these actions, we are essentially mimicing the error (variance) as a random effect (variance). We achieve this by first creating a `unit` column in the data with different entries for each data point. 

Note that doing this may not be necessary for `mod.iid.glmm` with the default homoscedastic, independent error variance structure, but since it will be necessary for the following model with a more sophisticated variance structure, we will apply it here, too.

Finally, for extracting the variance component estimates from `glmmTMB` objects in this chapter, we use the `extract_vc` function from the 
[`mixedup` helper package](https://github.com/m-clark/mixedup){target="_blank"}. 

```{r}
dat <- dat %>%
  mutate(unit = 1:n() %>% as.factor) # new column with running number

mod.iid.glmm <- glmmTMB(formula = y ~ factweek * (variety + block) 
                        + (1 | unit),      # add random unit term to mimic error variance
                        dispformula = ~ 0, # fix original error variance to 0
                        REML = TRUE,       # needs to be stated since default = ML
                        data = dat) 

# Extract variance component estimates
mod.iid.glmm.VC <- mod.iid.glmm %>%
  extract_vc(ci_scale = "var")
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.glmm.VC %>%
  dplyr::select(-sd, -var_prop) %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## sommer

It is quite easy to extract variance component estimates from `sommer` objects in a table format we are used to:

```{r}
mod.iid.somm <- mmer(fixed = y ~ factweek + variety + block + factweek:variety + factweek:block, 
                     rcov  = ~ units, # default = iid
                     data  = dat, verbose=F)

# Extract variance component estimates
mod.iid.somm.VC <- summary(mod.iid.somm)$varcomp 
```
```{r, echo=FALSE, purl=FALSE}
mod.iid.somm.VC %>%
  as_tibble(rownames="grp") %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## SAS

In SAS, there is an actual `subject=` statement that is used to identify the subjects on which repeated measurements are taken. This may not be necessary for the `iid` with the default homoscedastic, independent error variance structure, but since it will be necessary for the following model with a more sophisticated variance structure, we will apply it here, too. In the end, the entire `repeated` line below could also be left out as it simply explicitly states the default setting. 

```{r, eval=FALSE, hilang="sas", purl=FALSE}
proc mixed data=dat;
class variety block factweek plot;
model y=block variety factweek block*factweek variety*factweek /ddfm=kr2;
repeated factweek/sub=plot type=VC rcorr; /* = default iid */
ods output covparms=modiidsasVC FitStatistics=modiidsasAIC;
run; 

proc print data=modiidsasVC;
run;
```

<br/>

```{r, echo=FALSE, message=FALSE,}
sasvc <- readr::read_delim("SAS/autoregressive_time_series_results_VC.txt", delim="\t")

sasvciid <- sasvc %>%
  filter(mod=="iid")
```

```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasvciid %>%
  dplyr::select(mod, everything()) %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

# Multiple weeks - autocorrelated errors {.tabset .tabset-fade .tabset-pills}

Note that at this point of the analysis, the model above with an independent, homogeneous error term above is neither the right, nor the wrong choice. It must be clear that *"measurements on the same plot are **likely** to be serially correlated"*. Thus, it should be investigated whether any covariance structure for the error term (instead of the default independence between errors) is more appropriate to model this dataset. It could theoretically be the case that this `mod.iid` is the best choice here, but we cannot confirm this yet, as we have not looked at any alternatives. This is what we will do in the next step.

> One may ask **at what step of the analysis it is best to compare and find the appropriate covariance structure for the error term** in such a scenario. It is indeed here, at this step. As [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} write: *"**Before modelling the treatment effect**, a variance–covariance model needs to be identified for these correlations. This is best done by using a saturated model for treatments and time, i.e., a model that considers all treatment factors and time as qualitative."* 
> 
> <br/>
>
> Note that this *saturated model* is what we have in `mod.iid`. So in short: one should compare variance structures for the error term **before** running an ANOVA / conducting model selection steps. 

Units on which repeated observations are taken are often referred to as subjects. We would now like to allow measurements taken on the same subjects (plot in this case) to be serially correlated, while observations on different subjects are still considered independent. More specifically, we want the errors of the respective observations to be correlated in our model. 

<img src="img\corrvalues.PNG" style="width:60%; margin-right: 10px" align="left">

Take the visualisation on the left depicting a subset of our data. Here, you can see plots 1 and 2 (out of the total of 20 in our dataset) side by side. Furthermore, they are shown for weeks 1-3 (out of the total of 5 in our dataset). Thus, a total of six values are represented, coming from only two subjects/plots, but obtained in three different weeks. The blue arrows represent correlation among errors. For these six measurements, there are six blue arrows, since there are six error pairs that come from the same plot, respectively. The green lines on the other hand represent error pairs that do not come from the same plot and thus are assumed to be independent. Finally, notice that the errors coming from the same plot, but with two instead of just one week between their measurements, are shown in a lighter blue. This is because one may indeed assume a weaker correlation between errors that are further apart in terms of time passed between measurements.

The latter can be achieved by using the maybe most popular correlation structure for repeated measures over time: **first order autoregressive AR(1)**. Please find the section on AR(1) in [our summary on correlation/variance strucutres here.](variance_structures.html){target="_blank"} It should be noted that this correlation structure is useful, if all time points are equally spaced, which is the case here, as there is always exactly one week between consecutive time points.

> There are other possible models (*i.e.* [other variance structures](variance_structures.html){target="_blank"}) for serial correlation of longitudinal data, but for simplicity we will only compare the default `iid` model above with the `ar1` model below. In fact, [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} does investigate other models as well and you can find their `nlme` code in the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}.

## nlme

In `nlme` we make use of the `correlation =` argument and use the `corAR1` [correlation strucutre class](https://cran.r-project.org/web/packages/nlme/nlme.pdf#Rfn.corClasses.1){target="_blank"}. In its syntax, subjects are identified after the `|`.

```{r}
mod.ar1.nlme <- nlme::gls(model = y ~ factweek * (variety + block),
                          correlation = corAR1(form = ~ varweek | plot),
                          data = dat)

# Extract variance component estimates
mod.ar1.nlme.VC <- tibble(varstruct = "ar(1)") %>%
  mutate(sigma    = mod.ar1.nlme$sigma,
         rho      = coef(mod.ar1.nlme$modelStruct$corStruct, unconstrained = FALSE)) %>%
  mutate(Variance = sigma^2,
         Corr1wk  = rho,
         Corr2wks = rho^2,
         Corr3wks = rho^3,
         Corr4wks = rho^4)
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.nlme.VC %>%
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## lme4

<span style="color:red">not possible - see above.</span>

## glmmTMB

To model the variance structure of our mimiced error term as first order autoregressive, we replace the `(1 | unit)` from the `iid` model with `ar1(factweek + 0 | plot)`. As can be seen, subjects are identified after the `|` and the variance structure in this syntax. Further notice that leaving out the `+ 0` would by default lead to estimating an additional overall variance (see *Details* for [`glmmTMB` topic in the RefMan](https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf#Rfn.glmmTMB.1){target="_blank"}).

Note that by adding the `show_cor = TRUE` argument to the `extract_vc` function, we obtain two outputs: The variance component estimates, as seen above for the `iid` model, as well as the correlation matrix / variance structure for a single plot.

```{r}
mod.ar1.glmm <- glmmTMB(formula = y ~ factweek * (variety + block)
                        + ar1(factweek + 0 | plot), # add ar1 structure as random term to mimic error variance
                        dispformula = ~ 0, # fix original error variance to 0
                        REML = TRUE,       # needs to be stated since default = ML
                        data = dat) 

# Extract variance component estimates
mod.ar1.glmm.VC <- mod.ar1.glmm %>%
  extract_vc(ci_scale = "var", show_cor = TRUE)
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.glmm.VC %>%
  pluck("Variance Components") %>% 
  dplyr::select(-sd, -var_prop) %>% 
  mutate_if(is.double, round, 3) %>%
  kable(escape = FALSE) %>%
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
                full_width = FALSE)

mod.ar1.glmm.VC %>%
  pluck("Cor") %>% pluck("plot") %>% 
  as_tibble(rownames = "cor") %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>%
  column_spec(1, bold = T) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"),
                full_width = FALSE)
```

## sommer

Trying to use sommer here leads to making two sacrifices and as a result it can be argued whether this analysis with sommer is still what we are aiming for in this specific example. While yes, there is an [AR1 function in sommer](https://cran.r-project.org/web/packages/sommer/sommer.pdf#Rfn.AR1.1){target="_blank"}, the two major limitations with it regarding this scenario are:

* It cannot be used for the error term (*i.e.* in the `rcov=` argument), but must be applied for a random effect.
* A fixed value for rho must actually be provided by the user via the `rho=` argument.

The first point would not be such a big deal, if we could force the error variance to be 0. Then, the random term with the desired ar1-structure would simply mimic the error term, which is exactly what we do with `glmmTMB` in this chapter! However, since `sommer` does not allow us to force the error variance to be 0, we get an iid error variance **and** the desired ar1-structure for plots across weeks. Interestingly, [Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} actually fit this model as well in addition to the ar1 model and refer to it as "ar1 + nugget". Strictly speaking, however, it simply is not the variance structure we were aiming for here.

The second point is simply a limitation in terms of finding the optimal parameters. Since rho is fixed to a value that we as the user must provide before the model iterates, the fitted model can only be as good as our provided rho.

```{r}
fixed.rho <- 0.7 # random guess

mod.ar1.somm <- mmer(fixed  = y ~ factweek + variety + block + factweek:variety + factweek:block, 
                     random = ~ vs(factweek, Gu = AR1(factweek, rho = fixed.rho)),
                     rcov   = ~ units, # default = iid
                     data   = dat, verbose=F)

# Extract variance component estimates
mod.ar1.somm.VC <- summary(mod.ar1.somm)$varcomp 
```
```{r, echo=FALSE, purl=FALSE}
mod.ar1.somm.VC %>%
  as_tibble(rownames="grp") %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

Note that rho is not reported as part of the variance component estimates - which makes sense, as it was not estimated.

## SAS

Here, we write `type=ar(1)` in the `repeated` statement in order to get the desired variance structure for the error term. 

```{r, eval=FALSE, hilang="sas", purl=FALSE}
proc mixed data=dat;
class variety block factweek plot;
model y=block variety factweek block*factweek variety*factweek /ddfm=kr2;
repeated factweek/sub=plot type=ar(1) rcorr; /* ar1 variance structure */
ods output covparms=modar1sasVC FitStatistics=modar1sasAIC;
run; 

proc print data=modar1sasVC;
run;
```

<br/>

```{r, echo=FALSE, message=FALSE,}
sasvcar1 <- sasvc %>%
  filter(mod=="ar1")
```

```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasvcar1 %>% 
  dplyr::select(mod, everything()) %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

# Model fit {.tabset .tabset-fade .tabset-pills}

After all, a decision must be made on whether the `ar1` variance structure for the error term was a good choice for modelling this dataset. More broadly speaking, it must be decided which of the models should be used for further investigations such as a test of fixed effects *etc*. As is standard procedure, we can do a model selection based on goodness-of-fit statistics such as the [AIC](https://www.wikiwand.com/en/Akaike_information_criterion#){target="_blank"} (Akaike information criterion).

For `nlme` and `glmmTMB`, we make use of the [`aictab()` function](https://cran.r-project.org/web/packages/AICcmodavg/AICcmodavg.pdf#Rfn.aictab.1){target="_blank"} from the helper package [AICcmodavg](https://CRAN.R-project.org/package=AICcmodavg){target="_blank"}.

## nlme

```{r, warning=FALSE}
AIC.nlme <- aictab(list(mod.iid.nlme, mod.ar1.nlme), modnames = c("iid", "ar1"))
```
```{r, echo=FALSE, purl=FALSE}
AIC.nlme %>%  
  dplyr::select(Modnames, K, AICc, Delta_AICc, Res.LL) %>%  
  mutate_at(vars(AICc:Res.LL), ~round(., 1)) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## lme4

<span style="color:red">not possible - see above.</span>

## glmmTMB

```{r, warning=FALSE}
AIC.glmm <- aictab(list(mod.iid.glmm, mod.ar1.glmm), modnames = c("iid", "ar1"))
```
```{r, echo=FALSE, purl=FALSE}
AIC.glmm %>%  
  dplyr::select(Modnames, K, AICc, Delta_AICc, LL) %>%  
  mutate_at(vars(AICc:LL), ~round(., 1)) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## sommer

Comparing AIC values from the two models once again shows that `sommer` does not really do what we are aiming for in this example. The AIC values for the two models are identical. The main reason for this is that they do not actually differ in the number of estimated parameters, since *rho* as the potentially second parameter here, was not estimated, but fixed by us.

[ Is this the whole story? ...in progress... ]

```{r}
somm.mods <- list(mod.iid.somm, mod.ar1.somm)

AIC.somm <- tibble(
  Modnames = c("iid", "ar1"),
  AIC = somm.mods %>% map("AIC") %>% unlist,
  LL  = somm.mods %>% map("monitor") %>% lapply(. %>% `[`(1, ncol(.))) %>% unlist) %>% # last element in first row of "monitor"
  arrange(AIC)
```
```{r, echo=FALSE, purl=FALSE}
AIC.somm %>%  
  mutate_at(vars(AIC), ~round(., 1)) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## SAS

The model fit statistics have already been created via the `ods output` statement of the respective models and are displayed here jointly:

```{r, echo=FALSE, message=FALSE}
sasaic <- readr::read_delim("SAS/autoregressive_time_series_results_AIC.txt", delim="\t") %>% 
  mutate(Descr = str_remove(Descr, "\\s*\\([^\\)]+\\)") %>% str_trim) %>% 
  pivot_wider(names_from=Descr, values_from=Value)
```

```{r, echo=FALSE, message=FALSE, purl=FALSE}
sasaic %>%
  dplyr::select(mod, everything()) %>% 
  mutate_if(is.double, round, 3) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

# Conclusion

The `ar1` model has the smaller AIC value. Thus, we choose the `ar1` over the `iid` model, since the former has the better fit.

Notice that this outcome satisfies the aim of this chapter (*i.e.* having an `ar1` variance structure for the error term) but regarding the analysis of the underlying dataset, more steps can be taken in order to get concluding results. You can find further steps in [*Example 4* in Piepho & Edmondson (2018)](https://onlinelibrary.wiley.com/doi/full/10.1111/jac.12267){target="_blank"} (see also the [Agritutorial vigniette](https://cran.r-project.org/web/packages/agriTutorial/vignettes/agriTutorialVignette.pdf#%5B%7B%22num%22%3A81%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C28.346%2C813.543%2Cnull%5D){target="_blank"}). On one hand, they investigate more variance structure alternatives to `iid` and on the other hand, they continue selecting a suitable regression model for the time trend after selecting a sutiable variance structure.

# Summary {.tabset .tabset-fade}

## Syntax

package | model syntax
---|---
`nlme` | `correlation = corAR1(form = ~ TIME | SUBJECT)`
`lme4` | <span style="color:red">not possible</span>
`glmmTMB` | `random = ~ ar1(TIME + 0 | SUBJECT), dispformula =  ~ 0`
`sommer` | `random = ~ vs(TIME, Gu = AR1(TIME, rho = FIXED.RHO))`
`SAS` | `repeated TIME/sub=SUBJECT type=ar(1);`

## VarComp

**mod.iid**
```{r, echo=FALSE}
tibble(mod="iid", estimate="variance") %>%
  cbind(., 
        mod.iid.nlme.VC %>% dplyr::select(Variance) %>% rename(nlme=Variance),
        mod.iid.nlme.VC %>% mutate(lme4=NA) %>% dplyr::select(lme4),
        mod.iid.glmm.VC %>% filter(effect=="Intercept") %>% dplyr::select(variance) %>% rename(glmmTMB=variance),
        mod.iid.somm.VC %>% dplyr::select(VarComp) %>% rename(sommer=VarComp),
        sasvc %>% filter(mod=="iid") %>% dplyr::select(Estimate) %>% rename(SAS=Estimate)) %>% 
  magrittr::set_rownames(NULL) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

**mod.ar1**
```{r, echo=FALSE}
tibble(mod=c("ar1", "ar1"), estimate=c("variance", "correlation")) %>%
  cbind(., 
        mod.ar1.nlme.VC %>% dplyr::select(Variance, rho) %>% 
          pivot_longer(names_to = "estimate", values_to = "nlme", cols = 1:2) %>% 
          dplyr::select(nlme),
        tibble(lme4=c(NA, NA)),
        tibble(glmmTMB=
                 c(mod.ar1.glmm.VC %>%
                     pluck("Variance Components") %>% 
                     filter(effect=="factweek1") %>% 
                     dplyr::select(variance), 
                   mod.ar1.glmm.VC %>%
                     pluck("Cor") %>% pluck("plot") %>%  
                     pluck(2,1))%>% unlist),
        tibble(sommer=c(NA, NA)),
        sasvc %>% 
          arrange(desc(CovParm)) %>% 
          filter(mod=="ar1") %>% 
          dplyr::select(Estimate) %>% 
          rename(SAS=Estimate)) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

## AIC

```{r, echo=FALSE}
tibble(mod=c("iid", "ar1"), estimate=c("AIC", "AIC")) %>%
  cbind(.,
        AIC.nlme %>% dplyr::select(AICc) %>% rename(nlme=AICc),
        tibble(lme4=c(NA, NA)),
        AIC.glmm %>% dplyr::select(AICc) %>% rename(glmmTMB=AICc),
        tibble(sommer=c(AIC.somm %>% filter(Modnames=="iid") %>% pull(AIC), NA)),
        sasaic %>% dplyr::select(AIC) %>% rename(SAS=AIC)) %>% 
  kable(escape = FALSE) %>% 
  kable_styling(bootstrap_options = c("bordered", "hover", "condensed", "responsive"), 
                full_width = FALSE)
```

# More on this 

[Relevant post by Mixed Model Academy](https://www.linkedin.com/posts/mixed-model-academy_pain-dataanalysis-activity-6684009897169833984-kZdn/){target="_blank"}