Time series analysis and forecasting are standard goals in applied ecology. In this course, you will learn how to wrangle, visualise and explore ecological time series. You will also learn to use the {mvgam} package to analyse a diversity of ecological time series to gain useful insights and produce accurate forecasts.
This workshop will cover:
- Introduction to time series and time series visualization
- Generalised Linear Models (GLMs) and hierarchical models (GLMMs)
- Generalized Additive Models (GAMs) for nonlinear effects and complex random effects
- Dynamic GLMs and Dynamic GAMs
- Multivariate modeling strategies
- Forecasting and forecast evaluation
This workshop is aimed at higher degree and early career ecologists who are interested in making better predictions with their statistical models. The strategies to be covered are extendable well beyond time series and participants will leave this workshop with a better understanding of strategies to capture the types of complex, nonlinear effects that dominate ecological data.
- Understand how dynamic GLMs and GAMs work to capture both nonlinear covariate effects and temporal dependence
- Be able to fit dynamic GLMs and GAMs in R using the {mvgam} package
- Understand how to critique, visualize and compare fitted dynamic models
- Know how to produce forecasts from dynamic models and evaluate their accuracies using probabilistic scoring rules
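As a rough illustration of what a probabilistic scoring rule measures, the sketch below estimates the Continuous Ranked Probability Score (CRPS) from forecast draws in base R. The helper `crps_sample()` and the simulated counts are hypothetical and are not part of {mvgam}, which has its own scoring tools that are covered in the workshop.

```r
# Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|
# (hypothetical helper for illustration; lower scores are better)
crps_sample <- function(draws, obs) {
  term_accuracy <- mean(abs(draws - obs))
  term_spread <- mean(abs(outer(draws, draws, "-")))
  term_accuracy - 0.5 * term_spread
}

set.seed(1)
obs <- 4                            # hypothetical observed count
sharp <- rpois(500, lambda = 4)     # forecast draws centred on the truth
biased <- rpois(500, lambda = 10)   # forecast draws centred far from the truth

crps_sample(sharp, obs)
crps_sample(biased, obs)  # the biased forecast receives the larger (worse) score
```

The score rewards forecasts whose draws sit close to the observation while penalising distributions that are wider than they need to be.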
Please be sure to have at least version 4.2 (and preferably version 4.3 or above) of R installed. Note that R and RStudio are two different things: it is not sufficient to just update RStudio, you also need to update R by installing new versions as they are released.
To download R, go to the CRAN Download page and follow the links to download R for your operating system.
To check what version of R you have installed, you can run version in R and look at the version.string entry (or the major and minor entries).
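If you would rather have R do the comparison for you, something like the following base-R sketch should work. The variable names `minimum_r` and `r_ok` are illustrative only.

```r
# Minimum R version stated in the setup notes
minimum_r <- "4.2.0"

# getRversion() returns the running R version as a comparable object
r_ok <- getRversion() >= minimum_r

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the minimum requirement.")
} else {
  message("R ", as.character(getRversion()), " is older than ", minimum_r,
          "; please update R.")
}
```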
We will make use of several R packages that you'll need to have installed. Prior to the start of the course, please run the following code to update your installed packages and then install the required packages:
# update any installed R packages
update.packages(ask = FALSE, checkBuilt = TRUE)
# packages to install for the course
pkgs <- c("tidyverse", "gratia", "ggplot2",
          "marginaleffects", "janitor")
# install packages
install.packages(pkgs)
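Once installation has finished, a quick check along these lines can confirm that nothing was skipped. This is a minimal sketch using base R only; `installed` and `missing_pkgs` are illustrative names.

```r
# Packages requested in the install step
pkgs <- c("tidyverse", "gratia", "ggplot2", "marginaleffects", "janitor")

# requireNamespace() returns FALSE (rather than erroring) when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```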
Rtools or another form of compiler is needed to build compiled packages, particularly if you are on a Windows machine. Visit the Rtools installation site for more details.
We will be using the latest development version of mvgam, as it has several nice features that haven't quite made it to CRAN yet. You can install this version using:
devtools::install_github("nicholasjclark/mvgam")
If the above fails, you can just as easily stick with the CRAN version:
install.packages('mvgam')
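The two install routes can also be combined into one step. The sketch below wraps them in a hypothetical helper, `install_mvgam()`, which is not part of any package. It assumes that a failed GitHub install raises an ordinary R error, which is also what happens if {devtools} itself is not installed.

```r
# Hypothetical helper (not part of mvgam): try GitHub first, then fall back to CRAN
install_mvgam <- function() {
  tryCatch(
    {
      # devtools must already be installed for this call to succeed
      devtools::install_github("nicholasjclark/mvgam")
      "github"
    },
    error = function(e) {
      message("GitHub install failed (", conditionMessage(e), "); using CRAN instead.")
      install.packages("mvgam")
      "cran"
    }
  )
}

# Usage (commented out so that nothing is installed by accident):
# source_used <- install_mvgam()
```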
When working in R, there are two primary interfaces we can use to fit models with Stan (rstan and CmdStan). Either interface will work, but it is highly recommended that you use the CmdStan backend, with the {cmdstanr} interface, rather than using {rstan}. More care, however, needs to be taken to ensure you have an up-to-date version of Stan. For all mvgam functionalities to work properly, please ensure you have at least version 2.29 of Stan installed. The GitHub development versions of rstan and CmdStan are currently several versions ahead of this, and both of these development versions are stable. The exact version you have installed can be checked using either rstan::stan_version() or cmdstanr::cmdstan_version().
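To compare the reported version against the 2.29 minimum without reading it by eye, a small helper such as the one below may be handy. `stan_version_ok()` is a hypothetical name, and the sketch assumes the interface reports its version as a plain string such as "2.33.1".

```r
# Hypothetical helper: TRUE when `version` is at least `minimum`
stan_version_ok <- function(version, minimum = "2.29.0") {
  numeric_version(version) >= numeric_version(minimum)
}

stan_version_ok("2.33.1")  # TRUE: newer than the minimum
stan_version_ok("2.26.1")  # FALSE: older than the minimum

# With an interface installed, pass in the version it reports, e.g.:
# stan_version_ok(cmdstanr::cmdstan_version())
# stan_version_ok(rstan::stan_version())
```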
Compiling a Stan program requires a modern C++ compiler and the GNU Make build utility (a.k.a. "gmake"). The correct versions of these tools will vary by operating system, but unfortunately most standard Windows and macOS machines do not come with them installed by default. The first step to installing Stan is to update your C++ toolchain so that you can compile models correctly. There are detailed instructions by the Stan team on how to ensure you have the correct C++ toolchain to compile models, so please refer to those and follow the steps that are relevant to your own machine. Once you have the correct C++ toolchain, you'll need to install CmdStan and the relevant R package interface. First install the R package by running the following command in a fresh R environment:
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
{cmdstanr} requires a working installation of CmdStan, the shell interface to Stan. If you don't have CmdStan installed then {cmdstanr} can install it for you, assuming you have a suitable C++ toolchain. To double check that your toolchain is set up properly you can call the check_cmdstan_toolchain() function:
library(cmdstanr)
check_cmdstan_toolchain()
If your toolchain is configured correctly then CmdStan can be installed by calling the install_cmdstan() function:
install_cmdstan(cores = 2)
You should now be able to follow the remaining instructions on the Getting Started with CmdStanR page to ensure that Stan models can successfully compile on your machine. A quick way to check this would be to run this script:
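library(mvgam)
# simulate a small set of example time series
simdat <- sim_mvgam()
# fit a model; this compiles a Stan program, which tests your toolchain
mod <- mvgam(y ~ s(season, bs = 'cc', k = 5) +
               s(time, series, bs = 'fs', k = 8),
             data = simdat$data_train)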
But issues can sometimes occur when:
- you don't have write access to the folders that CmdStan uses to create model executables
- you are using a university- or company-imposed syncing system such as OneDrive, leading to confusion about where your make file and compilers are located
- you are using a university- or company-imposed firewall that is aggressively deleting the temporary executable files that CmdStan creates when compiling
If you run into any of these issues, it is best to consult with your IT department for support.
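Before contacting IT, it can help to confirm whether write access is actually the problem. The sketch below uses a hypothetical helper, `can_write()`, built on base R's `file.access()`. Note that `file.access()` is not always reliable on Windows or on network and synced drives, so treat a TRUE result as a hint rather than a guarantee.

```r
# Hypothetical helper: TRUE when the current user appears able to write to `dir`
can_write <- function(dir) {
  # file.access() returns 0 on success; mode = 2 tests write permission
  dir.exists(dir) && unname(file.access(dir, mode = 2)) == 0
}

# The session's temporary directory should normally be writable
can_write(tempdir())

# With cmdstanr installed and CmdStan set up, check its install location:
# can_write(cmdstanr::cmdstan_path())
```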
🔎 Live questions and code sharing
Use this link to access a live Google Doc where we can host questions, relevant R code snippets and links of interest during the workshop. Please be aware that anything you post here will be accessible by all workshop participants, so be kind and inclusive 😄
💹 Lecture slides
We will begin the workshop by working through a bit of introductory material, which you can follow along with by working through the HTML slide deck.
💻 Live code example 1
The first example will use a collection of time series of annual, observer-adjusted American kestrel counts
💻 Live code example 2
The second example (time permitting) will analyse a set of temporal experimental data monitoring aphid abundance in crop plots
💹 Introductory webinar on mvgam
This recorded webinar goes into some of the basic functionality of the package, using a simple example to get started
A series of vignettes cover data formatting, forecasting and several extended case studies of DGAMs. A number of other examples have also been compiled:
- Ecological Forecasting with Dynamic Generalized Additive Models
- Distributed lags (and hierarchical distributed lags) using mgcv and mvgam
- State-Space Vector Autoregressions in mvgam
- Ecological Forecasting with Dynamic GAMs; a tutorial and detailed case study
- How to interpret and report nonlinear effects from Generalized Additive Models
- Introduction to Stan and Hamiltonian Monte Carlo
- Phylogenetic smoothing using mgcv
- Incorporating time-varying seasonality in forecast models