🗓️ 27-28 January 2020 ⏰ 09:00 - 17:00 🏨 Imperial B (Ballroom Level) ✍️ rstd.io/conf
- Download `tourism.xlsx` from http://robjhyndman.com/data/tourism.xlsx, and read it into R using `read_excel()` from the `readxl` package.
- Create a tsibble which is identical to the `tourism` tsibble from the `tsibble` package.
- Find what combination of `Region` and `Purpose` had the maximum number of overnight trips on average.
- Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
- Create time plots of the following four time series: `Bricks` from `aus_production`, `Lynx` from `pelt`, `Close` from `gafa_stock`, `Demand` from `vic_elec`.
- Use `help()` to find out about the data in each series.
- For the last plot, modify the axis labels and title.
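As a possible starting point for the time plots, here is a minimal sketch. It assumes the `fpp3` meta-package is installed (it attaches `tsibble`, `tsibbledata`, `feasts` and `ggplot2`); the axis labels and title are placeholders chosen for illustration, not part of the exercise.

```r
library(fpp3)

# Quarterly clay brick production
aus_production %>% autoplot(Bricks)

# gafa_stock is keyed by Symbol, so autoplot() draws one line per stock
gafa_stock %>% autoplot(Close)

# Half-hourly electricity demand with custom labels (label text is illustrative)
p <- vic_elec %>%
  autoplot(Demand) +
  labs(
    x = "Time",
    y = "Electricity demand",
    title = "Half-hourly electricity demand, Victoria"
  )
p
```

Running `help(vic_elec)` (and likewise for the other data sets) shows the units and sources, which is useful when choosing axis labels.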
- Look at the quarterly tourism data for the Snowy Mountains

      snowy <- tourism %>% filter(Region == "Snowy Mountains")

  - Use `autoplot()`, `gg_season()` and `gg_subseries()` to explore the data.
  - What do you learn?

- Produce a calendar plot for the `pedestrian` data from one location and one year.
We have introduced the following functions: `gg_lag` and `ACF`. Use these functions to explore the four time series: `Bricks` from `aus_production`, `Lynx` from `pelt`, `Close` price of Amazon from `gafa_stock`, `Demand` from `vic_elec`. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
You can compute the daily changes in the Google stock price in 2018 using

    dgoog <- gafa_stock %>%
      filter(Symbol == "GOOG", year(Date) >= 2018) %>%
      mutate(trading_day = row_number()) %>%
      update_tsibble(index = trading_day, regular = TRUE) %>%
      mutate(diff = difference(Close))

Does `diff` look like white noise?
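One way to approach the white noise question is to combine an ACF plot with a portmanteau test. The sketch below assumes the `fpp3` meta-package is installed; the choice of `lag = 10` for the Ljung-Box test is an assumption for illustration, not something the exercise prescribes.

```r
library(fpp3)

# Daily changes in the Google closing price, re-indexed by trading day
dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  mutate(diff = difference(Close))

# Sample autocorrelations of the daily changes
dgoog %>% ACF(diff) %>% autoplot()

# Ljung-Box test of the null hypothesis of no autocorrelation
# (lag = 10 is an assumed choice)
lb <- dgoog %>% features(diff, ljung_box, lag = 10)
lb
```

If most spikes in the ACF plot sit inside the dashed bounds and the `lb_pvalue` column is large, the data give little evidence against white noise.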
Consider the GDP information in `global_economy`. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
- For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
  - United States GDP from `global_economy`
  - Slaughter of Victorian “Bulls, bullocks and steers” in `aus_livestock`
  - Victorian Electricity Demand from `vic_elec`
  - Gas production from `aus_production`
- Why is a Box-Cox transformation unhelpful for the `canadian_gas` data?
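For the Box-Cox exercises, one option is to let the Guerrero feature suggest a value of lambda and then inspect the transformed series visually. The sketch below does this for the gas production series only; it assumes the `fpp3` meta-package is installed, and the suggested lambda should be treated as a starting point rather than a definitive answer.

```r
library(fpp3)

# Guerrero's method suggests a lambda that makes the variation
# roughly constant across the series
lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

# Plot the transformed series to judge whether the variance looks stable
aus_production %>%
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "Box-Cox transformed gas production")
```

The same pattern applies to the other series after filtering to the relevant key values, for example `Country == "United States"` in `global_economy`.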
- Produce the following decomposition

      canadian_gas %>% STL(Volume ~ season(window=7) + trend(window=11)) %>% autoplot()

- What happens as you change the values of the two `window` arguments?
- How does the seasonal shape change over time? [Hint: Try plotting the seasonal component using `gg_season`.]
- Can you produce a plausible seasonally adjusted series? [Hint: `season_adjust` is one of the variables returned by `STL`.]
- Use `GGally::ggpairs()` to look at the relationships between the STL-based features. You might wish to change `seasonal_peak_year` and `seasonal_trough_year` to factors.
- Which is the peak quarter for holidays in each state?
- Use a feature-based approach to look for outlying series in `PBS`.
- What is unusual about the series you identify as "outliers"?
- Produce forecasts using an appropriate benchmark method for household wealth (`hh_budget`). Plot the results using `autoplot()`.
- Produce forecasts using an appropriate benchmark method for Australian takeaway food turnover (`aus_retail`). Plot the results using `autoplot()`.
- Compute seasonal naïve forecasts for quarterly Australian beer production.
- Test if the residuals are white noise. What do you conclude?
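A minimal sketch of the residual check, assuming the `fpp3` meta-package is installed. It uses the full `Beer` series from `aus_production`; the lag of 8 (two seasonal periods for quarterly data) is an assumed choice for illustration.

```r
library(fpp3)

# Seasonal naive model for quarterly beer production
fit <- aus_production %>%
  model(snaive = SNAIVE(Beer))

# Autocorrelations of the residuals
augment(fit) %>% ACF(.resid) %>% autoplot()

# Ljung-Box test on the residuals (lag = 8 is an assumed choice)
lb <- augment(fit) %>% features(.resid, ljung_box, lag = 8)
lb
```

A small `lb_pvalue` would indicate that the residuals are distinguishable from white noise, meaning the seasonal naive method leaves structure in the data unexplained.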
- Create a training set for household wealth (`hh_budget`) by withholding the last four years as a test set.
- Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
- Compute the accuracy of your forecasts. Which method does best?
- Repeat the exercise using the Australian takeaway food turnover data (`aus_retail`) with a test set of four years.
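The training and test workflow can be sketched as follows for the household wealth data. This assumes the `fpp3` meta-package is installed; the set of benchmark methods shown (mean, naive, drift) is one reasonable reading of "appropriate" for annual data, where a seasonal naive method does not apply.

```r
library(fpp3)

# Hold out the last four years of each country as a test set
train <- hh_budget %>%
  filter(Year <= max(Year) - 4)

# Benchmark methods suited to annual (non-seasonal) data
fit <- train %>%
  model(
    mean  = MEAN(Wealth),
    naive = NAIVE(Wealth),
    drift = RW(Wealth ~ drift())
  )

# Forecast the four held-out years
fc <- fit %>% forecast(h = 4)

# Compare forecasts against the full data; accuracy() uses only the
# observations that overlap with the forecast period
acc <- accuracy(fc, hh_budget)
acc %>% arrange(Country, RMSE)
```

For the monthly `aus_retail` series, `SNAIVE()` would be a natural addition to the list of benchmarks.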
Try forecasting the Chinese GDP from the `global_economy` data set using an ETS model.
Experiment with the various options in the `ETS()` function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use `h=20` when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
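One way to compare the options side by side is to fit several ETS specifications in a single `model()` call. The sketch below assumes the `fpp3` meta-package is installed; the model names and the Box-Cox parameter of 0.2 are arbitrary choices for illustration.

```r
library(fpp3)

china <- global_economy %>%
  filter(Country == "China")

fit <- china %>%
  model(
    auto    = ETS(GDP),                                          # automatic selection
    holt    = ETS(GDP ~ error("A") + trend("A") + season("N")),  # linear trend
    damped  = ETS(GDP ~ error("A") + trend("Ad") + season("N")), # damped trend
    log     = ETS(log(GDP)),                                     # log transformation
    box_cox = ETS(box_cox(GDP, 0.2))                             # assumed lambda = 0.2
  )

# Forecast 20 years ahead and plot point forecasts only
fc <- fit %>% forecast(h = 20)
fc %>% autoplot(china, level = NULL)
```

Setting `level = NULL` hides the prediction intervals so the point forecasts of the five models are easier to tell apart.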
Find an ETS model for the Gas data from `aus_production` and forecast the next few years.
- Why is multiplicative seasonality necessary here?
- Experiment with making the trend damped. Does it improve the forecasts?
For the United States GDP data (from `global_economy`):
- Fit a suitable ARIMA model for the logged data.
- Produce forecasts of your fitted model. Do the forecasts look reasonable?
For the Australian tourism data (from `tourism`):
- Fit a suitable ARIMA model for all data.
- Produce forecasts of your fitted models.
- Check the forecasts for the "Snowy Mountains" and "Melbourne" regions. Do they look reasonable?
Repeat the daily electricity example, but instead of using a quadratic function of temperature, use a piecewise linear function with the "knot" around 20 degrees Celsius (use predictors `Temperature` & `Temp2`). How can you optimize the choice of knot?
The data can be created as follows.

    vic_elec_daily <- vic_elec %>%
      filter(year(Time) == 2014) %>%
      index_by(Date = date(Time)) %>%
      summarise(
        Demand = sum(Demand)/1e3,
        Temperature = max(Temperature),
        Holiday = any(Holiday)) %>%
      mutate(
        Temp2 = I(pmax(Temperature-20,0)),
        Day_Type = case_when(
          Holiday ~ "Holiday",
          wday(Date) %in% 2:6 ~ "Weekday",
          TRUE ~ "Weekend"))
Repeat Lab Session 16 but using all available data, and handling the annual seasonality using Fourier terms.
- Prepare aggregations of the PBS data by Concession, Type, and ATC1.
- Use forecast reconciliation with the PBS data, using ETS, ARIMA and SNAIVE models, applied to all but the last 3 years of data.
- Which type of model works best?
- Does the reconciliation improve the forecast accuracy?
- Why doesn't the reconciliation make any difference to the SNAIVE forecasts?
This work is licensed under a Creative Commons Attribution 4.0 International License.