---
title: "Financial Activities FRED: Forecast Accuracy Testing and Observations"
author: "David Pershall"
output:
pdf_document: default
urlcolor: blue
---
```{r setup, include=FALSE}
options(scipen = 999, digits = 15, max.print = 2000)
knitr::opts_chunk$set(echo = TRUE)
options(dplyr.summarise.inform = FALSE)
```
## Financial Activities
Every month the U.S. Bureau of Labor Statistics publishes employment figures for
large sections of the U.S. economy. The task is to forecast the number of people
employed in the financial activities sector over a four-year horizon. This series
is published through the St. Louis Fed's FRED database and can be found at the
following link:
[https://fred.stlouisfed.org/series/USFIRE](https://fred.stlouisfed.org/series/USFIRE).
Four years is a long horizon, and prediction intervals will be wide. Therefore, I
will test a few forecasting models by comparing their respective RMSE and Winkler
scores on a held-out portion of the time series.
## Options, Libraries, and Pull
```{r options libraries and pull}
# Libraries: pacman installs (if needed) and loads everything in one call
if (!require(pacman))
  install.packages("pacman", repos = "http://cran.us.r-project.org")
pacman::p_load("tidyverse", "fpp3", "fable.prophet", "ggpubr")
# Pull: download the monthly employment series straight from FRED
# into a temporary file
tmp <- tempfile()
download.file("https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1169&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=CEU5500000001&scale=left&cosd=2000-01-01&coed=2023-01-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2023-02-20&revision_date=2023-02-20&nd=2000-01-01", tmp)
```
## Transform
Now that we have the data pulled and stored, we need to transform it into a
workable time series structure.
```{r transform}
actual <- read.csv(tmp, col.names = c("Date", "Employed")) %>%
  mutate(Date = as.Date(Date),
         # adjust the reported number to millions for easier reading
         Employed = Employed / 1000,
         Month = yearmonth(Date)) %>%
  dplyr::select(Month, Employed) %>%
  as_tsibble(index = Month)
```
## Exploration
```{r exploration}
autoplot(actual, Employed) +
labs(title = "US employment: Financial Activities",
y = "Number of people (millions)")
```
A few observations from this simple plot: the series is non-stationary and
appears to have some seasonality. We can also observe the 2008-10 recession as
well as a sharp slump due to the outbreak of the Covid-19 pandemic in 2020.
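The impression of non-stationarity can also be checked formally. As a minimal
sketch (using the standard unit-root features loaded with fpp3), a KPSS test
with a small p-value is evidence against stationarity in level:
```{r kpss check}
# KPSS unit-root test: a small p-value (e.g. 0.01) rejects the null of
# stationarity, supporting the visual impression that differencing is needed
actual %>% features(Employed, unitroot_kpss)
```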
## Seasonality
Let's take a closer look at the seasonality by plotting it.
```{r seasonality}
actual %>% gg_season(Employed, period = "year")
```
We can see some patterns forming in the seasonal chart above. On average, May and
June tend to see a rise in the number of people joining the financial activities
workforce, followed by a trimming in August. We also see the sharp declines caused
by the 2020 Covid-19 pandemic and the 2008-10 financial crisis. These stand as
exceptions to the general pattern of growth in the number of people employed in
financial activities.
## The Split
The first task is to split the data so that we have a training set and a test
set to perform the accuracy tests against.
```{r split}
train <- actual %>%
  filter(year(Month) <= 2018)
test <- actual %>%
  filter(year(Month) >= 2019)
rm(actual)
gc()
```
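As a quick sanity check (my addition, not strictly required), the two pieces
should partition the series cleanly, with the training set ending in December
2018 and the test set starting in January 2019:
```{r split check}
# Confirm the split leaves no gap and no overlap between the two sets
range(train$Month)
range(test$Month)
```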
## Differencing
Since the data is non-stationary, let's go ahead and difference it and run a
preliminary ACF and PACF analysis.
```{r Differenced, warning=FALSE}
train %>%
  gg_tsdisplay(difference(Employed, 12),
               plot_type = "partial", lag = 50) +
  labs(title = "Seasonally differenced", y = "")
```
In the seasonally differenced graph above, too many spikes cross the significance
bounds (the dashed lines) in the ACF and PACF plots. This suggests we should
difference the data again.
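The number of differences can also be confirmed with the unit-root features;
a short sketch of that check, following the usual fpp3 pattern:
```{r differencing check}
# How many seasonal differences does the raw series need?
train %>% features(Employed, unitroot_nsdiffs)
# And how many first differences after seasonally differencing?
train %>%
  mutate(d_Employed = difference(Employed, 12)) %>%
  features(d_Employed, unitroot_ndiffs)
```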
```{r Double Differenced, warning=FALSE}
train %>%
  gg_tsdisplay(difference(difference(Employed, 12)),
               plot_type = "partial", lag = 50) +
  labs(title = "Double differenced", y = "")
```
In the double differenced graph we see that the series more closely resembles
white noise, and could be ready for the model fitting process.
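Before fitting, the white-noise impression can be quantified with a Ljung-Box
test on the double-differenced series; the lag choice of 24 below is my
assumption (two seasonal periods), not a requirement:
```{r white noise check}
# Ljung-Box portmanteau test: a large p-value is consistent with white noise
train %>%
  mutate(dd_Employed = difference(difference(Employed, 12))) %>%
  filter(!is.na(dd_Employed)) %>%  # drop the NAs created by differencing
  features(dd_Employed, ljung_box, lag = 24)
```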
## The fits
The goal is to find the most accurate model for forecasting this series, judged
by RMSE and the Winkler score. Since considerable spikes remain in the
double-differenced ACF and PACF plots, I will conduct an automated search for the
best ARIMA model and the best STL decomposition (STLF) model. This takes more
time than specifying a model by hand, but since I am only working with one
series, the time spent is worthwhile. I will also fit a seasonal naive model with
drift, an ETS model, a Prophet model, and one combination model.
```{r Fits, warning=FALSE}
# STL decomposition with an ETS model on the seasonally adjusted series
STLF <- decomposition_model(STL(Employed ~ season(window = Inf)),
                            ETS(season_adjust ~ season("N")))
fits <- train %>%
  model(
    stlf = STLF,
    ets = ETS(Employed),
    seasonal_naive_drift = SNAIVE(Employed ~ drift()),
    prophet = prophet(Employed ~ season(period = 12, order = 2,
                                        type = "multiplicative")),
    arima_search = ARIMA(Employed, greedy = FALSE, stepwise = FALSE,
                         approximation = FALSE)
  ) %>%
  # simple equal-weight average of four of the component models
  mutate(combination = (ets + stlf + arima_search + seasonal_naive_drift) / 4)
fits
```
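To see exactly which specification the automated search settled on, `report()`
can be called on the ARIMA column of the mable:
```{r arima report}
# Print the orders and coefficients of the model chosen by the ARIMA search
fits %>% dplyr::select(arima_search) %>% report()
```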
## Preliminary Plots
We can go ahead and plot the forecast means as a preliminary examination of the
models.
```{r preliminary plots}
Employment_forecasts <- fits %>%
  forecast(h = "4 years")
Employment_forecasts %>%
  autoplot(train, level = NULL) +
  labs(y = "Number of people (millions)",
       title = "US employment: Financial Activities")
```
## The residuals
Let's dig a little deeper in our exploration of the models. One way of comparing
and contrasting models is to plot their residuals. Since we are more focused on
RMSE and the Winkler score, I will only include the ARIMA search and Prophet
models as examples of this approach.
```{r Arima Residuals}
fits %>%
  dplyr::select(arima_search) %>%
  gg_tsresiduals(lag = 56) +
  labs(title = "ARIMA search", y = "")
```
We can see that the ARIMA search model does a good job of capturing the
information available in the data. Its residual ACF resembles that of white
noise, and the distribution of the residuals is close to normal.
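That visual impression can be backed up with a Ljung-Box test on the innovation
residuals; as with the earlier check, the lag of 24 is my assumption, and the
degrees-of-freedom adjustment is left at its default since the searched model's
order was not fixed in advance:
```{r arima ljung box}
# A large p-value means the ARIMA residuals are indistinguishable from
# white noise at the chosen lag
augment(fits) %>%
  filter(.model == "arima_search") %>%
  features(.innov, ljung_box, lag = 24)
```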
```{r Prophet Residuals}
fits %>%
  dplyr::select(prophet) %>%
  gg_tsresiduals(lag = 56) +
  labs(title = "Prophet", y = "")
```
The Prophet model above does not do as good a job of capturing the information
available in the data: there are still too many spikes in the ACF plot, and the
residual distribution shows clear departures from normality. We can reasonably
expect the ARIMA model to outperform the Prophet model.
## RMSE
We can calculate the RMSE of each model against the test set to see which one
performs best. If the reading of the residual plots above was correct, the ARIMA
model should perform much better than the Prophet model.
```{r RMSE}
Employment_forecasts %>%
  accuracy(test) %>%
  dplyr::select(.model, RMSE) %>%
  arrange(RMSE)
```
Our reading of the earlier residual plots holds up: the ARIMA search model does
much better than the Prophet model. The combination model has the lowest RMSE,
making it the best approach here. While this is a simple equal-weight linear
combination, there is considerable ongoing work on combining forecasting models
in more sophisticated ways, and combination forecasts frequently outperform any
single model. Notably, the very popular Prophet model has the highest RMSE,
making it even less accurate than the seasonal naive with drift model.
## Winkler Score
Next up, I will generate 5000 future sample paths from each model and compute
forecast distributions from them, matching the distributional form already
stored in the Prophet forecasts. This will allow me to check interval accuracy
using the Winkler score. Just like RMSE, the lower the Winkler score, the better.
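For context, the Winkler score of a single 100(1-alpha)% interval is the
interval's width plus a penalty of 2/alpha times the amount by which the
observation misses the interval. A minimal sketch of the definition (the actual
scoring below uses the fabletools implementation, not this function):
```{r winkler definition}
# Winkler score for one observation y and interval [l, u]:
# width of the interval, plus a scaled penalty for any miss
winkler_score <- function(y, l, u, alpha = 0.05) {
  (u - l) +
    (2 / alpha) * pmax(l - y, 0) +  # penalty when y falls below the interval
    (2 / alpha) * pmax(y - u, 0)    # penalty when y falls above the interval
}
winkler_score(y = 8.8, l = 8.5, u = 9.0)  # inside the interval: just the width
```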
```{r Winkler Score, warning=FALSE}
Employment_futures <- fits %>%
  dplyr::select(c("ets", "stlf", "arima_search",
                  "seasonal_naive_drift", "combination")) %>%
  # Generate 5000 future sample paths
  generate(h = "4 years", times = 5000) %>%
  # Compute forecast distributions from future sample paths
  as_tibble() %>%
  group_by(Month, .model) %>%
  summarise(dist = distributional::dist_sample(list(.sim))) %>%
  ungroup() %>%
  # Create fable object
  as_fable(index = Month, key = .model,
           distribution = dist, response = "Employed")
# The Prophet forecasts (built from 5000 sample paths) already carry
# distributions; rename the distribution column to match and join
prophet_futures <- Employment_forecasts %>%
  filter(.model == "prophet") %>%
  dplyr::select(.model, Month, Employed) %>%
  `colnames<-`(c(".model", "Month", "dist")) %>%
  as_fable(index = Month, key = .model, distribution = dist,
           response = "Employed")
Employment_futures <- Employment_futures %>%
  full_join(prophet_futures,
            by = join_by(Month, .model, dist))
# Winkler score at the 95% level
Employment_futures %>%
  accuracy(test, measures = interval_accuracy_measures,
           level = 95) %>%
  arrange(winkler)
```
## Final Model
The combination model is the most accurate according to both RMSE and the
Winkler score. Let's visualize how the model performs with a simple plot.
```{r Final Model plot}
forecast(fits, h = "4 years") %>%
  filter(.model == "combination") %>%
  autoplot(train) +
  autolayer(test, Employed, color = "red") +
  labs(title = "US employment: Financial Activities",
       y = "Number of people (millions)",
       x = "Year/Month")
```
## Summary
A couple of observations jump out right away. The final combination model is
extremely accurate within the first year. This is followed by the 2020 decline:
no statistical model could have foreseen the Covid-19 pandemic, nor the economic
challenges it wrought. If I were responsible for producing short-term forecasts
of month-to-month changes during 2020, I would have been forced to use some form
of scenario-based forecasting incorporating generalized economic data, and I
would have considered assembling a panel of experts to produce a Delphi forecast.
Once we move past the first two years, the shaded prediction intervals of all
models are extremely wide. Therefore, it would make more sense to forecast this
particular series two years at a time instead of four. A well-trained long-term
forecast of four to five years, however, is still quite useful for evaluating
the realm of possibilities the future may bring.
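As a closing sketch of that recommendation (my addition: it refits the same
specifications on the full series, dropping Prophet since it scored worst, and
trims the horizon to two years):
```{r two year refit}
# Rebuild the full series; `actual` was removed after the split
full_series <- bind_rows(train, test)
refit <- full_series %>%
  model(
    stlf = STLF,
    ets = ETS(Employed),
    seasonal_naive_drift = SNAIVE(Employed ~ drift()),
    arima_search = ARIMA(Employed, greedy = FALSE, stepwise = FALSE,
                         approximation = FALSE)
  ) %>%
  mutate(combination = (ets + stlf + arima_search + seasonal_naive_drift) / 4)
refit %>%
  forecast(h = "2 years") %>%
  filter(.model == "combination") %>%
  autoplot(full_series) +
  labs(title = "US employment: Financial Activities, two-year horizon",
       y = "Number of people (millions)")
```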