---
title: |
| **Bitcoin volatility analysis**
|
author: |
| _Author_:
| Enrico Grimaldi, 1884443
|
| _Professor_:
| Luca Tardella
|
date: |
|
| La Sapienza University of Rome
| a.y. 2022/2023
|
| Final project for the course of
| _Statistical Methods for Data Science 2_
output:
pdf_document: default
html_document: default
fig_align: "center"
toc: no
number_sections: true
bibliography: bibliography.bib
biblio-style: "apalike"
fontsize: 12pt
header-includes:
- \usepackage{adjustbox}
- \usepackage{amsmath}
- \usepackage{float}
- \usepackage{titling}
- \pretitle{\begin{center}
\includegraphics[width=3in,height=3in]{imgs/Sap_logo.png}\LARGE\\}
- \posttitle{\end{center}}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, results = "hide")
```
```{r}
# import libraries
library(tidyverse)
library(gt)
library(summarytools)
library(gridExtra)
library(ggplot2)
library(corrplot)
library(reshape2)
library(mcmcplots)
library(caret)
library(ggpubr)
library(fitdistrplus)
library(viridis)
library(tseries)
library(nortsTest)
library(MCMCvis)
library(ggmcmc)
library(bridgesampling)
library(bbsBayes)
library(bayesplot)
library(rugarch)
library(coda)
# load pre-computed objects (fitted models, tables) to avoid re-running long MCMC fits
load(".RData")
```
```{r eval=FALSE}
# import scripts and custom modules
source("scripts/functions.R")
source("scripts/model_selection.R")
source("jags_models/GARCH.R")
source("jags_models/ARCH.R")
source("jags_models/tstudentGARCH.R")
source("forecasting/GARCH_NAforecast.R")
```
\newpage
```{=latex}
\setcounter{tocdepth}{4}
\tableofcontents
```
\newpage
# 1. Introduction
Bitcoin is the progenitor of cryptocurrencies and, despite its enormous potential, it has shown a number of issues, including the lack of a regulated market and high price volatility (like many other cryptocurrencies).
Cryptocurrencies are indeed considered high-risk financial assets, and their value is virtually impossible to predict because it is tied (like that of many other products) to so-called market sentiment, daily news and a myriad of uncontrollable factors. We will study this variability through an analysis of BTC volatility, noting that there are no recurring patterns or trends.
This project addresses the study and construction of econometric models in a Bayesian framework that can capture the volatility of Bitcoin. We start with a qualitative analysis of the time series in question and then propose a set of ad hoc models. The models are subsequently refined by tuning the MCMC-based sampling process (number of iterations, number of chains, etc.) and by reformulating the prior beliefs.
The main goal is to build a useful tool for decision analysis and risk management and, more generally, to infer the main critical factors to consider when investing in assets of this type.
\
![](imgs/bitcoin.jpeg){height=500}
\newpage
# 2. The data set
The [Yahoo Finance Bitcoin Historical Data](https://www.kaggle.com/datasets/arslanr369/bitcoin-price-2014-2023) from Kaggle, spanning 2014 to 2023, captures the evolution of Bitcoin's price over almost a decade and provides the following features about Bitcoin's value:
```{r results='asis'}
# give a look
head(subset(data, select = -Returns )) %>%gt() %>% tab_options(., table.width = 300)
```
Of the Kaggle data set we are interested in a single feature: the adjusted closing price of Bitcoin (in terms of BTC/USD value), from which we also derive the `LogReturns` feature.
## 2.1. Returns and prices
In the analysis of financial data, asset equity returns are typically the main variable of interest (rather than prices). There are at least two reasons for this:
1. Returns are easier to interpret
2. Returns have statistical properties which are easier to handle (e.g. stationarity)
Let $P_t$ be the price of an asset at period $t$ $(t = 1, \dots, T)$; the **simple return** is defined as the gross return $P_t / P_{t-1}$ generated by holding the asset for one period.
Simple returns are a natural way of measuring the variation of the
value of an asset, however, it is more common to work with **log returns**, defined as:
$$
\epsilon_t= log(P_t)-log(P_{t-1})
$$
It is a good habit to multiply returns by 100 to express them as a percentage. Some statistical packages are sensitive to the scale of the data. Since log differences can be small, sometimes this creates numerical difficulties.
Thus:
$$
\epsilon_t= 100 \cdot (log(P_t)-log(P_{t-1}) )
$$
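As a concrete sketch, the `LogReturns` column used throughout the analysis could be derived from the adjusted closing prices along the following lines (column names assumed from the data set shown above):
```{r eval=FALSE}
# percentage log returns from adjusted closing prices
# (the first value is NA by construction, hence the use of LogReturns[-1] later on)
data$LogReturns <- c(NA, 100 * diff(log(data$Adj.Close)))
```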
Below is a graphical comparison of prices and returns: note some evidence of a leverage effect, as falling prices are associated with higher volatility of returns.
Volatility, as we will see, will be a key issue in choosing our models and will influence our inferential results.
\
```{r}
plot_returns <- ggplot(data, aes(x = Date)) +
geom_line(aes(y = Returns), color = "red") +
labs(title = "Returns Time Series",
x = NULL,
y = "Returns") +
theme_minimal()
plot_prices <- ggplot(data, aes(x = Date)) +
geom_line(aes(y = Adj.Close), color = "blue") +
labs(title = "Prices Time Series",
x = "Date",
y = "Price") +
theme_minimal()
# Combine plots using facet_wrap
ggarrange(plot_returns, plot_prices,
ncol = 1, nrow = 2)
```
## 2.2. Volatility and heteroscedasticity
The **conditional variance** of a time series refers to the variability or dispersion of the data points in the series given the past observations or information. It is a measure of how much the values of the time series fluctuate around their conditional mean, taking into account the historical values of the series. In other words, it quantifies the uncertainty or risk associated with future values of the time series given the available information.
Mathematically, if $Y_t$ represents the value of the time series at time $t$, the conditional variance of $Y_t$ given the past observations up to time $t-1$ can be expressed as:
$$
\sigma_t^2 = Var(Y_t|Y_{t-1},Y_{t-2},..., Y_1)
$$
Volatility, on the other hand, is a broader concept that generally refers to the degree of variation or dispersion of a financial asset's price (or return) over time. It measures the magnitude of price fluctuations and is often used to assess the risk or uncertainty associated with an investment. In the context of financial markets, volatility is often calculated using various statistical measures, such as standard deviation or variance, to quantify how much an asset's price tends to deviate from its average.
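Before formalizing this with a model, a crude empirical proxy for the time-varying volatility can be obtained from a rolling standard deviation of the returns; a minimal sketch (the 30-day window is an arbitrary choice):
```{r eval=FALSE}
# 30-day rolling standard deviation of the log returns as a rough volatility proxy
ret <- data$LogReturns[-1]
window <- 30
roll_sd <- sapply(window:length(ret), function(i) sd(ret[(i - window + 1):i]))
plot(roll_sd, type = "l", col = "darkred",
     xlab = "t", ylab = "Rolling SD", main = "30-day rolling volatility of returns")
```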
\
```{r, fig.show="hold", out.width="50%"}
# Plot ACF for Returns
acf.plot(data$LogReturns)
```
\
The inspection of the autocorrelograms suggests that:
1. returns appear to have weak or no serial dependence;
2. absolute (and hence squared) returns appear to have strong serial dependence.
The first finding suggests that a suitable model could belong to the **first-order autoregressive (AR(1))** family or one of its derivatives, given the high ACF at lag 0 and the low ACF at the remaining lags.
The second observation, on the other hand, suggests that the scale of the returns changes over time and that the (conditional) variance of the process is time-varying. In order to capture **volatility clustering**, we need to introduce appropriate time series processes able to model this behavior!
The low ACF values at subsequent lags might suggest that the process generating the data is highly dynamic and subject to rapid changes. The lack of correlation at longer lags could indicate that the influence of past values on the current value diminishes rapidly, possibly due to external factors or interventions.
We therefore infer that the volatility of the returns is directly symptomatic of strong **heteroscedasticity** in the data.
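This autocorrelation check can be reproduced with base R as a sanity check (a sketch, assuming the `LogReturns` column defined above):
```{r eval=FALSE}
ret <- data$LogReturns[-1]
par(mfrow = c(1, 3))
acf(ret,      main = "Returns")           # little serial dependence
acf(abs(ret), main = "Absolute returns")  # strong, slowly decaying dependence
acf(ret^2,    main = "Squared returns")   # same pattern: volatility clustering
```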
## 2.3. Time series decomposition and stationarity
A given time series is thought to consist of three systematic components, namely level, trend and seasonality, and one non-systematic component called noise.
These components are defined as follows:
>- _Level_: The average value in the series.
>- _Trend_: The increasing or decreasing value in the series.
>- _Seasonality_: The repeating short-term cycle in the series.
>- _Noise (Random)_: The random variation in the series.
A series is thought to be an aggregate or combination of these four components: all series have a level and noise, while the trend and seasonality components are optional.
In our particular case we decide to consider a **multiplicative model**, in which the time series is modeled as a multiplicative combination of the above components:
$$
y(t) = Level \times Trend \times Seasonality \times Noise
$$
We report below the decomposition results for _prices_ and _returns_ considering that we have one observation per day (for 9 years).
\
```{r}
times <- ts(data$LogReturns, frequency = 365)
decomposed.ts <- decompose(times, type="multiplicative")
decomp.plot(decomposed.ts, "returns", "red")
```
```{r}
times <- ts(data$Adj.Close, frequency = 365)
decomposed.ts <- decompose(times, type="multiplicative")
decomp.plot(decomposed.ts, "prices", "blue")
```
### 2.3.1. Level
Simply observing the **Level** for both our _Price_ and _Returns_ variables we can propose some obvious findings:
1. our choice of a multiplicative model is justified by obvious nonlinear changes in the values of both variables, i.e. the changes increase and decrease over time;
2. they do not seem to show obvious trends or seasonality;
3. we can guess some **heteroscedasticity** in the data, given the large fluctuations between adjacent observations.
In the following sections we are going to dive into the analysis of the last 2 reported points.
### 2.3.2. Trend and seasonality
The **Trend** represents the long-term change in the level of a time series. This change can be either upward (increase in level) or downward (decrease in level). If the change is systematic in one direction, then the trend is monotonic.
_ARIMA-type models_ [@ARIMA] for instance are suitable in case of data with an obvious trend, but cannot handle seasonality.
**Seasonality** refers to periodic fluctuations in certain business areas and cycles that occur regularly based on a particular season. A season may refer to a calendar season such as summer or winter, or it may refer to a commercial season such as the holiday season.
An extension of ARIMA models to handle seasonality is the _SARIMA model_.
In this case, however, we find neither trend nor seasonality.
### 2.3.3 Stationarity
At first glance (with the simple visualization of the observed levels) and based on the previously obtained results we can assume the **stationarity** of our time series. A stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times. In general, a stationary time series will have no predictable patterns in the long-term.
The **Dickey–Fuller test** [@stationarity] tests the null hypothesis that a unit root is present in an autoregressive (AR) time series model. The alternative hypothesis is different depending on which version of the test is used, but is usually stationarity or trend-stationarity.
A simple **AR model** is given by:
$$
y_t = \rho y_{t-1} + u_t
$$
where:
>- $y_t$ is the variable of interest (returns) at time $t$;
>- $\rho$ is a coefficient;
>- $u_t$ is an error term.
Write the regression model as:
$$
\Delta y_t = (\rho -1) y_{t-1} + u_t = \delta y_{t-1} + u_t
$$
where $\Delta$ is the first difference operator and $\delta = \rho -1$. This model can be estimated, and testing for a unit root is equivalent to testing $\delta = 0$.
The **augmented Dickey–Fuller (ADF) statistic** (augmented because it removes the structural autocorrelation effects in the time series), used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence.
\newpage
```{r results='markup'}
adf.test(data$LogReturns[-1])
```
We obtained evidence against the null hypothesis, thus verifying the stationarity of our data!
This aspect will be a central point, since the models we use to represent volatility often assume stationarity of the time series.
## 2.4. ARCH effect
We verified in Section _2.2._ that the time series exhibits conditional heteroscedasticity through the study of the autocorrelation of the absolute/squared series. Now we formally check for **autoregressive conditional heteroscedastic (ARCH) effects**.
The **Lagrange multiplier (LM) test** [@LMtest] for ARCH is widely used as a specification test in univariate time series models. It is a test of no conditional heteroskedasticity against an ARCH model. The test is easy to compute from an auxiliary regression involving the squared least squares (LS) residuals. The LM statistic is asymptotically distributed as $\chi^2$ under the null hypothesis.
```{r, results='markup'}
Lm.test(data$LogReturns[-1])
```
With this result we can claim, with strong evidence, that our time series is NOT homoscedastic!
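For transparency, the LM statistic can also be computed by hand from the auxiliary regression described above: regress the squared (demeaned) returns on their own lags and use $N \cdot R^2 \sim \chi^2_q$ under the null hypothesis. A minimal sketch with $q = 5$ lags (the lag order is an arbitrary choice):
```{r eval=FALSE}
ret <- data$LogReturns[-1]
e2  <- (ret - mean(ret))^2        # squared residuals from a constant-mean model
q   <- 5                          # number of lags in the auxiliary regression
X   <- embed(e2, q + 1)           # column 1 = e2_t, columns 2..(q+1) = lags 1..q
aux <- lm(X[, 1] ~ X[, -1])
LM  <- nrow(X) * summary(aux)$r.squared
c(statistic = LM, p.value = pchisq(LM, df = q, lower.tail = FALSE))
```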
\newpage
# 3. Models
Summarizing the above considerations, our model must satisfy a number of requirements in order to obtain consistent results:
1. It does not need to model trends, patterns or forms of seasonality.
2. It must, however, accommodate strong heteroscedasticity.
3. It can rely on the assumption of stationarity of the data.
Below we will report three models typically used for time series analysis similar to the one under consideration.
## 3.1. The ARCH model
The **ARCH (Autoregressive Conditional Heteroskedasticity)** model [@ARCH] is a statistical time series model commonly used in econometrics and finance to capture volatility clustering in financial data. It was introduced by Robert F. Engle in the early 1980s as a way to model the changing volatility observed in financial returns over time. The ARCH model is particularly useful for analyzing financial time series data where the volatility, or the variation in the magnitude of returns, is not constant and can exhibit patterns of clustering or persistence.
The **ARCH(1)** model is given by:
$$
\begin{aligned}
y_t &= \sqrt{\sigma^2_t} \cdot z_t \nonumber \\
\sigma^2_t &= \omega + \alpha y^2_{t-1}
\end{aligned}
$$
where $\omega > 0$ and $0 \leq \alpha < 1$ in order to have a positive variance and
stationarity.
These variables and parameters have a specific meaning in our model:
>- $y_t$ is the observed value at time t
>- $z_t$ is the white noise (innovation) term at time t
>- $\sigma_t^2$ is the conditional variance of $y_t$
>- $\omega$ is the baseline volatility
>- $\alpha$ represents the impact of past squared residuals on the conditional variance
We then define our _Bayesian ARCH model_ by selecting an appropriate set of distributions for the above parameters and variables. We use the following setup:
$$
\begin{aligned}
z_t &\sim N(0, 1) \rightarrow y_t = \mu + \sigma_t z_t \sim N(\mu, \sigma_t^2) \nonumber \\
\mu &\sim N(0, 100^2) \nonumber \\
\omega &\sim U(0, 10) \nonumber \\
\alpha &\sim U(0, 1)
\end{aligned}
$$
Below is the above model coded in JAGS [@Plummer2017].
```{r echo=TRUE}
arch_model_code <- "
model
{
# Likelihood
for (t in 1:N) {
y[t] ~ dnorm(mu, tau[t])
tau[t] <- 1/pow(sigma[t], 2)
}
sigma[1] ~ dunif(0,10)
for(t in 2:N) {
sigma[t] <- sqrt(omega + alpha * pow(y[t-1] - mu, 2))
}
# Priors
mu ~ dnorm(0.0, 0.01)
omega ~ dunif(0, 10)
alpha ~ dunif(0, 1)
}
"
```
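The actual fitting is carried out in the sourced scripts; as an illustration, an object such as `arch_model` could be obtained with `R2jags` roughly as follows (the data list, monitored parameters and MCMC settings are assumptions, not necessarily the exact values used in the project):
```{r eval=FALSE}
library(R2jags)
y <- data$LogReturns[-1]
# write the model string to a temporary file and fit it
model_file <- tempfile(fileext = ".txt")
writeLines(arch_model_code, model_file)
arch_model <- jags(data = list(y = y, N = length(y)),
                   parameters.to.save = c("mu", "omega", "alpha", "sigma"),
                   model.file = model_file,
                   n.chains = 2, n.iter = 10000, n.burnin = 2000)
```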
## 3.2. The Gaussian GARCH model
The **GARCH (Generalized Autoregressive Conditional Heteroskedasticity)** model [@GARCH] is an extension of the ARCH (Autoregressive Conditional Heteroskedasticity) model that further captures and models the time-varying volatility in financial and economic time series data. Introduced by Tim Bollerslev in the mid-1980s, the GARCH model addresses some of the limitations of the basic ARCH model by incorporating past values of the conditional variance itself into the volatility modeling process.
Mathematically, a **GARCH(1,1)** model is given by the following structure:
$$
\begin{aligned}
y_t &= \sqrt{\sigma^2_t} \cdot z_t \nonumber \\
\sigma^2_t &= \omega + \alpha y^2_{t-1} + \beta \sigma_{t-1}^2
\end{aligned}
$$
As can be seen, the only difference from the previous model is the dependence of volatility on past volatility values (conditional variance) and the introduction of a new $\beta$ parameter to govern this relationship.
GARCH and ARCH are both models used to analyze and forecast volatility in financial time series data, such as the volatility of Bitcoin prices. The preference for GARCH over ARCH for modeling Bitcoin volatility is based on several factors:
1. Flexibility and improved modeling: GARCH is an extension of the ARCH model that allows for more complex and flexible modeling of volatility dynamics. GARCH models incorporate both lagged squared returns (as in ARCH) and lagged values of the conditional variance itself. This added flexibility often helps capture more intricate volatility patterns observed in financial data like Bitcoin prices.
2. Better fit to Real Data: cryptocurrencies like Bitcoin are known for their unique volatility characteristics, including periods of extreme volatility followed by relative stability. GARCH models with their ability to capture changing volatility patterns over time are often better suited to capture these fluctuations and trends in the data.
3. Accommodation of **volatility clustering**: volatility clustering refers to the phenomenon where periods of high volatility tend to cluster together over time. GARCH models can capture this clustering effect by allowing for the persistence of volatility shocks, making them more suitable for assets like Bitcoin that often exhibit this behavior.
4. More sophisticated volatility forecasting: GARCH models can generate volatility forecasts that are more accurate and reliable compared to ARCH models. This is crucial for risk management and derivative pricing, where accurate volatility forecasts are essential.
5. Statistical significance and model selection: GARCH models often provide more accurate parameter estimates and better model fit, as determined by statistical tests and criteria. This helps in selecting a more appropriate and reliable model for analyzing Bitcoin volatility.
We then define our _Bayesian GARCH model_ by selecting an appropriate set of distributions for the above parameters and variables. We use the following setup:
$$
\begin{aligned}
z_t &\sim N(0, 1) \rightarrow y_t = \mu + \sigma_t z_t \sim N(\mu, \sigma_t^2) \nonumber \\
\mu &\sim N(0, 100^2) \nonumber \\
\omega &\sim U(0, 10) \nonumber \\
\alpha &\sim U(0, 1) \nonumber \\
\beta &\sim U(0, 1)
\end{aligned}
$$
Below is the above model coded in JAGS.
```{r echo=TRUE}
garch_model_code <- "
model
{
# Likelihood
for (t in 1:N) {
y[t] ~ dnorm(mu, tau[t])
tau[t] <- 1/pow(sigma[t], 2)
}
sigma[1] ~ dunif(0,10)
for(t in 2:N) {
sigma[t] <- sqrt(omega + alpha * pow(y[t-1] - mu, 2) +
beta * pow(sigma[t-1], 2))
}
# Priors
mu ~ dnorm(0.0, 0.01)
omega ~ dunif(0, 10)
alpha ~ dunif(0, 1)
beta ~ dunif(0, 1)
}
"
```
## 3.3. The Student's $t$ GARCH model
We can then begin to analyze the distribution of our data. Let's start with a simple visualization of the distribution (histogram) of our data and then try to fit some candidate density functions, selected by intuition, with parameters estimated by MLE.
\
![](imgs/fit_distros.png){height="600" width="600" align="center"}
We note that, fortunately, a normal distribution seems a reasonably consistent candidate for the log-returns, and hence a possible prior assumption in a Bayesian framework.
However, we are not satisfied with a simple visualization of this kind and continue the investigation with a **Kolmogorov-Smirnov test**, a _goodness-of-fit_ test. The resulting p-value leads us to reject the null hypothesis, indicating that the population does not actually follow a normal distribution.
\newpage
```{r results='markup'}
ks.test(data$LogReturns[-1], "pnorm", mean = mean(data$LogReturns[-1]),
        sd = sd(data$LogReturns[-1]))
```
We thus reject our initial assumption of a normal prior and, via a _QQ plot_, highlight the differences between the data and the normal distribution that we were previously unable to capture.
\
```{r}
ggplot(data, aes(sample=LogReturns)) +
stat_qq(color="purple") +
stat_qq_line() +
labs(title = "QQ Plot for returns")
```
Traditionally, the errors (and so the returns) have been assumed to be Gaussian; however, it
has been widely acknowledged that financial returns display fat tails and are not conditionally
Gaussian. Gaussian GARCH models cannot quite capture the key properties of asset returns and to address this, researchers [@Bollerslev; @Changli; @Bai] have explored alternative distributions to model asset returns.
One approach is to assume that the returns are IID and follow a Student’s t-distribution (t-Student) [@tstud] instead of a normal distribution. The t-Student distribution allows for fatter tails, meaning it assigns higher probabilities to extreme events compared to the normal distribution.
\newpage
We use a GARCH model with a **non-central Student's t** prior distribution for the returns:
$$
\begin{aligned}
y_t = \sigma_t z_t &\sim t(\mu, \sigma_t^2, \nu) \nonumber \\
\mu &\sim N(0, 100^2) \nonumber \\
\omega &\sim U(0, 10) \nonumber \\
\alpha &\sim U(0, 1) \nonumber \\
\beta &\sim U(0, 1)
\end{aligned}
$$
where:
>- $\delta = \mu \cdot \sigma^{- \frac{1}{2}}_t$ is the non-centrality parameter
>- $\nu = 8$ represents the degrees of freedom, which we fix to a constant following the study ["Notes on the econometrics of asset allocation and risk measurement"](https://didattica.unibocconi.it/mypage/index.php?IdUte=48917&idr=3008) by Favero
\
```{r}
ggplot(data, aes(sample = LogReturns)) +
stat_qq(distribution = qt, dparams = list(df = 8, ncp=0.01), color = "purple") +
stat_qq_line(distribution = qt, dparams = list(df = 8, ncp=0.01)) +
labs(title = "QQ Plot for Student's t-distribution")
```
\newpage
Here is the code:
```{r echo=TRUE}
tstud_model_code <- "
model
{
# Likelihood
for (t in 1:N) {
y[t] ~ dt(mu, tau[t], 8)
tau[t] <- 1/pow(sigma[t], 2)
}
sigma[1] ~ dunif(0,10)
for(t in 2:N) {
sigma[t] <- sqrt(omega + alpha * pow(y[t-1] - mu, 2) +
beta * pow(sigma[t-1], 2))
}
# Priors
mu ~ dnorm(0.0, 0.01)
omega ~ dunif(0, 10)
alpha ~ dunif(0, 1)
beta ~ dunif(0, 1)
}
"
```
\newpage
# 4. Model comparison
## 4.1. Simple inferential findings and results
Here we report our main inferential findings for each model:
```{r results='asis'}
data.frame(Parameter = setdiff(arch_model_parameters, "sigma")) %>% mutate(arch_res) %>% gt() %>% tab_options(., table.width = 600) %>%
tab_header(title = "ARCH(1) results")
```
```{r fig.height = 6}
MCMCtrace(arch_model,
params = c('omega', 'alpha', 'mu'),
ISB = FALSE,
exact = TRUE,
iter = 100,
ind = TRUE,
pdf = FALSE,
Rhat = TRUE,
n.eff = TRUE,
type = "density")
```
\newpage
```{r results='markup'}
data.frame(Parameter = setdiff(garch_model_parameters, "sigma")) %>% mutate(garch_res) %>% gt() %>% tab_options(., table.width = 600) %>%
tab_header(title = "GARCH(1,1) results")
```
```{r, fig.height = 7}
MCMCtrace(garch_model,
params = c('omega', 'alpha', 'beta', 'mu'),
ISB = FALSE,
exact = TRUE,
iter = 100,
ind = TRUE,
pdf = FALSE,
Rhat = TRUE,
n.eff = TRUE,
type = "density")
```
```{r results='markup'}
data.frame(Parameter = setdiff(tstud_model_parameters, "sigma")) %>% mutate(tgarch_res) %>% gt() %>% tab_options(., table.width = 600) %>%
tab_header(title = "Student t GARCH(1,1) results")
```
```{r fig.pos='H', fig.height = 7}
MCMCtrace(tstud_garch_model,
params = c('omega', 'alpha', 'beta', 'mu'),
ISB = FALSE,
exact = TRUE,
iter = 100,
ind = TRUE,
pdf = FALSE,
Rhat = TRUE,
n.eff = TRUE,
type = "density")
```
Let's visualize the main differences in the estimates between the two GARCH models, in order to better understand how much the prior choice can influence the final inferential results.
```{r}
MCMCplot(object = tstud_garch_model,
object2 = garch_model,
params = setdiff(tstud_model_parameters, "sigma"),
offset = 0.1,
HPD=TRUE,
main = "MCMC Plot of t-student GARCH and GARCH Models")
legend("bottomright", legend = c("t-student GARCH", "GARCH"),
col = c("black", "red"), lty = 1, lwd=2, border = NA)
grid()
box(bty = "n")
```
We note that the parameter estimates do not differ much between the two models, but in general the Gaussian model shows greater variability in the estimates and an $\omega$ value that differs markedly. The latter difference is mainly due to the presence or absence of non-centrality in the definition of the prior, and $\omega$, as an intercept, is strongly influenced by it.
## 4.2. Deviance Information Criterion (DIC)
The **Deviance Information Criterion (DIC)** [@DIC] was introduced as a measure of model comparison and adequacy. For a model $m$ with parameters $\theta_m$ it is given by:
$$
DIC(m) = 2 \overline{D(\theta_m,m)} - D( \overline{\theta_m}, m) = D(\overline{\theta_m}, m) + 2p_D
$$
where:
>- $D(\theta_m, m)=-2\log[f(y|\theta_m,m)]$ is the usual deviance measure (minus twice the log-likelihood)
>- $\overline{D(\theta_m,m)}$ is the posterior mean of the preceding deviance
>- $p_D = \overline{D(\theta_m,m)} - D( \overline{\theta_m}, m)$ is the effective number of parameters
>- $\overline{\theta}_m$ is the posterior mean of the parameters.
Note that the DIC should be used with caution since it assumes that the posterior mean can be used as a good summary of central location for the description of the posterior distribution (Ntzoufras, 2010).
Also note that the larger the effective number of parameters ($p_D$), the easier it is for the model to fit the data, so the deviance needs to be penalized accordingly. In general, the more complex models have a higher $p_D$, but they nevertheless achieve a better (lower) DIC than the simplest one.
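With `R2jags`, both quantities are stored in the fitted objects, so a comparison table like the one below can be assembled directly (a sketch, assuming the three fitted models are R2jags objects):
```{r eval=FALSE}
models <- list("ARCH(1)"              = arch_model,
               "GARCH(1,1)"           = garch_model,
               "Student-t GARCH(1,1)" = tstud_garch_model)
model_comparison <- data.frame(
  Model = names(models),
  pD    = sapply(models, function(m) m$BUGSoutput$pD),   # effective number of parameters
  DIC   = sapply(models, function(m) m$BUGSoutput$DIC)
)
```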
```{r results='markup'}
model_comparison %>%gt() %>% tab_options(., table.width = 500) %>% tab_header(title = "Model Comparison")
```
The results mirror the findings reported in the model presentation phase: the Student's t GARCH model performs best.
The nonetheless high DIC values also reflect the substantial difference that remains between the Student's t distribution and the true distribution of the data.
## 4.3. Effective sample size (ESS)
The Effective Sample Size (ESS), in the context of MCMC, measures the information content, or effectiveness, of a sample chain. For example, 1,000 samples with an ESS of 200 have a higher information content than 2,000 samples with an ESS of 100. The ESS can be defined as the minimum size of a set of posterior samples (taken directly from the posterior) which have the same efficiency (measure of quality) in estimating the posterior density as a given chain of samples obtained from MCMC sampling (Martino et al., 2017). Conversely, ESS theory can quantify how many samples should be taken in a chain to reach a given quality of the posterior estimates.
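The values reported below can be obtained by applying `coda::effectiveSize` to the posterior draws (a sketch, assuming R2jags fits with the parameters monitored as above):
```{r eval=FALSE}
library(coda)
ess_of <- function(model, params) {
  draws <- as.mcmc(model$BUGSoutput$sims.matrix[, params, drop = FALSE])
  effectiveSize(draws)
}
ess_of(garch_model,       c("mu", "omega", "alpha", "beta"))
ess_of(tstud_garch_model, c("mu", "omega", "alpha", "beta"))
```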
```{r results='markup'}
es %>% gt() %>% tab_options(., table.width = 500) %>% tab_header(title = "Effective Sample Size Comparison")
```
If the ESS is very small, it suggests that the samples are highly correlated and provide less independent information than desired. This can indicate poor convergence.
In general, a larger ESS makes one model preferable to another because it indicates better mixing. Indeed, thanks to this study we were able to inspect the models before the actual convergence analysis and found a drawback of the Student's t GARCH model: it had a smaller ESS, indicating slower mixing and not guaranteeing a sufficient degree of independence between successive samples.
We remedied the problem by doubling the number of iterations of the last model, so that its convergence performance could be comparable to that of the other proposed models.
\newpage
# 5. Convergence analysis
Convergence analysis refers to the study of whether the MCMC sampling algorithm (in our case a Gibbs sampler) has reached its equilibrium (target) distribution.
\
```{r fig.height=7.4}
grid.arrange(trace1, trace2, trace3, trace4, ncol = 1, nrow = 4)
```
As explained in Peter Hoff's book [@Hoff2009], we check for the absence of any pattern or trend in the traceplots to rule out, by visual inspection, the non-convergence of any parameter.
Then we monitor the autocorrelations and assess the convergence speed of the MCMC.
\
```{r results="markup"}
# Extract the samples from the model
samp <- tstud_garch_model$BUGSoutput$sims.matrix
# Plot the ACF for each parameter
par(mfrow=c(2,2)) # Set up a 2x2 grid for the plots
for (param in tstud_model_parameters[- 5]) {
acf_values <- acf(samp[,param], lag.max = 30, plot = FALSE)
plot(acf_values, main = paste("ACF for", param))
}
```
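These visual checks can be complemented with the Gelman-Rubin statistic, which compares between-chain and within-chain variability (a sketch, assuming each model was run with at least two chains and is an R2jags object):
```{r eval=FALSE}
library(coda)
chains <- as.mcmc(tstud_garch_model)   # coerce the R2jags fit to an mcmc.list
gelman.diag(chains[, c("mu", "omega", "alpha", "beta")], autoburnin = FALSE)
```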
\newpage
# 6. Volatility forecasting
Let us now turn to volatility forecasting, the main purpose of the GARCH models employed. We note that both GARCH models are quite accurate in predicting the level of volatility and capture the conditional variance reasonably well.
The models are therefore fairly reliable and allow us to anticipate possible moments of high and low volatility that can strongly impact the price (the value) of Bitcoin.
```{r results='markup'}
volatility.plot(tstud_garch_model, garch_model, y)
```
\
The two models look very similar, but looking closely at the plots one can perceive a thinner line for the Student's t setup. The Gaussian model probably picks up more oscillation (more noise) and is less able to predict extreme values.
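The fitted volatility path shown above can also be inspected directly, since `sigma` was among the monitored parameters; a sketch (assuming R2jags objects):
```{r eval=FALSE}
ret <- data$LogReturns[-1]
sigma_hat <- tstud_garch_model$BUGSoutput$mean$sigma   # posterior mean of sigma_t
plot(abs(ret), type = "h", col = "grey70",
     xlab = "t", ylab = "|return| and fitted sigma_t")
lines(sigma_hat, col = "purple", lwd = 1.5)
```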
\newpage
# 7. Missing values forecasting
We report a small experiment in which we insert 20 NA values in the last rows of our data frame, leaving it to the MCMC and the selected Bayesian model to replace the NA values with predicted values.
In this way we try to use a model generally intended for volatility forecasting to predict future values of the returns.
Collecting the closing prices of Bitcoin for the last 20 days from Yahoo Finance, we compare the predicted values with the actual ones (unsurprisingly finding non-negligible differences).
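The mechanism relies on the fact that JAGS treats `NA` entries in the data as unobserved nodes and samples them from the posterior predictive distribution; preparing the data is therefore as simple as the sketch below (the full procedure is in `forecasting/GARCH_NAforecast.R`):
```{r eval=FALSE}
y_fore <- data$LogReturns[-1]
n <- length(y_fore)
y_fore[(n - 19):n] <- NA   # last 20 returns treated as missing
# passing y_fore to the GARCH model and monitoring "y" yields posterior draws
# (i.e. forecasts) for those 20 days
```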
```{r eval=FALSE}
new_data <- read.csv("BTC-USD2.csv")
df <- na.forecast(fore_garch_model, new_data, 20)
```
\
```{r}
true_plot <- ggplot(df, aes(x = Date)) +
geom_line(aes(y = true), color = "chocolate", size=1) +
labs(title = "True returns",
x = NULL,
y = "Returns") +
theme_minimal()
pred_plot <- ggplot(df, aes(x = Date)) +
geom_line(aes(y = pred), color = "slateblue", size=1) +
labs(title = "Predicted returns",
x = "Date",
y = "Returns") +
theme_minimal()
# Combine plots using facet_wrap
ggarrange(true_plot, pred_plot,
ncol = 1, nrow = 2)
```
\
Clearly, the model does not perform well in directly forecasting returns, but it does capture the general pattern with some success thanks to its correct volatility forecasts.
Our GARCH model is therefore able to indicate when price fluctuations might occur.
\newpage
# 8. Comparative analysis with frequentist approach
As for the inference and prediction, the Bayesian approach is especially well-suited for GARCH models [@freqvsbayes] and provides some advantages compared to classical estimation techniques, as outlined by Ardia & Hoogerheide (2010).
1. The positivity constraints on the parameters, required to ensure a positive variance, may encumber some optimization procedures. In the Bayesian setting, constraints on the model parameters can be incorporated via the priors.
2. In most cases we are interested not in the model parameters directly, but in some non-linear functions of them. In the maximum likelihood (ML) setting it is quite troublesome to perform inference on such quantities, while in the Bayesian setting it is usually straightforward to obtain the posterior distribution of any non-linear function of the model parameters.
3. Bayesian estimation provides reliable results even for finite samples.
4. Hall & Yao (2003) add that the ML approach presents some limitations when the errors are heavy-tailed (as in our case); moreover, the convergence rate is slow and the estimators may not be asymptotically Gaussian.
```{r results='markup'}
# Specify the model
mod_specify <- ugarchspec(mean.model = list(armaOrder = c(0, 0)),
                          variance.model = list(model = 'sGARCH', garchOrder = c(1, 1)),
                          distribution.model = 'std')
fit <- ugarchfit(data = data$LogReturns[-1], spec = mod_specify, out.sample = 20)
# params
Estimate <- fit@fit$coef[c("mu","omega","alpha1","beta1")]
Std <- fit@fit$se.coef[-5]
data.frame(Parameter = setdiff(garch_model_parameters, "sigma")) %>% mutate(as.data.frame(Estimate)) %>% mutate(as.data.frame(Std)) %>% gt() %>% tab_options(., table.width = 300) %>%
tab_header(title = "Frequentist framework results")
```
We can see that the frequentist approach yields results consistent with those reported above for the Bayesian models, but differs considerably in the estimates of the centrality parameter $\mu$ and of the parameter $\beta$ governing the dependence on the past conditional variance. It is interesting that the standard errors used to define confidence intervals for the parameter estimates are larger in the frequentist setting than in the Bayesian one, illustrating a theoretical property of an interval with $(1-\alpha)\%$ Bayesian coverage:
$$
\begin{aligned}
Pr(l(Y) < \theta < u(Y) | \theta) &= (1-\alpha) + \epsilon_n \nonumber \\
\epsilon_n &= |a|/n
\end{aligned}
$$
Thus an interval with $(1-\alpha)\%$ Bayesian coverage will asymptotically also have $(1-\alpha)\%$ frequentist coverage (the two approaches give the same results under "infinite information").
\newpage
# 9. Further improvements
We propose additional improvements that could inspire far more in-depth studies than those considered for this specific university project. Such improvements involve more complicated models and would likely require more sophisticated MCMC-based sampling software than JAGS.
## 9.1. MS-GARCH
The idea is to estimate a model that permits regime switching in the parameters, as suggested by @ARDIA2019266. We call it a **Markov-switching GARCH** (MS-GARCH) model [@msgarch]. It is a generalization
of the GARCH model and permits a different persistence in the conditional variance of each
regime. Thus, the conditional variance in each regime accommodates volatility clustering,
nesting the GARCH model as special case.
Let $\{ s_t \}$ be an __ergodic Markov chain__ on a finite set of states $S = \{ 1, ..., n \}$, with transition probabilities $\nu_{ij} = P(s_t=i|s_{t-1} = j)$ and invariant probability measure $\pi_i$. The MS-GARCH model is given by:
$$
\begin{aligned}
y_t &= \mu_{s_t} + \sqrt{\sigma^2_t} \cdot z_t \nonumber \\
\sigma^2_t &= \omega_{s_t} + \alpha_{s_t} \epsilon^2_{t-1} + \beta_{s_t} \sigma^2_{t-1}
\end{aligned}
$$
where $\omega_{s_t} > 0$, $\alpha_{s_t} \geq 0$, $\beta_{s_t} \geq 0$ for $s_t \in \{ 1, 2, .., n \}$ and $\epsilon_t = y_t - \mu_{s_t}$.
The studies cited above found that a model based on two stationary regimes achieved the best performance in modeling the volatility of Bitcoin.
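As an indication only, a two-regime specification along these lines could be estimated with the CRAN `MSGARCH` package by the same authors; the interface sketched below is an assumption and should be checked against the package documentation:
```{r eval=FALSE}
# hypothetical sketch: two-regime Gaussian sGARCH fitted by MCMC with MSGARCH
library(MSGARCH)
spec <- CreateSpec(variance.spec     = list(model = c("sGARCH", "sGARCH")),
                   distribution.spec = list(distribution = c("norm", "norm")),
                   switch.spec       = list(do.mix = FALSE))
fit <- FitMCMC(spec, data = data$LogReturns[-1])
summary(fit)
```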
## 9.2. Multivariate GARCH
**Volume** is the amount of an asset or security that changes hands over some period of time, often over the course of a trading day. For instance, a stock's trading volume refers to the number of shares traded between its daily open and close.
Many studies have confirmed that market volume can be related (correlated) to fluctuations in product prices or stock values.
Thus, by including additional information such as volume, a multivariate variant of GARCH would likely achieve better volatility forecasting performance [@volumes].
\newpage
# References