---
title: "Illustrating Ensemble Models - Smarket Data"
output:
  html_document:
    toc: yes
    toc_float: yes
    code_folding: hide
---

This data set consists of percentage returns for the S&P 500 stock index over 1,250 days from the beginning of 2001 until the end of 2005. For each date, we have recorded the percentage returns for each of the five previous trading days, Lag1 through Lag5. We have also recorded Volume (the number of shares traded on the previous day, in billions), Today (the percentage return on the date in question) and Direction (whether the market was Up or Down on this date).

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, warning = FALSE, message = FALSE}
library(tidyverse)
library(MLmetrics)
```
```{r}
# Helper function to print the confusion matrix and other performance metrics of the models.
# `positive` is the label treated as the positive class (e.g. "Up").
printPerformance = function(pred, actual, positive="yes") {
  print(table(actual, pred))
  print("")
  print(sprintf("Accuracy: %.3f", Accuracy(y_true=actual, y_pred=pred)))
  print(sprintf("Precision: %.3f", Precision(y_true=actual, y_pred=pred, positive=positive)))
  print(sprintf("Recall: %.3f", Recall(y_true=actual, y_pred=pred, positive=positive)))
  # Name the arguments here too, so truth and prediction cannot be swapped by position.
  print(sprintf("F1 Score: %.3f", F1_Score(y_true=actual, y_pred=pred, positive=positive)))
  # Sensitivity is the same quantity as Recall; both are printed for reference.
  print(sprintf("Sensitivity: %.3f", Sensitivity(y_true=actual, y_pred=pred, positive=positive)))
  print(sprintf("Specificity: %.3f", Specificity(y_true=actual, y_pred=pred, positive=positive)))
}
```
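
To make the reported metrics concrete, here is a minimal sketch that recomputes them by hand from a confusion matrix, using base R only. The vectors `actual_toy` and `pred_toy` are invented for illustration and are not Smarket results. The formulas are the standard textbook definitions, which are assumed to match what the MLmetrics functions report.

```{r}
# Toy labels, invented for illustration only (not Smarket data).
lv <- c("Down", "Up")
actual_toy <- factor(c("Up", "Up", "Up", "Down", "Down", "Up", "Down", "Down"), levels = lv)
pred_toy   <- factor(c("Up", "Down", "Up", "Up", "Down", "Up", "Down", "Up"), levels = lv)

# Rows are the true labels, columns are the predictions.
cm <- table(actual = actual_toy, pred = pred_toy)
print(cm)

# Treat "Up" as the positive class.
tp <- as.numeric(cm["Up", "Up"])      # true positives
fn <- as.numeric(cm["Up", "Down"])    # false negatives
fp <- as.numeric(cm["Down", "Up"])    # false positives
tn <- as.numeric(cm["Down", "Down"])  # true negatives

accuracy    <- (tp + tn) / sum(cm)
precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)         # also called sensitivity
specificity <- tn / (tn + fp)
f1          <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, precision = precision, recall = recall,
              specificity = specificity, f1 = f1), 3))
```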

# Read in the data

```{r}
library(ISLR)
# Drop Today: Direction is derived from its sign, so keeping it would leak the label.
df <- Smarket %>%
  dplyr::select(-Today)
str(df)
head(df)
summary(df)
```

# Splitting the data

```{r}
set.seed(123) # Set the seed to make it reproducible
train <- sample_frac(df, 0.8)
test <- setdiff(df, train)
actual = test$Direction
formula = Direction ~ .
positive = "Up"
```
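
The split above builds the test set with `setdiff()`, which compares whole rows and, in dplyr, de-duplicates its result, so it implicitly assumes that no row of the data occurs twice. As a hedged alternative, the sketch below splits by row index, which keeps the two sets disjoint and exhaustive even when rows repeat. The data frame `toy` and the names `train_idx`, `toy_train` and `toy_test` are hypothetical and used only for this illustration.

```{r}
# Hypothetical toy data with deliberately duplicated rows.
toy <- data.frame(
  x = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  y = rep(c("a", "b"), each = 5)
)

set.seed(123)  # reproducible sampling
n <- nrow(toy)

# Sample 80% of the row positions for training; the rest become the test set.
train_idx <- sample(n, size = floor(0.8 * n))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Every row lands in exactly one of the two sets, duplicates included.
nrow(toy_train) + nrow(toy_test) == n
```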

# Decision Tree

```{r, warning = FALSE}
library(rpart)
library(rpart.plot) # For pretty trees
set.seed(123)
tree <- rpart(formula, method="class", data=train)
rpart.plot(tree, extra=2, type=2)
predicted = predict(tree, test, type="class")
printPerformance(predicted, actual, positive = positive)
```

# Random Forests

```{r, warning = FALSE}
library(randomForest)
set.seed(123)
rf = randomForest(formula, data=train, mtry=3, ntree=100, importance=TRUE)
rf.predicted = predict(rf, test, type="class")
printPerformance(rf.predicted, actual, positive = positive)
varImpPlot(rf)
```

# Boosting

```{r, warning = FALSE}
library(fastAdaboost)
set.seed(123)
boost = adaboost(formula, data=train, nIter=1000)
boost.predicted = predict(boost, newdata=test)
printPerformance(boost.predicted$class, actual, positive = positive)
```