---
title: "xts"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE,
message = FALSE,
cache = TRUE)
```
### Introduction
xts stands for eXtensible Time Series and is an extension of zoo. Its main benefit is consistency: an xts object accepts any of the R classes commonly used to represent time (`Date`, `POSIXct`, etc.), so once data is converted to xts we no longer need to worry about which class holds the time index. As we will see in this tutorial, xts objects then behave the same way across a range of functions. The package also provides many useful functions for manipulating and analyzing time series data and for handling missing values.
### Structure of an xts object
An xts object can be viewed as a simple R matrix paired with an index of the corresponding dates and times. It can also carry additional attributes, such as a creation date (hence, extensible).
To understand this better, let us create an xts object `work` that stores the number of hours Bob worked each day, along with an attribute about him: his date of birth. We can then inspect the structure of the object using the `str()` function.
```{r}
library(xts)
# Create the data: 20 random numbers centered on 8 hours
hours <- rnorm(20, mean = 8)
# Create the dates as a Date class object starting from 1922-01-01
dates <- seq(as.Date("1922-01-01"), length.out = 20, by = "days")
# Create the birthday as a POSIXct class object
dob <- as.POSIXct("1900-01-08")
# Create the xts object work, storing dob as an extra attribute
work <- xts(x = hours, order.by = dates, born = dob)
str(work)
```
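As a minimal sketch of the "extensible" part, the extra attribute can be read back with `xtsAttributes()`. The object `bob` and its two values are made up for illustration; only the `born` attribute mirrors the example above.

```{r}
library(xts)
# Hypothetical two-day series with one user-defined attribute
bob <- xts(c(8, 7.5), order.by = as.Date("1922-01-01") + 0:1,
           born = as.POSIXct("1900-01-08", tz = "UTC"))
# List the user-defined attributes attached to the object
bob_attrs <- xtsAttributes(bob)
names(bob_attrs)
bob_attrs$born
```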
Every xts object has two main components:
* The core matrix part, which contains the data.
* The index part, which contains the date and time information.
We often need to retrieve these components from an xts object. This can be done by passing the object to the `coredata()` and `index()` functions, as shown below:
```{r}
# Extract the core data
hours <- coredata(work)
# View the class of core data
class(hours)
```
```{r}
# Extract the index (named work_index to avoid masking the index() function)
work_index <- index(work)
# View the class of the index
class(work_index)
```
An important advantage of xts is its ability to incorporate date and time information from any of the R classes used to represent time, such as `Date`, `POSIXct` or others. This means that the `order.by` argument accepts a time sequence in any of the common R time classes and converts it into an internal representation.
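The following sketch illustrates this with the same three made-up values indexed once by `Date` and once by `POSIXct`. The object names and values are hypothetical; the point is only that the index keeps its class while subsetting works the same way on both objects.

```{r}
library(xts)
# Hypothetical toy values, indexed by two different time classes
vals <- c(7.5, 8.0, 8.5)
by_date  <- xts(vals, order.by = as.Date("2016-01-01") + 0:2)
by_posix <- xts(vals, order.by = as.POSIXct("2016-01-01", tz = "UTC") + 86400 * 0:2)
# Each index reports its original class
class(index(by_date))
class(index(by_posix))
# The same date string selects the matching row in both objects
by_date["2016-01-02"]
by_posix["2016-01-02"]
```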
### Subsetting
When working with time series data, we often need to filter observations over a certain time range, such as a day, a week or a month. Rows can be selected in multiple ways: using ISO-8601 strings, date objects, logicals, or integers.
ISO-8601 strings are especially helpful for this task as they allow complex ranges and intervals to be specified concisely. ISO-8601 is an international standard for representing not only dates and times but also ranges and repeating intervals. Let us look at some examples of how to subset an xts object `x` using ISO-8601 strings:

Selection | ISO-8601 string
------------- | -------------
Year 2016 | `x["2016"]`
January 1, 2016 to March 22, 2016 | `x["20160101/20160322"]`
Up to and including January 2016 | `x["/201601"]`
January 13, 2010 | `x["2010-01-13"]` or `x["20100113"]`

Recurring intraday intervals can also be specified by prefixing the times in the ISO-8601 string with `T`. For example, `rainfall["T08:00/T10:00"]` gives the rainfall recorded between 8 am and 10 am on every day.
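To make these patterns concrete, here is a small sketch on made-up data. The daily series `x` and the hourly series `rainfall` are hypothetical, and the date strings use the dashed ISO-8601 form.

```{r}
library(xts)
# Hypothetical daily series covering 2015-12-01 to 2016-04-30
idx <- seq(as.Date("2015-12-01"), as.Date("2016-04-30"), by = "day")
x <- xts(seq_along(idx), order.by = idx)
nrow(x["2016"])                   # every observation in 2016
nrow(x["2016-01-01/2016-03-22"])  # January 1 to March 22, 2016
nrow(x["/2016-01"])               # start of the data through January 2016

# Hypothetical hourly rainfall readings over two days
times <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
rainfall <- xts(runif(48), order.by = times)
# Readings between 8 am and 10 am on every day
morning <- rainfall["T08:00/T10:00"]
morning
```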
We are also often interested in questions such as profits over the last few weeks or months. The `last()` and `first()` functions are invaluable in answering these questions.
```{r eval=FALSE}
last(work, n = '2 weeks')
first(work, n = '2 weeks')
```
`n` can be a number or a character string. A positive number selects the first/last `n` observations, while a negative number selects all but the first/last `n` observations.
As a character string, `n` takes the form "n period.type", where secs, seconds, mins, hours, days, weeks, months, quarters, and years are all valid values of period.type.
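A short sketch of the numeric and character forms of `n`, using a hypothetical ten-day series named `ten_days`:

```{r}
library(xts)
# Hypothetical daily series with ten observations
ten_days <- xts(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                order.by = as.Date("2016-01-01") + 0:9)
# Positive number: the first three observations
first_three <- first(ten_days, 3)
# Negative number: all but three observations, as described above
all_but_three <- first(ten_days, -3)
# Character string: the last three days of the series
last_three_days <- last(ten_days, "3 days")
first_three
all_but_three
last_three_days
```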
Nesting these calls is especially useful for complex queries such as: extract the first three days of Bob's second week at work.
```{r}
first(last(first(work, "2 weeks"), "1 weeks"), "3 days")
```
### Working with multiple time series
Care is needed when performing operations on multiple time series, because each row denotes an observation at a specific point in time. One of the series may be missing data for certain time steps, or the two series may have different frequencies.
xts handles this by first aligning the series on the *intersection* of their indexes before performing any operation.
```{r}
a <- work[1:5]
a
```
```{r}
b <- work[2]
b
```
```{r}
a+b
```
However, we might sometimes want to merge different time series while retaining the data points of both datasets rather than just the intersection.
`merge()` allows an arbitrary number of objects to be combined by specifying
1. The type of join to be performed
2. The strategy used to fill in missing values

To illustrate this, let us again add series `a` and `b`, but this time we assume the missing observations in `b` to be 0.
```{r}
merged <- merge(a, b, join = "left", fill = 0)
merged$a + merged$b
```
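For comparison, the sketch below contrasts an inner join with an outer join on two small, partially overlapping series. The series `p` and `q` and their values are made up for illustration.

```{r}
library(xts)
# Hypothetical series: p covers Jan 1-3, q covers Jan 3-4
p <- xts(c(1, 2, 3), order.by = as.Date("2016-01-01") + 0:2)
q <- xts(c(10, 20), order.by = as.Date("2016-01-03") + 0:1)
# Inner join: keep only the dates present in both series
inner <- merge(p, q, join = "inner")
# Outer join: keep every date, leaving NA where a series has no value
outer <- merge(p, q, join = "outer")
# Outer join with the gaps filled by 0 instead of NA
outer_filled <- merge(p, q, join = "outer", fill = 0)
inner
outer
outer_filled
```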
### Handling missing values
One common way to impute missing values, especially in high-frequency time series data, is to replace each missing value with the last observed value. This is known as the *l*ast *o*bservation *c*arried *f*orward approach and can be done using `na.locf()`. We can also fill each missing value with the next observed value instead by setting the `fromLast` argument to `TRUE`.
```{r}
work[3] <- NA
work[1:5]
```
```{r}
na.locf(work[1:5])
```
```{r}
na.locf(work[1:5], fromLast=TRUE)
```
Another strategy that can be useful, depending on the problem, is to impute the missing values by simple linear (*in time!*) interpolation between neighboring points. This can be done using the `na.approx()` function.
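The sketch below shows what "linear in time" means, using a hypothetical series `gappy` with unevenly spaced dates. The missing value sits one day after the first observation and three days before the last, so it is interpolated according to elapsed time rather than row position.

```{r}
library(xts)
# Hypothetical series with uneven spacing: Jan 1, Jan 2 (missing), Jan 5
gappy <- xts(c(1, NA, 5),
             order.by = as.Date(c("2016-01-01", "2016-01-02", "2016-01-05")))
# Interpolate linearly along the time index
filled <- na.approx(gappy)
filled
```

Jan 2 lies a quarter of the way from Jan 1 to Jan 5, so the filled value is 2 rather than the midpoint 3.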
### Other common operations
#### Lag
Another common step before modelling or analyzing the properties of a time series is to look at its lags. This can be done using the `lag()` function. A positive value of `k` shifts the values `k` steps ahead in time and a negative value shifts them `k` steps back in time.
```{r}
x <- work[5:10]
cbind(lag = lag(x, k = 1), original = x, lead = lag(x, k = -1))
```
Note that zoo follows the opposite sign convention for `k`. The lag functions of zoo and xts also differ in how they handle the NAs introduced by the lag: xts keeps them, whereas zoo drops them.
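A small side-by-side sketch of the two conventions, using a hypothetical four-day series `daily` and its zoo counterpart `daily_zoo`:

```{r}
library(xts)
# Hypothetical four-day series
daily <- xts(c(1, 2, 3, 4), order.by = as.Date("2016-01-01") + 0:3)
daily_zoo <- as.zoo(daily)
# xts: k = 1 moves each value one step later and pads the start with NA
lag_xts <- lag(daily, k = 1)
# zoo: the same shift needs k = -1, and the padded row is dropped
lag_zoo <- lag(daily_zoo, k = -1)
lag_xts
lag_zoo
```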
#### Differencing
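Most time series in the real world are non-stationary and need to be differenced to make them stationary before modelling. This can be done using the `diff()` function, where `lag` sets the number of periods between the values being subtracted and `differences` sets the order of differencing.
For example, a first-order, 2-day differenced time series (each value minus the value two days earlier) can be obtained as follows; setting `differences = 2` would give the second-order version: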
```{r eval=FALSE}
diff(work, lag = 2, differences = 1)
```
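To see how the two arguments interact, here is a sketch on a hypothetical series of squares, `squares`, where the differences are easy to check by hand.

```{r}
library(xts)
# Hypothetical series: the squares 1, 4, 9, 16, 25 on consecutive days
squares <- xts(c(1, 4, 9, 16, 25), order.by = as.Date("2016-01-01") + 0:4)
# First-order difference at lag 2: each value minus the value two days earlier
lag2_diff <- diff(squares, lag = 2, differences = 1)
# Second-order difference at lag 1: the difference of the one-day differences
second_diff <- diff(squares, lag = 1, differences = 2)
lag2_diff
second_diff
```

In both results the first two rows are `NA` because there is not enough history to compute them.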
### Conclusion
In this tutorial, I introduced the key concepts and structure of the xts package. We also looked at subsetting data efficiently, working with multiple time series without alignment errors, and common missing-data imputation strategies for time series. However, xts supports a wide range of other useful tasks that we have not covered. Some of them are as follows:
1. Finding periodicity
2. Finding end points
3. Calculating rolling statistics
4. Changing frequency of the time series
I have listed resources in the references section for you to delve deeper into any of these functions.
### References
* https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf
* https://cran.r-project.org/web/packages/xts/xts.pdf
* https://www.datacamp.com/courses/manipulating-time-series-data-in-r-with-xts-zoo
* https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf