---
title: "Hurricane data analysis"
author: "Achim Rumberger"
date: "27 January 2018"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.path = "README_figs/README-")
```
## Analysis of Hurricane data from HURDAT package
I would like to thank the contributor of the HURDAT package (https://rdrr.io/cran/HURDAT/). Compared with the original data from https://www.nhc.noaa.gov/data/hurdat/, the package makes the data much easier to work with.
```{r}
#libraries
library(tidyverse)
library(ggthemes)
library(ggmap)
library(htmlwidgets)
library(gridExtra)
library(HURDAT)
library(lubridate)
library(splines)
library(plotly)
source("classifiy_hurricianes.R")
```
## Read Initial Data and Save It
The data span the years 1851 to 2016: almost 80k observations of 24 variables.
Reading the initial data with the `get_hurdat` function takes a while, so I save the result for easier access in the future.
If you run this code for the first time, please uncomment the lines with "get_hurdat" and "saveRDS".
```{r}
#data <- get_hurdat(basin = c("AL", "EP"))
#saveRDS(data, "data/hurricanesALEP")
data <- readRDS("data/hurricanesALEP")
```
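The comment-and-uncomment step could also be automated. The following is a minimal sketch, not the code used in this analysis: `read_or_build` is a hypothetical helper name, and the small data frame only stands in for the slow download.

```r
# Hypothetical helper: return the cached object if the file exists,
# otherwise build it, save it with saveRDS(), and return it.
read_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- build()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Demonstration with a stand-in for the slow download step (made-up rows).
cache_file <- file.path(tempdir(), "hurricane_cache_demo.rds")
unlink(cache_file)  # start from a clean state for the demo
calls <- 0
slow_build <- function() {
  calls <<- calls + 1  # count how often the expensive step really runs
  data.frame(Key = c("AL011851", "AL021851"), Wind = c(80, 70))
}
first <- read_or_build(cache_file, slow_build)   # builds and saves
second <- read_or_build(cache_file, slow_build)  # reads from disk
```

In this README the call would presumably look like `read_or_build("data/hurricanesALEP", function() get_hurdat(basin = c("AL", "EP")))`, so the download only happens when the file is missing.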
## Manage the dates
```{r}
data$YEAR <- year(data$DateTime)
data$MONTH <- month(data$DateTime)
data$DAY <- day(data$DateTime)
summary(data %>% select(YEAR, MONTH, DAY, Wind, Pressure))
```
## First plot to get an overview of the data
```{r}
# Count the distinct storms (unique Key values) observed in each year
df <- data %>%
  group_by(YEAR) %>%
  summarise(Distinct_Storms = n_distinct(Key))
p <- ggplot(df, aes(x = YEAR, y = Distinct_Storms)) + theme_stata()
# Yearly counts as a line, with a loess smoother for the long-term trend
p + geom_line(size = 1.1) +
  ggtitle("Number of Storms Per Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_smooth(method = 'loess', se = FALSE, formula = y ~ x) +
  ylab("Storms")
```
## Categorize storms
Create a new variable that classifies each observation according to its wind speed.
```{r}
data$CATEGORY <- sapply(data$Wind, classifyHurricane)
```
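`classifyHurricane` lives in the sourced script and is not shown here. As a rough sketch of what such a classifier might look like, the block below uses the Saffir-Simpson wind bands, assuming `Wind` is given in knots as in the original HURDAT2 files. The function name `classify_wind` and the labels "TD", "TS" and "H1" to "H5" are assumptions for illustration; the real function may differ.

```r
# Hypothetical vectorised classifier. The breaks are the Saffir-Simpson
# wind bands in knots; the labels are assumed, not taken from the script.
classify_wind <- function(wind_kt) {
  cut(
    wind_kt,
    breaks = c(-Inf, 33, 63, 82, 95, 112, 136, Inf),  # intervals are (lower, upper]
    labels = c("TD", "TS", "H1", "H2", "H3", "H4", "H5")
  )
}

# One wind speed at the lower edge of each band, plus a missing value.
categories <- as.character(classify_wind(c(30, 34, 64, 83, 96, 113, 137, NA)))
```

Because `cut()` is vectorised, a function like this can be applied to the whole `Wind` column at once, without `sapply`.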
## Plot the storms by category
```{r fig.height=20, fig.width=12}
# By category: a storm is counted in every hurricane category it passed through
df <- data %>%
  filter(grepl("H", CATEGORY)) %>%
  group_by(YEAR, CATEGORY) %>%
  summarise(Distinct_Storms = n_distinct(Key))
df$CATEGORY <- factor(df$CATEGORY)
p <- ggplot(df, aes(x = YEAR, y = Distinct_Storms, col = CATEGORY)) + theme_stata()
# One panel per category, each with a linear trend line in black
p + geom_line(size = 1.1) +
  scale_color_brewer(direction = -1, palette = "Dark2") +
  ggtitle("Number of Storms Per Year By Category (H)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  facet_wrap(~ CATEGORY, scales = "free_x", ncol = 1) +
  geom_smooth(method = 'lm', se = FALSE, col = 'black') +
  theme(axis.text.x = element_text(hjust = 1, angle = 45), legend.position = 'none') +
  ylab('Number of Storms') +
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))
```
## Make a distinct dataset
I will also filter the data. Over its life cycle a hurricane passes through several stages, for example from tropical storm to a category H1 hurricane, on to H2, back to H1, and finally back to tropical storm. I want to count each storm only once, at its highest rating.
On my laptop this takes a while, so again I save the data for further analysis. The result shows that a total of 2900 storms were recorded between 1851 and 2016.
If you run this code for the first time, please uncomment the lines with "highestClassification4Storm" and "saveRDS".
```{r}
#distinctStormData <- highestClassification4Storm(data)
#saveRDS(distinctStormData, "data/hurricanesALEPAAA")
distinctStormData <- readRDS("data/hurricanesALEPAAA")
summary(distinctStormData)
```
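`highestClassification4Storm` is defined in the sourced script and not shown here. One plausible way to keep a single row per storm at its peak intensity is sketched below in base R. The name `peak_per_storm` and the toy rows are made up for illustration, and the sketch assumes that the highest rating coincides with the highest `Wind` value.

```r
# Hypothetical helper: keep, for every storm Key, the row with the highest Wind.
peak_per_storm <- function(obs) {
  # Observations without a wind speed cannot be ranked, so drop them.
  obs <- obs[!is.na(obs$Wind), ]
  # Sort so that each storm's strongest observation comes first.
  obs <- obs[order(obs$Key, -obs$Wind), ]
  # Keep only the first (strongest) row of every storm.
  obs[!duplicated(obs$Key), ]
}

# Toy data: storm "A" peaks as H2, storm "B" never exceeds tropical storm.
toy <- data.frame(
  Key = c("A", "A", "A", "B", "B"),
  Wind = c(40, 85, 70, 30, 50),
  CATEGORY = c("TS", "H2", "H1", "TD", "TS"),
  stringsAsFactors = FALSE
)
peaks <- peak_per_storm(toy)
```

Note that a storm whose wind speeds are all missing would disappear from the result in this sketch.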
## Plot all distinct storms
```{r}
# Each storm now appears once, so this counts storms per year at peak rating
df <- distinctStormData %>%
  group_by(YEAR) %>%
  summarise(Distinct_Storms = n_distinct(Key))
p <- ggplot(df, aes(x = YEAR, y = Distinct_Storms)) + theme_stata()
p + geom_line(size = 1.1) +
  ggtitle("Number of distinct Storms Per Year") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_smooth(method = 'loess', se = FALSE, formula = y ~ x) +
  ylab('Number of Storms') +
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))
```
## Plot distinct hurricanes by category
```{r fig.height=20, fig.width=12}
# Hurricanes only, each counted once in the category of its highest rating
df <- distinctStormData %>%
  filter(grepl("H", CATEGORY)) %>%
  group_by(YEAR, CATEGORY) %>%
  summarise(Distinct_Storms = n_distinct(Key))
df$CATEGORY <- factor(df$CATEGORY)
p <- ggplot(df, aes(x = YEAR, y = Distinct_Storms, col = CATEGORY)) + theme_stata()
p + geom_line(size = 1.1) +
  scale_color_brewer(direction = -1, palette = "Dark2") +
  ggtitle("Number of distinct Storms Per Year By Category (H)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  facet_wrap(~CATEGORY, scales = "free_x", ncol = 1) +
  geom_smooth(method = 'lm', se = FALSE, col = 'black') +
  theme(axis.text.x = element_text(angle = 45), legend.position = 'none') +
  ylab('Number of Storms') +
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))
```
## Discussion
Two observations seem noteworthy to me:

- the marked increase in overall storm activity from the 1950s
- the increase in category 5 hurricanes. What was once a very rare event has, from the 1990s onward, become a regular feature of the hurricane season.
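
Observations like these can be checked numerically instead of by eye. The sketch below shows one possible approach on made-up yearly counts. The numbers are not HURDAT results, and `toy_counts` is only a stand-in for a per-year summary such as the `df` data frames above.

```r
# Made-up yearly counts for illustration only; these are NOT HURDAT results.
toy_counts <- data.frame(
  YEAR = c(1930, 1940, 1950, 1960, 1970, 1980),
  Distinct_Storms = c(8, 9, 14, 15, 13, 16)
)

# Mean number of storms per year before 1950 and from 1950 onward.
period <- ifelse(toy_counts$YEAR < 1950, "before_1950", "from_1950")
period_means <- tapply(toy_counts$Distinct_Storms, period, mean)

# Slope of a linear trend: change in storms per year for each calendar year.
trend <- lm(Distinct_Storms ~ YEAR, data = toy_counts)
slope <- unname(coef(trend)["YEAR"])
```

A positive `slope` and a higher mean for the later period would support the visual impression of increasing activity, although a comparison this simple says nothing about the causes.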