forked from UBC-STAT/stat-545-guidebook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcm008.Rmd
192 lines (110 loc) · 5.72 KB
/
cm008.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
# Intro to plotting with `ggplot2`, Part II
## Orientation
### Worksheet
You can find a worksheet template for today [here](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm008-exercise.Rmd).
### Announcements
From Assignment 3 onwards, whenever you produce an HTML file, you must link to a rendered version of the file. We'll cover this today.
### Today
- GitHub Pages (15 min)
- Continue with `ggplot2`: a tour of some important `geom`s (20 min)
- `ggplot2` exercises from the worksheet (40 min)
## Participation repository and GitHub Pages (15 min)
### GitHub Pages
You can turn your GitHub repository into a website, by enabling __GitHub Pages__ on that repo. This is useful for something as small as being able to display HTML files without getting a local copy of the repository, to something as big as making a full fledged website like the [stat545.stat.ubc.ca](https://stat545.stat.ubc.ca) website.
- If you make a repo called `yourusername.github.io`, and enable GitHub Pages on that repo, then the URL of the website will be `https://yourusername.github.io/`.
- If you enable GitHub pages on any other repo, the URL for that repo will be `https://yourusername.github.io/name_of_other_repo`.
Learn more with GitHub's [GitHub Pages](https://pages.github.com/) tutorial.
### Practice with HTML file linking
We'll practice linking to an HTML file for today's exercise, by following the instructions on the (new!) ["Viewing and Linking to HTML Files"](https://stat545.stat.ubc.ca/evaluation/assignments/#viewing-and-linking-to-html-files) on the assignments home page.
## A tour of some important `geom`s (20 min)
Here, we'll explore some common plot types, and how to produce them with `ggplot2`.
### Histograms: `geom_histogram()`
Useful for depicting the distribution of a continuous random variable. Partitions the number line into bins of certain width, counts the number of observations falling into each bin, and erects a bar of that height for each bin.
Required aesthetics:
- `x`: A numeric vector.
By default, a histogram plots the _count_ on the y-axis. If you want to use proportion, specify the `y = ..prop..` aesthetic.
You can change the smoothness of the plot via two arguments (your choice):
- `bins`: the number of bins/bars shown in the plot.
- `binwidth`: the with of the bins shown on the plot.
Example:
```{r}
ggplot(gapminder, aes(lifeExp)) +
geom_histogram(bins = 50)
```
### Density: `geom_density()`
Essentially, a "smooth" version of a histogram. Uses [kernels](https://en.wikipedia.org/wiki/Kernel_density_estimation) to produce the curve.
Required aesthetics:
- `x`: A numeric vector.
Good to know:
- `bw` argument controls the smoothness: Smaller = rougher.
Example:
```{r}
ggplot(gapminder, aes(lifeExp)) +
geom_density()
```
### Jitter plots: `geom_jitter()`
A scatterplot, but with minor random perturbations of each point. Useful for scatterplots where points are overlaying, or when one variable is categorical.
Required aesthetics:
- `x`: any vector
- `y`: any vector
Example:
```{r}
ggplot(gapminder, aes(continent, lifeExp)) +
geom_jitter()
```
### Box plots: `geom_boxplot()`
This geom makes a boxplot for a numeric variable in each of a category. Useful for visualizing probability distributions across different categories.
Required aesthetics:
- `x`: A factor (categorical variable)
- `y`: A numeric variable
Example:
```{r}
ggplot(gapminder, aes(continent, lifeExp)) +
geom_boxplot()
```
### Ridge plots: `ggridges::geom_density_ridges()`
A (superior?) alternative to the boxplot, the ridge plot (also known as the joy plot) places a kernel density for each group, instead of the box.
You'll need to install the `ggridges` package. You can do lots more with ridges -- check out [the ggridges intro vignette](https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html).
Required aesthetics (reversed from boxplots!)
- `x`: A numeric variable
- `y`: A factor (categorical variable)
Example:
```{r}
ggplot(gapminder, aes(lifeExp, continent)) +
ggridges::geom_density_ridges()
```
### Bar plots: `geom_bar()` or `geom_col()`
These geom's erect a bar over each category.
`geom_bar()` automatically determines the height of the bar according to the count of each category.
`geom_col()` requires a manual specification of the bar heights.
Required aesthetics:
- `x`: A categorical variable
- `y`: A numeric variable (only required for `geom_col()`!)
- To use proportion in `geom_bar()` instead of count, set `y = ..prop..`
Example: number of 4-, 6-, and 8- cylinder cars in the `mtcars` dataset:
```{r}
ggplot(mtcars, aes(cyl)) +
geom_bar()
```
### Line charts: `geom_line()`
A line plot connects points with straight lines, from left-to-right. Especially useful if time is on the x-axis.
Required aesthetics:
- `x`: a variable having some ordering to it.
- `y`: a numeric variable.
Although not required, the `group` aesthetic will come in handy here. This aesthetic produces a plot independently for each group, and overlays the results.
```{r}
tsibble::as_tsibble(co2) %>%
rename(yearmonth = index,
conc = value) %>%
mutate(month = lubridate::month(yearmonth, label = TRUE),
year = lubridate::year(yearmonth)) %>%
ggplot(aes(month, conc)) +
geom_line(aes(group = year), alpha = 0.5) +
ylab("CO2 Concentration")
```
### Path plots: `geom_path()`
Like `geom_line()`, except connects points in the order that they appear in the dataset.
## Activity: Fix the Plots (40 min)
Fill out the [worksheet](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm008-exercise.Rmd) together.
## Time remaining?
If so, let's make tibbles with `tibble()`, and make a list column while we're at it. Maybe even `nest()` and `unnest()`.