forked from UBC-STAT/stat-545-guidebook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcm013.Rmd
198 lines (135 loc) · 8.58 KB
/
cm013.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
# Effective Visualizations
Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualizations is an incredibly important aspect of scientific research and communication. There have been several books (see references) written about these principles. In class today we will be going through several case-studies trying to develop some expertise into making effective visualizations.
## Worksheet
**The worksheet questions for today are embedded into the class notes.**
You can download this Rmd file [here](https://github.com/STAT545-UBC/Classroom/raw/master/cm013.Rmd)
Note, there will be very little coding in-class today, but I've given you plenty of exercises in the form of a supplemental worksheet (linked at the bottom of this page) to practice with after class is over.
## Resources
1. [Fundamentals of Data Visualization](https://serialmentor.com/dataviz/introduction.html) by Claus Wilke.
1. [Visualization Analysis and Design](https://www-taylorfrancis-com.ezproxy.library.ubc.ca/books/9780429088902) by Tamara Munzner.
1. [STAT545.com - Effective Graphics](https://stat545.com/effective-graphs.html) by Jenny Bryan.
1. [ggplot2 book](https://ggplot2-book.org) by Hadley Wickam.
1. [Callingbull.org](https://callingbull.org/tools.html) by Carl T. Bergstrom and Jevin West.
## Part 1: Warm-up and pre-test [20 mins]
### Warmup:
Write some notes here about what "effective visualizations" means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.
1.
1.
1.
1.
1.
### CQ01: Weekly hours for full-time employees
> Question: Evaluate the strength of the claim based on the data: "German workers are more motivated and work more hours than workers in other EU nations."
>
> Very strong, strong, weak, very week, do not know
- <<Your answer here (and make sure to explain why you chose this answer)>>
- Main takeaway: Summarize the main takeaway from this question/discussion here
### CQ02: Average Global Temperature by year
> Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?
>
> Yes, No, Do not know
- <<Your answer here (and make sure to explain why you chose this answer)>>
- Main takeaway: Summarize the main takeaway from this question/discussion here
### CQ03: Gun deaths in Florida
> Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined."
>
> Very strong, strong, weak, very week, do not know
- <<Your answer here (and make sure to explain why you chose this answer)>>
- Main takeaway: Summarize the main takeaway from this question/discussion here
## Part 2: Extracting insight from visualizations [20 mins]
Great resource for selecting the right plot: https://www.data-to-viz.com/ ; encourage you all to consult it when choosing to visualize data.
## Part 3: Principles of effective visualizations [20 mins]
We will be filling these principles in together as a class (unfortunately we didn't get to do this in class, but here are the notes)
1. Apply [Principle of proportional ink](https://callingbullshit.org/tools/tools_proportional_ink.html)
- Definition: "The amount of ink used to indicate a value should be proportional to the value itself."
- Example: Truncating the y-axis on a bar chart to exaggerate the difference between bars violates the principle of proportional ink
1. Maintain a high data-to-ink ratio: [less is more](https://speakerdeck.com/cherdarchuk/remove-to-improve-the-data-ink-ratio)
- Definition: remove distracting visual elements to focus attention on the data
- Examples: Lighten line weights, remove backgrounds, never use 3D or special effects, remove unnecessary/redundant labels, etc...
1. Always update axes labels and titles on your plots
- In STAT545/547 we take principles of effective visualizations very seriously and you will lose marks if this isn't followed
1. Choose your scale-type carefully
- Whether you choose a linear, logarithm, sqrt scale depends on your data, context, and purpose
1. Choose your graph-type carefully
- Examples: [here](https://serialmentor.com/dataviz/directory-of-visualizations.html) is a great directory of plots
1. Choose colours with accessibility and readability in mind
- Examples: [here](http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette) is a great set of colour schemes that are colour-blind friendly and perceptually uniform
### Make a great plot worse
Instructions: Below is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart "bad" by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.
```{r, message = FALSE, warning = FALSE}
library("plotly")
library("tidyverse")
ggplot(airquality, aes(`Month`, `Temp`, group = `Month`)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.3) +
labs(x = "",
y = "",
title="Maximum temperature by month")+
theme_bw() +
scale_x_continuous(breaks=c(5,6,7,8,9),labels=c("May","June","July","August","September")) +
annotate("text", x = 4.08, y = 95,label="°F",size=8) +
coord_cartesian(xlim = c(4.5, 9.5),
clip = 'off')+
theme(panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "gray"),
panel.border = element_blank(),
text = element_text(size=18)
)
```
How many of the principles did you manage to break?
## Plotly demo [10 mins]
Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it's useful, and then you can try converting a static ggplot graph into an interactive plotly graph.
This is a preview of what we'll be doing in STAT 547 - making dynamic and interactive dashboards using R!
*For this demo, make sure you have the following packages installed and loaded:*
```{r, message = FALSE, warning = FALSE}
library(tidyverse)
library(gapminder)
library(plotly)
```
### Make `ggplot2` graphs interactive
It's very easy to convert an existing ggplot2 graph into an interactive graph with `plotly::ggplotly`
On the below graph, explore the interactive options:
- *Hover* your cursor over individual points
- *Zoom* in and out by dragging across / using the zoom tool
- *Single-* and *double-click* items on the legend to isolate groups of points
- While zoomed-in, use the *pan* tool to "move" around the plot, google maps style!
```{r}
p <- gapminder %>%
ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point()
p %>%
ggplotly()
```
### Make interactive plots with `plotly::plot_ly`
We can also make interactive graphs using the the `plotly::plot_ly` function:
```{r}
p <- gapminder %>%
plot_ly(x = ~gdpPercap,
y = ~lifeExp,
color = ~continent,
# mode specifies the geometric object e.g. "markers" for points, "line" for lines
mode = 'markers',
# type controls the "type" of graph e.g. 'bar', 'scatter'
type = 'scatter'
)
p
```
### Share with others
To share with others:
1. Create a plotly account @ [plot.ly](plot.ly)
2. Navigate to settings, and take in the following information:
- your user name
- api key
Now, we will tell R your account information so that we can upload these plots to the web.
Note that once we run `api_create()`, the browser will open to a webpage displaying your interactive plot. You can share this page with others, but they will only be able to **view**. If you want others to be able to **edit** the graph, you can invite others to "*collaborate*" in the "*Sharing link*" option.
```{r eval = FALSE}
# fill in the below with your information
Sys.setenv("plotly_username"="your_plotly_username")
Sys.setenv("plotly_api_key"="your_api_key")
# upload our plots to the website
api_create(p, filename = 'name-of-your-plot')
```
## Supplemental worksheet (Optional)
You are highly encouraged to the cm013 supplemental exercises worksheet. It is a great guide that will take you through Scales, Colours, and Themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.
- [Supplemental Rmd file here](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm013-supplemental.Rmd)