Francisco Yira Albornoz March 2nd, 2019
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5.9000 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(modelr)
- Create one plot on the fuel economy data with customised title, subtitle, caption, x, y, and colour labels.
ggplot(mpg, aes(displ, hwy, color = as.factor(year))) +
geom_point() +
geom_smooth(se = FALSE) +
labs(
title = "In 2008 cars tend to be more efficient, controlling for engine size",
subtitle = "However, the magnitude of the difference is small",
caption = "Data from fueleconomy.gov",
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Year"
)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
- The geom_smooth() is somewhat misleading because the hwy for large engines is skewed upwards due to the inclusion of lightweight sports cars with big engines. Use your modelling tools to fit and display a better model.
model_mpg <- lm(hwy ~ class + displ, data = mpg)
mpg_pred <- mpg %>%
add_predictions(model = model_mpg, var = "pred")
ggplot(mpg, aes(displ, hwy, colour = class)) +
geom_point() +
geom_line(data = mpg_pred, aes(y = pred)) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Car type"
)
- Take an exploratory graphic that you’ve created in the last month, and add informative titles to make it easier for others to understand.
starwars %>%
mutate(gender = replace_na(gender, "NA"),
gender = fct_lump(gender, n = 2)) %>%
ggplot(aes(gender, height)) +
geom_boxplot() +
labs(
title = "Males tend to be taller than females in the Star Wars universe",
subtitle = "However, there is more height dispersion in males than in other genders",
y = "Height (cm)"
)
## Warning: Removed 6 rows containing non-finite values (stat_boxplot).
- Use geom_text() with infinite positions to place text at the four corners of the plot.
Top-right:
label <- tibble(
displ = Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
Top-left:
label <- tibble(
displ = -Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "top", hjust = "left")
Bottom-left:
label <- tibble(
displ = -Inf,
hwy = -Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "bottom", hjust = "left")
Bottom-right:
label <- tibble(
displ = Inf,
hwy = -Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "bottom", hjust = "right")
- Read the documentation for annotate(). How can you use it to add a text label to a plot without having to create a tibble?
annotate() adds a single layer to the plot from values passed directly in the function call, so the position coordinates and the label text are given as vectors instead of as columns of a tibble. An example:
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + annotate("text", x = 4, y = 25, label = "Some text")
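The same idea combines with the infinite positions from the previous exercise. The following is a minimal sketch, assuming numeric justification values (hjust = 1, vjust = 1) to keep the text inside the top-right corner; the name p_corner is only illustrative.

```r
library(ggplot2)

# Top-right corner: Inf on both axes, with the text justified inwards
# so that it is not clipped by the panel border.
p_corner <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  annotate(
    "text",
    x = Inf, y = Inf,
    label = "Increasing engine size is \nrelated to decreasing fuel economy.",
    hjust = 1, vjust = 1
  )

p_corner
```

No tibble is needed here because annotate() builds the one-row data set for the layer internally.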
- How do labels with geom_text() interact with faceting? How can you add a label to a single facet? How can you put a different label in each facet? (Hint: think about the underlying data.)
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
label <- mpg %>%
summarise(
displ = max(displ),
hwy = max(hwy),
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_text(aes(label = model), data = best_in_class) +
geom_text(aes(label = label), data = label, vjust = "top", hjust = "right") +
facet_wrap(~ cyl)
Labels are associated with specific rows of the tibble/data frame passed to geom_text(). If the tibble that contains the labels has a column with the variable used for faceting, each label is displayed only in the corresponding facet. Otherwise, the label is repeated in all facets.
Therefore, to put a different label in each facet, we need a tibble with a column that indicates the facet in which each label should be displayed. To label a single facet, that tibble only needs one row, for the facet in question.
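A minimal sketch of the per-facet case, assuming the plot is faceted by cyl as above. The tibble facet_labels and its label text are made up for illustration; keeping only one of its rows would label a single facet.

```r
library(ggplot2)
library(tibble)

# One row per facet. The `cyl` column routes each label to its panel;
# Inf positions pin the text to the top-right corner of every panel.
facet_labels <- tibble(
  cyl   = c(4, 5, 6, 8),
  displ = Inf,
  hwy   = Inf,
  label = paste(c(4, 5, 6, 8), "cylinders")  # illustrative label text
)

p_facets <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_text(
    aes(label = label),
    data = facet_labels,
    vjust = "top", hjust = "right"
  ) +
  facet_wrap(~ cyl)

p_facets
```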
- What arguments to geom_label() control the appearance of the background box?
label.padding controls the amount of padding around the label, label.r controls the radius of the rounded corners, and label.size controls the size of the label border.
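As a rough illustration, the sketch below exaggerates two of these arguments, label.padding and label.r, so their effect on the box is easy to see. The label position and text in label_df are arbitrary choices, not part of the exercise.

```r
library(ggplot2)
library(tibble)

# A single label placed at an arbitrary position inside the plot area.
label_df <- tibble(
  displ = 5,
  hwy   = 40,
  label = "Larger engines,\nlower fuel economy"
)

p_label <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_label(
    aes(label = label),
    data = label_df,
    label.padding = unit(0.6, "lines"),  # more space between text and border
    label.r = unit(0.4, "lines")         # more strongly rounded corners
  )

p_label
```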
- What are the four arguments to arrow()? How do they work? Create a series of plots that demonstrate the most important options.
The arrow() function creates an object that acts as input for the arrow argument in geom_segment(). arrow() has four arguments:
  - angle: the aperture angle of the arrow head (in degrees).
  - length: the length of the arrow head.
  - ends: the end(s) of the line/segment at which the arrow head should appear.
  - type: whether the arrow head should be an open or a closed triangle.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_segment(aes(x = 5.5, y = 35, xend = 6.15, yend = 27),
arrow = arrow())
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_segment(aes(x = 5.5, y = 35, xend = 6.15, yend = 27),
arrow = arrow(angle = 10, type = "closed"))
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_segment(aes(x = 5.5, y = 35, xend = 6.15, yend = 27),
arrow = arrow(angle = 45, length = unit(0.15, "inches"), type = "open"))
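The plots above vary angle, length and type but not ends. A small sketch of that remaining argument follows; it uses annotate("segment") rather than geom_segment() only so that the segment is drawn once, and the object names are illustrative.

```r
library(ggplot2)

# Arrow heads on both ends of the segment, drawn as closed triangles.
both_ends <- arrow(ends = "both", type = "closed", length = unit(0.1, "inches"))

p_arrow <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  annotate(
    "segment",
    x = 5.5, y = 35, xend = 6.15, yend = 27,
    arrow = both_ends
  )

p_arrow
```

The other accepted values for ends are "first" and "last".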
- Why doesn’t the following code override the default scale?
df <- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex() +
scale_colour_gradient(low = "white", high = "red") +
coord_fixed()
## Warning: Computation failed in `stat_binhex()`:
Because the aesthetic we want to change is fill, not colour: geom_hex() shows the count in each bin through fill. We can override the default scale by using scale_fill_gradient() instead.
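A sketch of the corrected call, with only the scale function swapped. Rendering the plot also requires the hexbin package, which is assumed to be installed.

```r
library(ggplot2)
library(tibble)

df <- tibble(
  x = rnorm(10000),
  y = rnorm(10000)
)

# scale_fill_gradient() targets the fill aesthetic that geom_hex() uses.
p_hex <- ggplot(df, aes(x, y)) +
  geom_hex() +
  scale_fill_gradient(low = "white", high = "red") +
  coord_fixed()

# print(p_hex) draws the plot (this step needs hexbin).
```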
- What is the first argument to every scale? How does it compare to labs()?
name is the first argument in every scale function. Its default value is waiver(), a placeholder that tells ggplot2 to fall back on the default: the name of the first variable that was mapped to that aesthetic.
In comparison, the first argument in labs() is a set of name-value pairs used to rename the scales (where “name” should be an aesthetic included in the plot).
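To make the comparison concrete, here is a minimal sketch in which both routes set the same x axis title. The object names base, p_scale and p_labs are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Axis title set through the scale's first argument, `name`.
p_scale <- base + scale_x_continuous("Engine displacement (L)")

# The same axis title set through a name-value pair in labs().
p_labs <- base + labs(x = "Engine displacement (L)")
```

labs() is the shorter route when only the title changes; the scale function is the place to go when breaks, labels or limits need adjusting as well.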
- Change the display of the presidential terms by:
  - Combining the two variants shown above.
  - Improving the display of the y axis.
  - Labelling each term with the name of the president.
  - Adding informative plot labels.
  - Placing breaks every 4 years (this is trickier than it seems!).
start_year_plot <- lubridate::year(min(presidential$start))
end_year_plot <- lubridate::year(max(presidential$end))
seq_years <- seq(start_year_plot, end_year_plot, by = 4)
fouryears <- lubridate::make_date(seq_years, 1, 1)
presidential_plot <- presidential %>%
mutate(id = 33 + row_number(),
label_period = str_c(name, " (", id, ")"))
ggplot(presidential_plot, aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(name = "Party",
values = c(Republican = "red", Democratic = "blue")) +
scale_y_continuous(
name = NULL,
labels = presidential_plot$label_period,
breaks = presidential_plot$id,
minor_breaks = NULL
) +
scale_x_date(
NULL,
breaks = presidential_plot$start,
date_labels = "'%y",
minor_breaks = fouryears
) +
labs(title = "Terms of US Presidents",
subtitle = "Eisenhower (34th) to Obama (44th)")
- Use override.aes to make the legend on the following plot easier to see.
ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1/20) +
guides(colour = guide_legend(override.aes = list(alpha = 1)))