-
Notifications
You must be signed in to change notification settings - Fork 0
/
Chapter_13.Rmd
69 lines (46 loc) · 2.08 KB
/
Chapter_13.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---
title: "Chapter 13"
author: "Laura"
date: "12/10/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(tidyverse); library(skimr); library(nycflights13); library(GGally); library(ggstance); library(lvplot); library(hexbin); library(modelr); library(magrittr); library(maps)
```
## Notes for Chapter 13 Relational data
### 13.4 Mutating joins
```{r ch1341}
flights %>%
count(carrier)
flights
airlines
flights2 <- flights %>%
left_join(airlines, by = "carrier")
```
#### 13.4.6 Exercises
```{r ex13461}
flights2 <- flights %>%
group_by(dest) %>%
summarise(meandel = mean(arr_delay, rm.na = TRUE)) %>%
ungroup()
# this doesnt work
airports %>%
semi_join(flights, c("faa" = "dest")) %>%
ggplot(aes(lon, lat)) +
borders("state") +
geom_point() +
coord_quickmap() +
theme_minimal()
```
### 13.5 Filtering joins
As indicated by its name, this is an alternative to `filter()` when the information you want to use to filter you current table is in a different table.
Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables. There are two types:
* semi_join(x, y) keeps all observations in x that have a match in y.
* anti_join(x, y) drops all observations in x that have a match in y. It is the inverse of a semi-join is an anti-join. An anti-join keeps the rows that don’t have a match. Anti-joins are useful for diagnosing join mismatches.
### 13.6 Join problems
### 13.7 Set operations
The final type of two-table verb are the set operations. Generally, I use these the least frequently, but they are occasionally useful when you want to break a single complex filter into simpler pieces. All these operations work with a complete row, comparing the values of every variable. These expect the x and y inputs to have the same variables, and treat the observations like sets:
* intersect(x, y): return only observations in both x and y.
* union(x, y): return unique observations in x and y.
* setdiff(x, y): return observations in x, but not in y.