-
Notifications
You must be signed in to change notification settings - Fork 0
/
Chapter_12.Rmd
94 lines (69 loc) · 2.81 KB
/
Chapter_12.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
title: "Chapter 12"
author: "Laura"
date: "11/25/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(tidyverse); library(skimr); library(nycflights13); library(GGally); library(ggstance); library(lvplot); library(hexbin); library(modelr); library(magrittr)
```
## Notes for Chapter 12 Tidy data
### 12.2 Tidy data
There are three interrelated rules which make a dataset tidy:
1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.
#### 12.3.3 Exercises
```{r ex12333}
people <- tribble(
~name, ~key, ~value,
#-----------------|--------|------
"Phillip Woods", "age", 45,
"Phillip Woods", "height", 186,
"Phillip Woods", "age1", 50,
"Jessica Cordero", "age", 37,
"Jessica Cordero", "height", 156
)
people
tidypeople <- people %>%
spread(key = key, value = value)
```
#### 12.4.1 Separate
```{r ch1241}
table3 %<>%
mutate(bla = rep("bla;", 6))
table3 %>%
separate(bla, into = c("newbla", "garbage")) %>% #, remove = FALSE)) %>%
select(-garbage) %>%
rename(bla = newbla)
(tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))) %>%
separate(x, c("one", "two", "three", "four"))
tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
separate(x, c("one", "two", "three"), fill = "right")
```
### 12.6 Case Study
The WHO dataset: The data uses the original codes given by the World Health Organization. The column names for columns five through 60 are made by combining new_ to a code for method of diagnosis (rel = relapse, sn = negative pulmonary smear, sp = positive pulmonary smear, ep = extrapulmonary) to a code for gender (f = female, m = male) to a code for age group (014 = 0-14 yrs of age, 1524 = 15-24 years of age, 2534 = 25 to 34 years of age, 3544 = 35 to 44 years of age, 4554 = 45 to 54 years of age, 5564 = 55 to 64 years of age, 65 = 65 years of age or older).
```{r ch1261}
who
who_longer <- who %>%
pivot_longer(5:60, names_to = "key",
values_to = "case_count") %>%
separate(key, c("new", "rest"), sep =3) %>%
select(-new) %>%
separate(rest, c("diagnosis", "genderage"), sep = 3) %>%
separate(genderage, c("gender", "age"), sep = 2) %>%
mutate(diagnosis = case_when(diagnosis == "_sp" ~ "possmear",
diagnosis == "_np" ~ "negsmear",
diagnosis == "rel" ~ "relapse",
diagnosis == "_ep" ~ "extrapul"),
gender = case_when(gender == "_f" ~ "female",
gender == "_m" ~ "male"))
# this plot is fu
who_longer %>%
group_by(country, year, gender) %>%
filter(case_count > 1 & year > 1990) %>%
ggplot(aes(year, case_count, color = gender)) +
geom_line()
glimpse(who_longer)
```