-
Notifications
You must be signed in to change notification settings - Fork 1
/
00_01_2_IntroToR_practical.qmd
183 lines (114 loc) · 4.5 KB
/
00_01_2_IntroToR_practical.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
---
title: "P01. Introduction to R, part 2"
---
**A. read in a data file**
```{r eval = F}
myTBdata <- read.table("TB_stats.txt", header=TRUE)
```
In order to run this, your computer need to know where "TB_stats.txt" is. You could download it [here](data/TB_stats.txt).
What does the "header=TRUE" option mean?
Answer:
**B. Have a look at the first few lines**
```{r eval = F}
# Now let's investigate the data file
head(myTBdata)
```
How many rows can you see? What is the first row?
Answer:
**C. What are the names of the columns?**
```{r eval = F}
names(myTBdata)
```
Is this what you expected?
Answer:
D. How many rows and columns are there in your data ?
```{r eval = F}
dim(myTBdata)
```
What is the first number telling you? And the second?
Answer:
**E. How are your data stored?**
```{r eval = F}
attributes(myTBdata)
```
What new piece of information have you learned from the 'attributes()' function?
Answer:
**F. Now take a look at some summary statistics for your data**
```{r eval = F}
summary(myTBdata)
```
Let's extract some information from our data
**G. First, Calculate the total number of deaths across all countries.**
The following two methods should give you the same answer
```{r eval = F}
total_TB_mortality1 <- sum(myTBdata[,2:3]) # method 1
total_TB_mortality2 <- sum(myTBdata$HIV_pos_TB_mortality + myTBdata$HIV_neg_TB_mortality) # method 2
```
Do you think one method is better than the other?
Answer:
**H. Now let's check that both methods give the same answer.**
We'll use two ways to check this. First, let's output both answers
```{r eval = F}
total_TB_mortality1
total_TB_mortality2
```
Now, let's ask R to check whether they are both equal
```{r eval = F}
total_TB_mortality1==total_TB_mortality2
# logical expression which gives TRUE if equal and FALSE if not
```
Why might you prefer to use the second check (using the logical expression) than the first?
Answer:
**I. How different is the TB mortality rate in HIV positive persons in Lesotho compared to Zimbabwe?**
First, let's add "mortality rate" as another column in our data frame
```{r eval = F}
myTBdata$Mortality_Per1000 <- 1000 * (myTBdata$HIV_pos_TB_mortality + myTBdata$HIV_neg_TB_mortality)/myTBdata$Population
```
Now subset the dataset to extract the TB mortality rate for both Lesotho and Zimbabwe
```{r eval = F}
Lesotho_mortalityrate <- myTBdata[myTBdata$Country=="Lesotho", "Mortality_Per1000"]
Zimbabwe_mortalityrate <- myTBdata[myTBdata$Country=="Zimbabwe", "Mortality_Per1000"]
Relative_Mortality_Rate <- Lesotho_mortalityrate / Zimbabwe_mortalityrate
```
How many times higher is the mortality rate for TB in Lesotho as it is in Zimbabwe?
```{r eval = F}
paste("The relative mortality rate is", round(Relative_Mortality_Rate, 2), sep=" ")
```
**J. Finally in this section, let's look at what can go wrong when reading in data files.**
In order to complete this section, you would need to download [readfileexample_1.txt](data/readfileexample_1.txt), [readfileexample_2.txt](data/readfileexample_2.txt), and [readfileexample_3.txt](data/readfileexample_3.txt).
\(a\) There is not an equal number of columns in each of the rows.
```{r eval = F}
readFile_a <- read.table("readfileexample_1.txt", header=TRUE)
```
How do you fix this error? Hint: set missing values in the data file to be 'Not Assigned' by adding them as NA in the original file. Try running this line again with the updated file.
\(b\) The wrong delimiter is used
```{r eval = F}
readFile_b <- read.table("readfileexample_2.txt", header=TRUE)
```
Is an error given? Check out 'readFile_b' - is it correct?
Answer:
How do you fix this? Ask R for help (?read.table) Which option do you need to specify?
Answer:
Is there another way of fixing this problem?
Answer:
\(c\) The names are read in as data rows rather than names
```{r eval = F}
readFile_c <- read.csv("readfileexample_2.txt", header=FALSE)
```
Is an error given? Check out 'readFile_c' - is it correct?
Answer:
Type a new line of code to correct this problem (hint: copy-paste from above and change one of the options)
```{r}
##### YOUR CODE GOES HERE #####
```
\(d\) One of more of the columns contain different classes
```{r eval = F}
readFile_d <- read.table("readfileexample_3.txt", header=TRUE)
```
Is an error given? Check out 'readFile_d' - is it correct?
Answer:
How do you fix this issue? Hint: check the 'class' of the problem column. Try running this line again with an updated file.
```{r}
##### YOUR CODE GOES HERE #####
```
The solutions can be accessed [here](00_01_2_IntroToR_solutions.qmd).