-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathswim_outputs.qmd
248 lines (156 loc) · 8.31 KB
/
swim_outputs.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
---
title: "Seahtrue outputs"
webr:
packages: ['tidyverse']
editor:
mode: source
---
## Nested tibbles
The data output format of the `run_seahtrue` function is a list of lists. List of lists is also called nesting of data. The advantage of this is that the data is properly organized, but also easily accessible. Here is an example that I took from a `tidyr` vignette <https://tidyr.tidyverse.org/articles/nest.html>.
``` {webr-r}
library(tidyverse)
mtcars %>%
nest(.by = cyl)
```
You can see that the the data is now nicely organized by the `cylinder` parameter. Since there are only 3 different values for the cyl in the mtcars dataset, there are now three rows and two columns, one column has the `cyl` parameter all other data is nested into a `data` column.
::: {.callout-note collapse="true"}
## .by vs group_by
In one of the latest releases of the `tidyverse` the use of `.by` was introduced. Previously we used the `group_by` to tell R how to organize the data. The grouping of data remains attached to the data tibble, which sometimes could result in unintentional things to happen, when you forgot that the tibble was grouped. The `group_by` can be undone with the `ungroup` command.
With the `.by` the grouping is only apparent while using the function in which you use it as argument. `group_by` and `.by` are doing similar things so they can be used both.Let's have a look at how they work:
``` {webr-r}
#first do a complete summarize
mtcars %>%
summarize(meamn = mean(disp))
#second only summarize the disp for each cyl
mtcars %>%
summarize(mean = mean(disp), .by = cyl)
#alternatively you can use
mtcars %>%
group_by(cyl)%>%
summarize(mean = mean(disp))
```
If you `glimpse` the results of the two ways of using grouping above you will see that `group_by` is doing stuff to your data, that you might not want. In this case it turns the `mtcars` dataframe into a tibble, whereas the result of the `.by` in the `summarize` function is still a dataframe. Although it might not really matter whether your data is a tibble or dataframe, it shows that `group_by` is a bit more invasive on your data.
:::
You can use `pluck` to get to the nested `data`. Basically you just pluck a part of the data out of the full dataset.
``` {webr-r}
mtcars %>%
nest(.by = cyl) %>%
pluck("data", 1)
```
Please note that we use here `"data"` instead of `data`. It can be confusing when to use the `""` or not. For example, with the `pull` function which takes one full column out of a tibble, you are not using `""`.
Also, `pluck` uses indexing for retrieving its components, it is not possible to directly get the element that belongs to `cyl == 3` for example. You would need to `filter` first on that parameter and then `pluck` the first row of data.
## The purrr map function
The cool thing about a nested tibble is that you can quickly perform stuff on each nested tibble. A really good introduction to this is described in this blog post by Rebecca Barter <https://www.rebeccabarter.com/blog/2019-08-19_purrr>. You can map a function on each item from that row.
``` {webr-r}
mtcars %>%
nest(.by = cyl) %>%
mutate(model =
map(data, function(df) lm(mpg ~ wt, data = df)))
```
You see that a new column is generated named `model`, if you `pluck` the one of the models, you can see the typical output of the linear model (`lm`) function. For each cylinder now you creates a linear model!
``` {webr-r}
mtcars %>%
nest(.by = cyl) %>%
mutate(model =
map(data,
function(df) lm(mpg ~ wt, data = df))) %>%
pluck("model", 1)
```
The semantics and how to use the `map` function is nicely explained in the blog post that was referenced here above. But some more considerations here:
``` {webr-r}
#this is the original form
mtcars %>%
nest(.by = cyl) %>%
mutate(model =
map(data, function(df) lm(mpg ~ wt, data = df)))
#some of the parts are left out
mtcars %>%
nest(.by = cyl) %>%
mutate(model =
map(.x = data,
.f = function(df) lm(mpg ~ wt, data = df)))
#this makes it a bit more clear
#now .x is the data and .f is the function
#also you can change the function syntax
mtcars %>%
nest(.by = cyl) %>%
mutate(model = map(.x = data,
.f = ~ lm(mpg ~ wt, data = .x)))
#now function was replaced with ~
#and the df was replaced with .x
#this also works . instead of .x
mtcars %>%
nest(.by = cyl) %>%
mutate(model = map(.x = data,
.f = ~ lm(mpg ~ wt, data = .)))
```
Another good resource for the purrr map function is <https://dcl-prog.stanford.edu/purrr-basics.html>. `map` has many more forms and ways to use, which are summarized in its cheat sheet <https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf>.
## The seahtrue ouput
Now go and have a look at the `run_seahtrue` output.
``` {webr-r}
library(tidyverse)
root_srcfile <-
"https://raw.githubusercontent.com/vcjdeboer/"
repository_srcfile <-
"seahtrue/develop-gerwin/data/"
download.file(
paste0(
root_srcfile,
repository_srcfile,
"seahtrue_output_donor_A.rda"),
"seahtrue_output_donor_A.rda")
load("seahtrue_output_donor_A.rda")
seahtrue_output_donor_A %>% glimpse()
```
Also pluck some of the data
``` {webr-r}
# get the original input filename and location
seahtrue_output_donor_A %>%
pluck("filepath_seahorse",1)
# get the injection info
seahtrue_output_donor_A %>%
pluck("injection_info",1)
# get the date when exp was run
seahtrue_output_donor_A %>%
pluck("date",1)
```
Some data are simple character strings, like the `date` column, whereas others are large tables like the `raw_data` column
With this loaded data (`seahtrue_output_donor_A`) you can now do similar plotting as in the `plotting seahorse` chapter. For this we only have to `pluck` the `rate_data` out of the data set. Be carefull that we preprocessed the data and we have other column names now so first `glimpse` the data.
``` {webr-r}
seahtrue_output_donor_A %>%
pluck("rate_data",1) %>%
glimpse()
#or get only the column names
seahtrue_output_donor_A %>%
colnames()
```
You will see that the column names are labeled with `wave`, in this way we can distinguish for example the time column in the `raw_data` tibble from the `time_wave` column in the `rate_data` tibble. Also, please notice that we have `OCR_wave_bc` and `OCR_wave`. This distinctino is made because we can have OCR data that is background corrected or not. When clicking on the background slider in the Wave software from Agilent, the OCR data will be changed to non background corrected. If at this point the data is exported the xlsx input file is not background corrected. In the `seahtrue` this will show up as `OCR_wave`. Typically however the data is background corrected, so we most of the time have `OCR_wave_bc`.
::: {.callout-note collapse="true"}
## time and time again
Since rate is an aggregate of mulitple O2 or pH readings, also the definition of the timing of each measurement is different between the `rate_data` and the `raw_data`. Therefore in the `seahtrue` package both times are labeled differently. For the `rate_table` we labeled it with `time_wave` and for the `raw_data` we labeled it with `timescale`. And again, we used `timescale` to distinguish it from the `time` in the original input file.
:::
Please note if we want to plot the OCR vs time, we have to use the `OCR_wave_bc` vs `time_wave` in our ggplot aesthetics.
It is good practice to have a quick look at how the groups were named in the experiment. We can use the `pull(group)` and `unique()` commands for this:
``` {webr-r}
#first have a look at what groups we have
seahtrue_output_donor_A %>%
pluck("rate_data",1) %>%
pull(group) %>% unique()
```
Next, take some of the groups and plot them in a ggplot:
``` {webr-r}
seahtrue_output_donor_A %>%
pluck("rate_data",1) %>%
filter(group %in% c("200.000", "Background")) %>%
ggplot(aes(x = time_wave, y = OCR_wave_bc,
group = well,
color = group))+
geom_point()+
geom_line() +
scale_color_brewer(palette = "Set1")+
labs(subtitle = "200.000 cells per well vs Background",
x = "time (minutes)",
y = "OCR (pmol/min)")+
theme_bw(base_size = 16)
```
Great, this looks exactly the same as the plot we generated using the data from the downloaded excel file in the "plotting seahorse" chapter.