-
Notifications
You must be signed in to change notification settings - Fork 0
/
flyestats.Rmd
123 lines (104 loc) · 2.91 KB
/
flyestats.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
---
title: "gather flye assembly statistics"
author: "Laura Dijkhuizen"
date: "21/03/2022"
output:
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Gather stats in long table with bash
```{bash}
# set working dir
cd /stor/anabaena
#get fq dir and samples from the main script.sh
fqdir=$(grep 'fqdir=' ./script.sh | cut -d '=' -f 2 | cut -d ' ' -f 1)
names=( $(find "$fqdir" -maxdepth 1 -name '*.fastq.gz' -printf '%P\n' | sed 's/.fastq.gz//g') )
echo "found sample names ${names[@]}"
# prep a table with headers
echo -e "samplename\tstatistic\tvalue" > denovo/flyestats.txt
# now gather stats in a table with a for loop
for n in ${names[@]}
do tail -n 8 ./denovo/$n/flye.log \
| head -n 6 \
| tr ' ' _ \
| tr -d ':' \
| sed "s/Fragment/Contig/g" \
| sed "s/^/$n/g"
done \
| sed -E 's/ +/\t/g' \
>> denovo/flyestats.txt
```
## Import flye stats in R
```{r}
setwd("/stor/anabaena")
library(data.table)
flye <- fread(file = "./denovo/flyestats.txt",header = T,colClasses = c('factor','factor','double'))
```
## Convert lengths to mb
with some data.table magic
```{r}
bp_to_mb <- c('Fragments_N50','Largest_frg','Total_length')
flye[statistic %in% bp_to_mb,
value := value/1000000
]
rm(bp_to_mb)
flye$value
```
## Make statistic a propper y axis label
```{r}
levels(flye$statistic) <- c("Contigs\n(count)",
"Contigs_N50\n(Mbase)",
"Largest_contig\n(Mbase)",
"Mean_coverage",
"Scaffolds",
"Total_length\n(Mbase)")
```
## Flye stats as (gg)plot
```{r}
library(ggplot2)
flyeplot <- ggplot(data = flye[statistic != 'Scaffolds'],
mapping = aes(
x=samplename,
y=value,
fill=samplename
))
flyeplot <- flyeplot + geom_bar(stat = 'identity')
flyeplot <- flyeplot + facet_grid( statistic ~ .,scales = "free_y")
flyeplot <- flyeplot + theme_bw()
flyeplot <- flyeplot + theme(strip.text.y = element_text(angle=0))
flyeplot <- flyeplot + scale_fill_brewer(type = 'qual',
palette = 2
)
flyeplot <- flyeplot + ylab(NULL)
flyeplot
```
## Flye stats as table
```{r}
library(kableExtra)
kable(
dcast(flye,
formula = samplename ~ statistic,
value.var = 'value'),
format = 'html',
align = 'c',
digits = 2) %>%
kable_minimal()
```
## assembly graphs
Here I link manually to snapshots of the assembly graphs:
### CSV15
![](denovo/CSV15.png)
### UU1-1
![](denovo/UU1-1.png)
### UU1-4
![](denovo/UU1-4.png)
### UU1-8
![](denovo/UU1-8.png)
### UU2-8
![](denovo/UU2-8.png)
### UU2-3
![](denovo/UU2-3.png)
### WT
![](denovo/WT.png)