-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
126 lines (82 loc) · 3.96 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "sitePath: phylogeny-based sequence clustering using site polymorphism"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
fig.path = "man/figures/"
)
```
The below demonstrates the result of phylogeny-based sequence clustering for a H3N2 virus dataset (included in the package)
```{r example}
library(sitePath)
data(h3n2_align) # load the H3N2 sequences
data(h3n2_tree) # load the corresponding phylogenetic tree
options(list("cl.cores" = 10)) # Use 10 cores for multiprocessing
paths <- lineagePath(h3n2_tree, alignment = h3n2_align, Nmin = 0.05)
minEntropy <- sitesMinEntropy(paths)
p1 <- plotSingleSite(paths, site = 208) # The site polymorphism of site 208 on the tree
p2 <- plotSingleSite(minEntropy, site = 208) # The result of clustering using site 208
gridExtra::grid.arrange(p1, p2, ncol = 2)
```
```{r extractTips}
grp1 <- extractTips(paths, 208) # Grouping result using site polymorphism only
grp2 <- extractTips(minEntropy, 208) # Phylogeny-based clustering result
```
# Installation
[R programming language](https://cran.r-project.org/) \>= 4.1.0 is required to use `sitePath`.
The stable release is available on [Bioconductor](https://bioconductor.org/packages/sitePath/).
``` r
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("sitePath")
```
The installation from [GitHub](https://github.com/wuaipinglab/sitePath/) is in experimental stage but gives the newest feature:
``` r
if (!requireNamespace("remotes", quietly = TRUE))
install.packages("remotes")
remotes::install_github("wuaipinglab/sitePath")
```
# QuickStart
The following is a quick tutorial on how to use `sitePath` to find fixation and parallel sites including how to import data, run analysis and visualization of the results.
## 1. Data preparation
You need a *tree* and a *MSA* (multiple sequence alignment) file and the sequence names have to be matched!
```{r data_prep}
library(sitePath) # Load the sitePath package
# The path to your tree and MSA files
tree_file <- system.file("extdata", "ZIKV.newick", package = "sitePath")
alignment_file <- system.file("extdata", "ZIKV.fasta", package = "sitePath")
tree <- read.tree(tree_file) # Read the tree file into R
align <- read.alignment(alignment_file, format = "fasta") # Read the MSA file into R
```
## 2. Run analysis
`Nmin` and `minSNP` are the respective parameters for finding fixation and parallel sites (18 and 1 are used as an example for this dataset). The default values will be used if you don't specify them.
```{r run_analysis}
options(list("cl.cores" = 1)) # Set this bigger than 1 to use multiprocessing
paraFix <- paraFixSites(tree, alignment = align, Nmin = 18, minSNP = 1) # Find paraFix sites
paraFix
```
## 3. Fixation sites
Use `allSitesName` and set `type` as "fixation" to retrieve fixation sites name
```{r fixSites_name}
allSitesName(paraFix, type = "fixation")
```
Use `plotFixationSites` to view fixation sites
```{r plot_fixSites}
plotFixationSites(paraFix) # View all fixation sites on the tree
plotFixationSites(paraFix, site = 139) # View a single site
```
## 4. Parallel sites
Use `allSitesName` and set `type` as "parallel" to retrieve parallel sites name
```{r paraSites_name}
allSitesName(paraFix, type = "parallel")
```
Use `plotParallelSites` to view parallel sites
```{r}
plotParallelSites(paraFix) # View all parallel sites on the tree
plotParallelSites(paraFix, site = 105) # View a single site
```
# Read more
The above uses wrapper functions but the analysis can be dissembled into step functions (so you can view the result of each step and modify parameters). Click [here](https://wuaipinglab.github.io/sitePath/articles/sitePath.html) for a more detailed tutorial.
# Getting help
Post on Bioconductor [support site](https://support.bioconductor.org/) if having trouble using `sitePath`. Or open an [issue](https://github.com/wuaipinglab/sitePath/issues/new?assignees=&labels=&template=bug_report.md&title=) if a bug is found.