1. Introduction
Automating the boring stuff during evolutionary analyses of genetic data in phylogenomics & phylogeography...
PIrANHA v0.4a3 is a repository of functions designed to help automate file processing and analysis of DNA sequence or SNP data in phylogenetics and phylogeography research projects (Avise 2000; Felsenstein 2004). PIrANHA is fully command line-based. Rather than being structured as a single pipeline, it contains a series of functions, some of which run pipelines or analysis routines, that aid or complete tasks during evolutionary analyses of genetic data. Some of these functions conduct custom analyses, while many are wrappers around existing software that allow rapid automation of common analyses in evolutionary genetics. Currently, PIrANHA facilitates running or linking the following software programs:
- pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016)
- PartitionFinder (Lanfear et al. 2012, 2016)
- BEAST (Drummond et al. 2012; Bouckaert et al. 2014)
- starBEAST (Heled and Drummond 2010)
- SNAPP (Bryant et al. 2012)
- MrBayes (Ronquist et al. 2012)
- ExaBayes (Aberer et al. 2014)
- RAxML (Stamatakis 2014)
- ∂a∂i (Gutenkunst et al. 2009)
- fastSTRUCTURE (Raj et al. 2014)
- PhyloMapper (Lemmon and Lemmon 2008)
- RogueNaRok (Aberer et al. 2013)
Regarding its file processing capabilities, PIrANHA is particularly useful for file format conversions, which are commonplace in phylogenetics and phylogeography workflows. Several functions automate conversion between DNA sequence alignment formats (Table 1 below), usually starting from NEXUS or PHYLIP multiple sequence alignment (MSA) files. PIrANHA also contains functions for extracting taxon names from alignments, renaming taxa, and splitting PHYLIP alignments. In addition to alignment files, PIrANHA handles variant call format (VCF) files, with current capabilities including subsampling VCF files and converting FASTA multiple sequence alignments to VCF format.
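To make the idea of an alignment format conversion concrete, the following is a minimal Python sketch of converting a FASTA alignment to relaxed sequential PHYLIP. It is not PIrANHA's own code, and the function names `read_fasta` and `fasta_to_phylip` are hypothetical names chosen for this illustration. It assumes the input is already aligned (all sequences the same length) and that taxon names contain no whitespace.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Keep only the first word of the header as the taxon name.
            name = header[0] if header else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_phylip(text):
    """Convert an aligned FASTA string to relaxed sequential PHYLIP."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    # Pad names to a common width so the sequences line up in one column.
    width = max(len(name) for name, _ in records) + 2
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name.ljust(width)}{seq}" for name, seq in records]
    return "\n".join(lines) + "\n"
```

The header line of the output holds the number of taxa and the alignment length, followed by one line per taxon. Strict PHYLIP limits names to 10 characters; this sketch uses the relaxed variant, which many phylogenetics programs accept.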
Table 1: Main input file types with file format conversion functions available in PIrANHA.
Input file types | Extension(s) | Info |
---|---|---|
NEXUS | '.nex' (preferred), '.NEX' | link, link |
PHYLIP | '.phy' (preferred) | link, link |
FASTA | '.fas' (preferred), '.FAS', '.fasta' | link, link |
MEGA | '.meg' | link |
Variant call format (VCF) | '.vcf' | v4.0+, link |
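
As a small illustration of how the extensions in Table 1 map onto input formats, the Python sketch below looks up a format name from a filename. The dictionary and the `guess_format` function are hypothetical and are not part of PIrANHA; the sketch simply restates the table, treating extensions case-insensitively so that '.NEX' and '.FAS' resolve the same way as '.nex' and '.fas'.

```python
from pathlib import Path

# Extension-to-format mapping restated from Table 1 (lowercase keys).
EXTENSION_TO_FORMAT = {
    ".nex": "NEXUS",
    ".phy": "PHYLIP",
    ".fas": "FASTA",
    ".fasta": "FASTA",
    ".meg": "MEGA",
    ".vcf": "VCF",
}


def guess_format(filename):
    """Return the format name for a filename, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)
```

For example, `guess_format("loci.NEX")` returns `"NEXUS"`, while a file with an unlisted extension returns `None`.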
The current code in PIrANHA has been written largely with a focus on 1) analyses of DNA sequence data and SNPs or SNP loci generated from massively parallel sequencing runs on genome-reduction-type libraries, including ddRAD-seq genomic libraries (e.g. Peterson et al. 2012) and ultraconserved elements (UCEs; Faircloth ), and 2) automating runs of these software programs on the user's personal machine (e.g. the MAGNET pipeline and pyRAD2PartitionFinder scripts) or on a remote supercomputer. Several functions are also designed specifically for post-processing the results of phylogenetic analyses. In addition, a number of functions include sections that allow them to be run (or to call other software) on a supercomputing cluster, using code suitable for the SLURM or TORQUE (PBS; Portable Batch System) resource management systems.
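
To illustrate what "code suitable for SLURM or TORQUE" typically involves, the Python sketch below builds the directive header of a batch script for either scheduler. It is an assumption-laden illustration rather than PIrANHA's implementation: the function names `batch_header` and `batch_script` are hypothetical, and the resource requests (one node, a given number of cores, a walltime) are placeholders that would need adjusting for a particular cluster.

```python
def batch_header(scheduler, job_name, walltime, cores):
    """Return the opening lines of a batch script for SLURM or TORQUE (PBS)."""
    if scheduler == "slurm":
        directives = [
            f"#SBATCH --job-name={job_name}",
            f"#SBATCH --time={walltime}",
            "#SBATCH --nodes=1",
            f"#SBATCH --ntasks={cores}",
        ]
    elif scheduler == "torque":
        directives = [
            f"#PBS -N {job_name}",
            f"#PBS -l walltime={walltime}",
            f"#PBS -l nodes=1:ppn={cores}",
        ]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    return "\n".join(["#!/bin/bash", *directives]) + "\n"


def batch_script(scheduler, job_name, walltime, cores, command):
    """Return a complete batch script: scheduler header plus one command line."""
    return batch_header(scheduler, job_name, walltime, cores) + command + "\n"
```

A script produced this way would be written to a file and submitted with `sbatch` under SLURM or `qsub` under TORQUE. Keeping the scheduler-specific directives in one place means the analysis command itself stays the same across the two systems.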
- Aberer A, Krompass D, Stamatakis A (2013) Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Systematic Biology, 62(1), 162–166.
- Aberer AJ, Kobert K, Stamatakis A (2014) ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Molecular Biology and Evolution, 31, 2553–2556.
- Avise JC (2000) Phylogeography: the history and formation of species. Cambridge, MA: Harvard University Press.
- Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA, Alekseyenko AV (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution, 29, 2157–2167.
- Bouckaert R, Heled J, Kühnert D, Vaughan TG, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 10, e1003537.
- Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, RoyChoudhury A (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29, 1917–1932.
- Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29, 1969–1973.
- Eaton DA (2014) PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30, 1844–1849.
- Eaton DAR, Overcast I (2016) ipyrad: interactive assembly and analysis of RADseq data sets. Available at: http://ipyrad.readthedocs.io/.
- Felsenstein J (2004) Inferring phylogenies. Sunderland, MA: Sinauer Associates.
- Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27, 570–580.
- Lanfear R, Calcott B, Ho SYW, Guindon S (2012) PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution, 29, 1695–1701.
- Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B (2016) PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution.
- Lemmon AR, Lemmon E (2008) A likelihood framework for estimating phylogeographic history on a continuous landscape. Systematic Biology, 57, 544–561.
- Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One, 7, e37135.
- Raj A, Stephens M, Pritchard JK (2014) fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics, 197, 573–589.
- Ronquist F, Teslenko M, van der Mark P, Ayres D, Darling A, et al. (2012) MrBayes v. 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology, 61, 539–542.
- Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30, 1312–1313.
December 9, 2020 Justin C. Bagley, Jacksonville, AL, USA