1. Introduction
Automating the boring stuff during evolutionary analyses of genetic data in phylogenomics & phylogeography...
PIrANHA v0.4a4 is a repository of functions designed to help automate file processing and analysis of DNA sequence or SNP data in phylogenetics and phylogeography research projects, using either traditional data (Avise 2000; Felsenstein 2004) or data from phylogenomics and population genomics (e.g. Kapli et al. 2020). PIrANHA is fully command line-based. Rather than being structured as a single pipeline, it contains a series of functions, some of which run pipelines or analysis routines, that aid or complete tasks during evolutionary analyses of genetic data. Some functions conduct custom analyses, while many are wrappers around existing software, which allows common analyses in evolutionary genetics to be automated quickly. Currently, PIrANHA facilitates running, linking, or analyzing output from the following software programs:
- BEAST (Drummond et al. 2012; Bouckaert et al. 2014)
- ∂a∂i (Gutenkunst et al. 2009)
- ExaBayes (Aberer et al. 2014)
- fastSTRUCTURE (Raj et al. 2014)
- iqtree (Nguyen et al. 2015; Minh et al. 2020)
- MrBayes (Ronquist et al. 2012)
- PartitionFinder (Lanfear et al. 2012, 2016)
- PhyloMapper (Lemmon and Lemmon 2008)
- pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016)
- RAxML (Stamatakis 2014)
- RogueNaRok (Aberer et al. 2013)
- starBEAST (Heled and Drummond 2010)
- SNAPP (Bryant et al. 2012)
In terms of file processing, PIrANHA is particularly useful for file format conversions, which are commonplace in phylogenetics and phylogeography workflows. Several functions automate conversion between DNA sequence alignment formats (Table 1 below), usually starting from NEXUS or PHYLIP multiple sequence alignment (MSA) files. PIrANHA also contains functions for extracting taxon names from alignments, renaming taxa, and splitting PHYLIP alignments. Beyond alignment files, PIrANHA handles variant call format (VCF) files; current capabilities include subsampling VCF files and converting FASTA multiple sequence alignments to VCF format.
Table 1: Main input file types with file format conversion functions available in PIrANHA.
Input file types | Extension(s) | Info |
---|---|---|
NEXUS | '.nex' (preferred), '.NEX' | link, link |
PHYLIP | '.phy' (preferred) | link, link |
FASTA | '.fas' (preferred), '.FAS', '.fasta' | link, link |
MEGA | '.meg' | link |
Variant call format (VCF) | '.vcf' | v4.0+, link |
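To make the kind of conversion described above concrete, here is a minimal Python sketch that guesses a file's format from the extensions in Table 1 and converts a sequential PHYLIP alignment to FASTA, collecting the taxon names along the way. It is an illustration only and not PIrANHA's own code: the names `guess_format`, `read_phylip`, and `to_fasta` are hypothetical, and the parser assumes the simple sequential layout with one taxon name and its sequence per line.

```python
from pathlib import Path

# Extensions from Table 1 mapped to a format label (lookup is case-insensitive).
EXTENSIONS = {
    ".nex": "NEXUS",
    ".phy": "PHYLIP",
    ".fas": "FASTA",
    ".fasta": "FASTA",
    ".meg": "MEGA",
    ".vcf": "VCF",
}


def guess_format(filename):
    """Return the format label for a file name, or None if unrecognized."""
    return EXTENSIONS.get(Path(filename).suffix.lower())


def read_phylip(text):
    """Parse a sequential PHYLIP alignment into (name, sequence) pairs.

    Assumes a header line 'ntax nchar' followed by one 'name sequence'
    line per taxon; interleaved PHYLIP is not handled in this sketch.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    records = []
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        # Drop any whitespace embedded in the sequence block.
        records.append((name, "".join(seq.split())))
    if len(records) != ntax:
        raise ValueError(f"expected {ntax} taxa, found {len(records)}")
    for name, seq in records:
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(seq)}")
    return records


def to_fasta(records, width=60):
    """Render (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "2 8\ntaxonA ACGTACGT\ntaxonB ACGTTCGT\n"
    records = read_phylip(example)
    print([name for name, _ in records])  # taxon names in file order
    print(to_fasta(records), end="")
```

Checking the taxon and site counts against the PHYLIP header catches truncated or misformatted alignments before they reach a downstream program.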
The current code in PIrANHA has been written largely with a focus on 1) analyses of DNA sequence data and SNPs or SNP loci generated from massively parallel sequencing runs on different genome-reduction-type libraries, including ddRAD-seq genomic libraries (e.g. Peterson et al. 2012) and ultraconserved elements (UCEs; Faircloth ), and 2) automating runs of these software programs on the user's personal machine (e.g. the MAGNET pipeline and pyRAD2PartitionFinder scripts) or on a remote supercomputer. Several functions are also designed specifically for post-processing the results of phylogenetic analyses. In particular, a number of functions include sections that allow them to be run (or to call other software) on a supercomputing cluster, using code suitable for the SLURM or TORQUE (PBS; Portable Batch System) resource management systems.
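As a rough illustration of what supporting both resource managers involves, the Python sketch below renders a batch-script header for either SLURM (`#SBATCH` directives) or TORQUE/PBS (`#PBS` directives) and appends an analysis command. This is not PIrANHA's implementation: the function names, the single-node layout, and the default resource values are assumptions made for the example, and the command passed in is a placeholder.

```python
def scheduler_header(system, job_name, cores, walltime):
    """Return batch-script header text for SLURM or TORQUE/PBS.

    Assumes a single-node job; `walltime` is an HH:MM:SS string.
    """
    if system == "slurm":
        directives = [
            f"#SBATCH --job-name={job_name}",
            "#SBATCH --nodes=1",
            f"#SBATCH --ntasks={cores}",
            f"#SBATCH --time={walltime}",
        ]
    elif system == "torque":
        directives = [
            f"#PBS -N {job_name}",
            f"#PBS -l nodes=1:ppn={cores}",
            f"#PBS -l walltime={walltime}",
        ]
    else:
        raise ValueError(f"unsupported resource manager: {system!r}")
    return "\n".join(["#!/bin/bash", *directives]) + "\n"


def batch_script(system, job_name, command, cores=1, walltime="01:00:00"):
    """Combine the scheduler header with the command to run on the cluster."""
    header = scheduler_header(system, job_name, cores, walltime)
    return header + "\n" + command + "\n"


if __name__ == "__main__":
    # Placeholder command; a real job would call the analysis program here.
    print(batch_script("slurm", "example_job", "echo 'analysis goes here'", cores=4))
```

The resulting text would typically be saved to a file and submitted with `sbatch` on SLURM or `qsub` on TORQUE. Keeping the scheduler-specific directives in one function means the analysis command itself stays the same across clusters.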
- Aberer, A., Krompass, D., Stamatakis, A. 2013. Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Systematic Biology 62(1), 162–166.
- Aberer, A.J., Kobert, K., Stamatakis, A. 2014. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Molecular Biology and Evolution 31, 2553-2556.
- Avise, J.C. 2000. Phylogeography: the history and formation of species. Cambridge, MA: Harvard University Press.
- Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A., Alekseyenko, A.V. 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution 29, 2157-2167.
- Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T.G., Wu, C.H., Xie, D., Suchard, M.A., Rambaut, A., Drummond, A.J. 2014. BEAST2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10, e1003537.
- Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A. 2012. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29, 1917–1932.
- Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29, 1969-1973.
- Eaton, D.A. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics 30, 1844-1849.
- Eaton, D.A.R., Overcast, I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets. Available at: http://ipyrad.readthedocs.io/.
- Felsenstein, J. 2004. Inferring phylogenies. Sunderland, MA: Sinauer Associates.
- Heled, J., Drummond, A.J. 2010. Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution 27, 570–580.
- Kapli, P., Yang, Z., Telford, M.J. 2020. Phylogenetic tree building in the genomic age. Nature Reviews Genetics, 1-17.
- Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29, 1695-1701.
- Lanfear, R., Frandsen, P.B., Wright, A.M., Senfeld, T., Calcott, B. 2016. PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution.
- Lemmon, A.R., Lemmon, E. 2008. A likelihood framework for estimating phylogeographic history on a continuous landscape. Systematic Biology 57, 544–561.
- Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., Von Haeseler, A., Lanfear, R. 2020. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37(5), 1530-1534.
- Nguyen, L.T., Schmidt, H.A., Von Haeseler, A., Minh, B.Q. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32(1), 268-274.
- Peterson, B.K., Weber, J.N., Kay, E.H., Fisher, H.S., Hoekstra, H.E. 2012. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One 7, e37135.
- Raj, A., Stephens, M., Pritchard, J.K. 2014. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets. Genetics 197, 573-589.
- Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D., Darling, A., et al. 2012. MrBayes v. 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61, 539-542.
- Stamatakis, A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312-1313.
December 26, 2020 - Justin C. Bagley, Jacksonville, AL, USA