3. Getting Started

Justin C. Bagley edited this page Dec 27, 2020 · 67 revisions

Detailed Start Guide

It is easy to get started using PIrANHA. If you're in a hurry, see the Quick Guide for the Impatient on the Wiki Home page.

The sections below give detailed background information and starting instructions covering dependencies, installation, usage, functions, and input/output file formats.

Dependencies

Dependencies for each PIrANHA function are listed in that function's help text, accessed by piranha -f <function> -h. You can usually get away with installing only the software needed for the analysis/function you are currently running, and so avoid installing dependencies unrelated to your current workflow. However, it is recommended that each user (eventually) install all dependencies to take full advantage of PIrANHA's capabilities and be prepared for any analysis!

The main dependencies for PIrANHA are Perl and utility software (e.g. grep and the stream editor sed), which come pre-installed on most UNIX/LINUX systems. Thus, for many functions, especially DNA sequence alignment conversions and simple operations, the user will not need to install any dependency software at all.

However, some PIrANHA functions, and especially the MAGNET pipeline (here or here) within PIrANHA, rely on several software dependencies. These dependencies are described in the help texts for different functions (see above); however, I provide a full list of them below, with asterisk marks preceding those already included in the "MAGNET-1.2.0" subdirectory of the current release.

  • PartitionFinder
  • BEAST v1.8.3++ and v2.4.2++ (or newer; available here and here)
    • Updated Java, appropriate Java virtual machine / jdk required
    • BEAGLE in beagle-lib (libhmsbeagle* files) required
    • default BEAST packages required
    • SNAPP package addon required
  • MrBayes v3.2++ (available here)
  • ExaBayes (available here)
  • RAxML (available here)
  • Perl v5.1+ (available here)
  • *Nayoki Takebayashi's file conversion Perl scripts (here; possibly available here; note: some, but not all of these, come packaged within MAGNET)
  • Python v2.7 and/or 3++ (available here)
    • Numpy (available here)
    • Scipy (available here)
    • Cython (available here)
    • GNU Scientific Library (available here)
    • bioscripts.convert v0.4 Python package (available here; also see README for 'NEXUS2gphocs.sh')
  • fastSTRUCTURE v1.0 (available here)
  • ∂a∂i v1.7.0++ (or v1.6.3; available here)
  • R v3++ (available here)

Users must install all software not included in PIrANHA, and ensure that it is available via the command line on their supercomputer and/or local machine (best practice is to simply install all software in both places). For more details, see the MAGNET README.
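As a quick way to see which tools are still missing from the command line, a check along the following lines can help. This is a minimal sketch, not part of PIrANHA: the function name missing_deps is hypothetical, and the command names passed to it (perl, grep, sed) are only examples. Executable names for the other programs in the list vary by install and are left for you to fill in.

```shell
# Print each command name that is NOT found on the current PATH.
# missing_deps is a hypothetical helper, not a PIrANHA function.
missing_deps() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}

# Example: check the core utilities named above
# (prints nothing if all of them are found).
missing_deps perl grep sed
```

Running the same check on both your local machine and the supercomputer shows where software still needs to be installed.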

Installation

💻 As its functions are primarily composed of UNIX shell scripts and customized R scripts, PIrANHA is well suited for running on a variety of machine types, especially UNIX/LINUX-like systems that are now commonplace in personal computing and dedicated supercomputer cluster facilities. The UNIX shell is common to all Linux systems and macOS. Using PIrANHA is thus very straightforward, because these systems come with the shell preinstalled.

Another factor making PIrANHA easy to use is that I recently added a Homebrew tap for PIrANHA. This allows quick and painless installation or updating from the command line on macOS and Linux, using only a few lines of code:

Install

brew tap justincbagley/homebrew-tap ;
brew update ;
brew install --HEAD piranha ;
piranha -i ;

It is a good idea to run source ~/.bash_profile to reload your shell environment after install, then check that PIrANHA is available from the command line by typing piranha into Terminal and hitting enter.

The install code will install PIrANHA in your Homebrew Cellar, typically located at /usr/local/Cellar/. Homebrew will also place the PIrANHA executable, piranha (which controls the actual main script, piranha.sh), in your path and add file execution permissions to all of its function scripts. The last line under Install (piranha -i) is optional; it is useful in case dynamic tab completion isn't immediately available in your terminal after install.

Update

brew upgrade --fetch-HEAD piranha

Use this upgrade code to check for and update to a new version of PIrANHA, including the latest commits, if available. It takes less than a minute to run this upgrade, which will also cause Homebrew itself to be updated. Here is an example from updating to "head" (latest cutting-edge development release) from v0.4a3:

$ brew upgrade --fetch-HEAD piranha
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and justincbagley/tap).
==> Updated Formulae
Updated 17 formulae.

==> Upgrading 1 outdated package:
justincbagley/tap/piranha HEAD-cb5ac8d -> HEAD-ae7acef
==> Upgrading justincbagley/tap/piranha HEAD-cb5ac8d -> HEAD-ae7acef 
==> Cloning https://github.com/justincbagley/piranha.git
Updating /Users/justinbagley/Library/Caches/Homebrew/piranha--git
==> Checking out branch master
Already on 'master'
Your branch is up to date with 'origin/master'.
HEAD is now at ae7acef Updating lib/ files
Warning: A newer Command Line Tools release is available.
Update them from Software Update in System Preferences or run:
  softwareupdate --all --install --force

If that doesn't show you an update run:
  sudo rm -rf /Library/Developer/CommandLineTools
  sudo xcode-select --install

Alternatively, manually download them from:
  https://developer.apple.com/download/more/.

==> rm /usr/local/etc/local_piranha
==> rm /usr/local/etc/brew_piranha
==> chmod +x /usr/local/Cellar/piranha/HEAD-ae7acef/bin/piranha
==> chmod +x /usr/local/Cellar/piranha/HEAD-ae7acef/bin/source_piranha_compl.sh
==> bash source_piranha_compl.sh
==> Caveats
    One line was added to your ~/.bash_profile to make dynamic tab completion of function names 
      available on the command line while running piranha.
    It will still be there after an uninstall, but is adaptive (nothing happens if piranha was uninstalled).
    If you're a zsh person, then patches are welcome: https://github.com/justincbagley/piranha/blob/master/completions/source_piranha_compl.sh
==> Summary
🍺  /usr/local/Cellar/piranha/HEAD-ae7acef: 134 files, 4.7MB, built in 2 seconds
Removing: /usr/local/Cellar/piranha/HEAD-cb5ac8d... (134 files, 5.0MB)

Usage

In general form, the usage for PIrANHA is to call the main function piranha as follows:

piranha [OPTION]... [FILE]...

where [OPTION] will usually include the mandatory function (-f flag) and arguments for that function, which are simply passed after the function call.

Some specific usage examples are:

piranha -h                                   Show piranha help text and exit
piranha -f list                              Get list of available functions
piranha -f <function> -h                     Show help text for <function> and exit
piranha -f calcAlignmentPIS -h               Show help text for calcAlignmentPIS function and exit
piranha -f calcAlignmentPIS -t 150           Run calcAlignmentPIS with threshold at N=150 alignments
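Because every function responds to -h, the help texts for several functions can be saved to files for offline reading. The sketch below assumes only the piranha -f <function> -h call shown above. The save_help name, the PIRANHA override variable, and the .help.txt file suffix are hypothetical choices made for illustration.

```shell
# Save the help text of each named function to <function>.help.txt
# in the current directory. save_help is a hypothetical helper.
# PIRANHA may be set to another command (e.g. echo) for a dry run;
# it defaults to the piranha executable.
save_help() {
  for fn in "$@"; do
    "${PIRANHA:-piranha}" -f "$fn" -h > "${fn}.help.txt"
  done
}

# Usage (requires PIrANHA on your PATH):
#   save_help calcAlignmentPIS MAGNET
```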

The full help text for PIrANHA can be obtained using piranha -h, and is as follows:


piranha v1.1.8, December 2020  (main script for PIrANHA v0.4a4, update Dec 26 22:53:10 CST 2020)                    
Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.                                            
----------------------------------------------------------------------------------------------------------
piranha.sh [OPTION]... [FILE]...

 This is the main script for PIrANHA v0.4a4 (update Dec 26 22:53:10 CST 2020).

 Options:
  -s, --shortlist   Short list of available functions
  -f, --func        Function, <function>
  -a, --args        Function arguments passed to <function>
  -q, --quiet       Quiet (no output)
  -l, --log         Print log to file
  -v, --verbose     Output more information (items echoed to 'verbose')
  -d, --debug       Runs script in Bash debug mode (set -x)
  -h, --help        Display this help and exit
  -V, --version     Output version information and exit

 OVERVIEW
 THIS SCRIPT is the 'master' script that runs the PIrANHA software package by specifying 
 the <function> to be run (-f flag) and passing user-specified arguments to that function. 
 If no function or arguments are given, then the program prints the help text and exits.
	Functions are located in the bin/ folder of the PIrANHA distribution. For detailed 
 information on the capabilities of PIrANHA, please refer to documentation posted on the 
 PIrANHA Wiki (https://github.com/justincbagley/piranha/wiki) or the PIrANHA website
 (https://justinbagley.org/piranha/). Developers can test prianha and its functions by 
 activating Bash debug mode (-d, --debug flags).

 Usage examples:
    piranha -h                                   Show piranha help text and exit
    piranha -f <TAB>                             Get short list of available functions by dynamic completion
    piranha -f list                              Get detailed list of available functions by function
    piranha -f <function> -h                     Show help text for <function> and exit
    piranha -f <function> <args>                 Run <function> script with arguments (e.g. options flags)
    piranha -f <function> <args> -d              Run <function> script in Bash debug mode

 CITATION
 Bagley, J.C. 2020. PIrANHA v0.4a4. GitHub repository, Available at:
	<https://github.com/justincbagley/piranha>.

 Created by Justin Bagley on Fri, Mar 8 12:43:12 CST 2019.
 Copyright (c) 2019-2020 Justin C. Bagley. All rights reserved.
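The piranha -f list call in the usage examples above prints a table in which long descriptions wrap onto indented continuation lines, so a plain grep can return only part of an entry. A minimal sketch for keyword searches follows. It assumes only that function names start in the first column and that continuation lines start with whitespace; the find_functions name is hypothetical.

```shell
# Read a function table on stdin, join wrapped description lines so
# each function occupies one line, then keep entries matching a
# case-insensitive keyword. find_functions is a hypothetical helper.
find_functions() {
  keyword=$1
  awk '
    { sub(/[ \t]+$/, "") }       # strip trailing whitespace
    /^[ \t]/ {                   # continuation line: append to entry
      sub(/^[ \t]+/, "")
      entry = entry " " $0
      next
    }
    { if (entry != "") print entry; entry = $0 }   # new entry begins
    END { if (entry != "") print entry }
  ' | grep -i -- "$keyword"
}

# Usage (requires PIrANHA on your PATH):
#   piranha -f list | find_functions PHYLIP
```

Joining the lines first means a keyword that appears only on a continuation line still returns the function name along with its full description.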

Functions

PIrANHA currently implements the following 58 functions, shown in Table 2 below.

Table 2: PIrANHA functions.

FUNCTION Description
2logeB10.r Rscript that extracts marginal likelihood estimates and calculates 2loge B10 Bayes factors (2loge(B10)) from BEAST marginal likelihood estimation (ps / ss) runs.
alignAlleles Aligns and cleans allele sequences (phased DNA sequences) output by the PIrANHA function phaseAlleles (or in similar format; see phaseAlleles and alignAlleles usage texts for additional details).
AnouraNEXUSPrepper In-house function for preparing NEXUS files for Anoura UCE project analyses (Calderon, Bagley, and Muchhala, in prep.).
batchRunFolders Automates splitting a set of input files into different batches (to be run in parallel on a remote supercomputing cluster, or a local machine), starting from file type or list of input files.
BEASTPostProc Conducts post-processing of gene trees and species trees output by BEAST (e.g. Drummond et al. 2012; Bouckaert et al. 2014; usually on a remote supercomputer).
BEASTReset Function that resets the random seeds for n shell queue scripts corresponding to n BEAST runs/subfolders (destined for supercomputer).
BEASTRunner Automates running BEAST XML input files on a remote supercomputing cluster.
BEAST_PSPrepper Function that automates prepping BEAST XML input files for path sampling (marginal likelihood estimation) analyses using BEAST v2+ PathSampler.
calcAlignmentPIS Generates and runs custom Rscript (phyloch wrapper) to calculate number of parsimony-informative sites (pis) for all FASTA files in working dir.
completeConcatSeqs Function converting series of PHYLIP (Felsenstein 2002) DNA sequence alignments (with or without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling; also makes character subset/partition files in RAxML, PartitionFinder, and NEXUS formats for the resulting alignment.
completeSeqs Function converting series of PHYLIP DNA sequence alignments (with or without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling, starting from a 'taxon names and spaces' file.
concatenateSeqs Function that converts series of PHYLIP DNA sequence alignments with equal taxon sampling into a single concatenated PHYLIP alignment.
concatSeqsPartitions Function similar to concatenateSeqs, but which, in addition to concatenating the set of PHYLIP alignments, also outputs character subset/partitions files in RAxML, PartitionFinder, and NEXUS formats. This function differs from completeConcatSeqs in only taking alignments with equal taxon sampling and in being slightly faster in this usage case.
dadiPostProc Function for post-processing output from one or multiple ∂a∂i runs (ideally run with PIrANHA's dadiRunner function), including collation of best-fit parameter estimates, composite likelihoods, and optimal theta values.
dadiRunner Automates running ∂a∂i on a remote supercomputing cluster. See help text (-h) and function (bin/ dir) for details.
dadiUncertainty Automates uncertainty analysis in ∂a∂i, including generation of bootstrapped SNP files for parameter std. dev. estimation using the GIM method, as well as std. dev. estimation using the FIM method (orig. data only).
dropRandomHap This function randomly drops one phased haplotype (allele) per individual in each of n PHYLIP gene alignments in current working directory, starting from a 'taxon names' file.
dropTaxa Shell script automating removal of taxa from sequential, multi-individual FASTA or PHYLIP DNA sequence alignments, starting from a list of taxa to remove.
ExaBayesPostProc Function automating reading and conducting post-processing analyses on phylogenetic results output from ExaBayes.
FASTA2PHYLIP Function that automates converting one or multiple sequential FASTA DNA sequence alignment files (with sequences either unwrapped or hard-wrapped across multiple lines) to PHYLIP format.
FASTA2VCF Shell script function automating conversion of single multiple sequence FASTA alignment to variant call format (VCF) v4.1, with or without subsampling SNPs per partition/locus.
fastSTRUCTURE Interactive function that automates running fastSTRUCTURE (Raj et al. 2014) on biallelic SNP datasets.
geneCounter Shell script function that counts and summarizes the number of gene copies per tip species in a set of gene trees in Newick format (concatenated into a single trees file), given a taxon-species assignment file.
getBootTrees Function that automates organizing bootstrap trees output by RAxML runs conducted in current working directory using the MAGNET program within PIrANHA.
getDropTaxa Function to create drop taxon list given lists of a) all taxa and b) a subset of taxa to keep.
getTaxonNames Utility function that extracts tip taxon names from sequences present in one or multiple PHYLIP DNA sequence alignments in current directory, using information on maximum taxon sampling level from user.
iqtreePostProc Function that automates post-processing of gene tree files and log files output during phylogenetic analyses in IQ-TREE v1 or v2 (Nguyen et al. 2015; Minh et al. 2020).
indexBAM [In prep.]
list Function that prints a tabulated list of PIrANHA functions and their descriptions.
MAGNET Shell pipeline for automating estimation of a maximum-likelihood (ML) gene tree in RAxML for each of many loci in a RAD-seq, UCE, or other multilocus dataset. Also contains other tools.
makePartitions Function using PHYLIP DNA sequence alignments in current directory to make partitions/charsets files in RAxML, PartitionFinder, and NEXUS formats, which are output to separate files.
Mega2PHYLIP Automates converting one or more multiple sequence alignment files in Mega format (Mega v7+ or X; Kumar et al. 2016, 2018) to PHYLIP format (Felsenstein 2002), while saving (-k 1) or writing over (-k 0) the original Mega files.
mergeBAM [In prep.]
MLEResultsProc Automates post-processing of marginal likelihood estimation (MLE) results from running path sampling (ps) or stepping-stone (ss) sampling analyses on different models in BEAST.
MrBayesPostProc Simple script for post-processing results of a MrBayes v3.2+ (Ronquist et al. 2012) run, whose output files are assumed to be in the current working directory.
NEXUS2MultiPHYLIP Function that splits a sequential NEXUS alignment with charset information into multiple PHYLIP-formatted alignments, one per gene/charset, and removes individuals with all missing data.
NEXUS2PHYLIP Function that reads in a single NEXUS datafile and converts it to PHYLIP ('.phy') format (Felsenstein 2002).
nQuireRunner Function that automates running nQuire software (Weiß et al. 2018) to determine sample ploidy level from next-generation sequencing (NGS) reads for one or multiple samples, starting from BAM file(s) for the sample(s).
PFSubsetSum Calculates summary statistics for DNA subsets within the optimum partitioning scheme identified for the data by PartitionFinder v1 or v2 (Lanfear et al. 2012, 2014).
phaseAlleles Automates phasing alleles of HTS data from targeted sequence capture experiments (or similar), including optionally transferring indel gaps from reference to the final phased FASTAs of consensus sequences, by masking
PHYLIP2FASTA Automates converting each of one or multiple PHYLIP DNA sequence alignments into FASTA format.
phylip2fasta.pl Nayoki Takebayashi utility Perl script for converting from PHYLIP to FASTA format.
PHYLIP2Mega Utility script for converting one or multiple PHYLIP DNA sequence alignments into Mega format.
PHYLIP2NEXUS Converts one or multiple PHYLIP-formatted multiple sequence alignments into NEXUS format, with or without pasting in a user-specified set of partitions (various formats).
PHYLIP2PFSubsets Automates construction of Y multiple sequence alignments corresponding to PartitionFinder-inferred subsets, starting from n PHYLIP, per-locus sequence alignments and a PartitionFinder results file (usually 'best_scheme.txt').
PHYLIPcleaner Function that cleans one or more PHYLIP alignments in current dir by removing individuals with all (or mostly) undetermined sites.
PHYLIPsubsampler Automates subsampling each of one to multiple PHYLIP DNA sequence alignment files down to one (random) sequence per species, e.g. for species tree analyses.
PHYLIPsummary Summarizes characteristics (numbers of characters and tip taxa) in one or multiple PHYLIP DNA sequence alignment files in current working directory, and saves to file.
PhyloMapperNullProc Script for post-processing results of a PhyloMapper null model randomization analysis.
phyNcharSumm Utility function that summarizes the number of characters in each PHYLIP DNA sequence alignment in current working directory.
pyRAD2PartitionFinder Automates running PartitionFinder (Lanfear et al. 2012, 2014) 'out-of-the-box' starting from the PHYLIP DNA sequence alignment file ('.phy') and partitions ('.partitions') file output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016).
pyRADLocusVarSites Automates summarizing the numbers of variable sites and parsimony-informative sites (PIS) within RAD/GBS loci output by the programs pyRAD or ipyrad (Eaton 2014; Eaton and Overcast 2016).
RAxMLRunChecker Utility function that counts number of loci/partitions with completed RAxML runs, during or after a run of the MAGNET pipeline within PIrANHA, and summarizes run information.
RAxMLRunner Script that automates moving and running RAxML input files on a remote supercomputing cluster (with passwordless ssh access; code for extraction of results coming in 2019??...).
renameForStarBeast2 Function that renames tip taxa (i.e. sequence names) in all PHYLIP or FASTA DNA sequence alignments in the current working directory, so that the taxon names are suitable for assigning species in BEAUti before running *BEAST or StarBEAST2 in BEAST.
renameTaxa Automates renaming all tip taxa (samples) in genetic data files of type FASTA, PHYLIP, NEXUS, or VCF (variant call format) in current working directory.
RogueNaRokRunner Function that automates reading in a Newick-formatted tree file (-i flag) and analyzing it in RogueNaRok (Aberer et al. 2013).
RYcoder New (June 2019) function that converts a PHYLIP or NEXUS DNA sequence alignment into 'RY' coding, a binary format with purines (A, G) coded as 0's and pyrimidines (C, T) recoded as 1's.
SNAPPRunner Function that automates running SNAPP (Bryant et al. 2012) on a remote supercomputing cluster (with passwordless ssh access set up by user).
SpeciesIdentifier Runs the Taxon DNA software program SpeciesIdentifier, which implements methods in the well-known Meier et al. (2006) DNA barcoding paper.
splitFASTA Automates splitting a multi-individual FASTA DNA sequence alignment into one FASTA file per sequence (tip taxon). Works with sequential FASTAs with no text wrapping across lines.
splitFile Function that splits an input file into n parts (horizontally, by row) and optionally allows the user to specify the output basename for the resulting split files.
splitPHYLIP Splits a sequential PHYLIP DNA sequence alignment into separate PHYLIP sequence alignments, one per partition (read from a user-specified partition file).
taxonCompFilter Function that loops through the multiple sequence alignments and keeps only those alignments meeting the user-specified taxonomic completeness threshold; alignments that pass this filter are saved to an output subfolder of the current directory.
treeThinner Function that conducts downsampling ('thinning') of trees in MrBayes .t files so that they contain every nth tree.
trimSeqs Function that automates trimming one or multiple PHYLIP DNA sequence alignments using the program trimAl (Capella-Gutiérrez et al. 2009), with custom trimming options, and output to FASTA, PHYLIP, or NEXUS formats.
vcfSubsampler Utility function that uses a list file to subsample a variant call format (VCF) file so that it only contains SNPs included in the list.

It would be inconvenient to have to repeatedly refer back to this list. So, please note that this release of PIrANHA includes a list function that provides a tabulated summary of PIrANHA functions. Obtain the function list from piranha by issuing the following command from the command line:

piranha -f list

which prints:

FUNCTION			   DESCRIPTION
----------------------------------------------------------------------------------------------------------------
2logeB10.r             Rscript extracting marginal likelihood estimates and calculate 2loge B10 Bayes factors 
                       (2loge(B10)) from BEAST marginal likelihood estimation (ps / ss) runs.
alignAlleles           Aligns and cleans allele sequences (phased DNA sequences) output by the PIrANHA function
                       phaseAlleles (or in similar format; see phaseAlleles and alignAlleles usage texts for 
                       additional details)
AnouraNEXUSPrepper     In-house function for preparing NEXUS files for Anoura UCE project analyses (Calderon, 
                       Bagley, and Muchhala, in prep.).
batchRunFolders        Automates splitting a set of input files into different batches (to be run in parallel on
                       a remote supercomputing cluster, or a local machine), starting from file type or list of
                       input files.
BEAST_logThinner       Function that conducts downsampling ('thinning') of BEAST2 .log files to every nth line.
BEAST_PSPrepper        Function that automates prepping BEAST XML input files for path sampling (marginal 
                       likelihood estimation) analyses using BEAST v2+ PathSampler.
BEASTPostProc          Conducts post-processing of gene trees and species trees output by BEAST (e.g. Drummond
                       et al. 2012; Bouckaert et al. 2014; usually on a remote supercomputer).
BEASTReset             Function that resets the random seeds for n shell queue scripts corresponding to n BEAST 
                       runs/subfolders (destined for supercomputer).
BEASTRunner            Automates running BEAST XML input files on a remote supercomputing cluster.
calcAlignmentPIS       Generates and runs custom Rscript (phyloch wrapper) to calculate number of parsimony-
                       informative sites (pis) for all FASTA files in working dir.
completeConcatSeqs     Function converting series of PHYLIP (Felsenstein 2002) DNA sequence alignments (with or 
                       without varying nos. of taxa) into a single concatenated PHYLIP alignment with complete 
                       taxon sampling; also makes character subset/partition files in RAxML, PartitionFinder, 
                       and NEXUS formats for the resulting alignment.
completeSeqs           Function converting series of PHYLIP DNA sequence alignments (with or without varying nos. 
                       of taxa) into a single concatenated PHYLIP alignment with complete taxon sampling, starting 
                       from a 'taxon names and spaces' file.
concatenateSeqs        Function that converts series of PHYLIP DNA sequence alignments with equal taxon sampling 
                       into a single concatenated PHYLIP alignment.
concatSeqsPartitions   Function similar to concatenateSeqs, but which, in addition to concatenating the set of 
                       PHYLIP alignments, also outputs character subset/partitions files in RAxML, PartitionFinder,
                       and NEXUS formats. This function differs from completeConcatSeqs in only taking alignments 
                       with equal taxon sampling and in being slightly faster in this usage case.
dadiPostProc           Function for post-processing output from one or multiple ∂a∂i runs (ideally run with 
                       PIrANHA's dadiRunner function), including collation of best-fit parameter estimates, 
                       composite likelihoods, and optimal theta values.
dadiRunner             Automates running ∂a∂i on a remote supercomputing cluster. See help text (-h) and function 
                       (bin/ dir) for details.
dadiUncertainty        Automates uncertainty analysis in ∂a∂i, including generation of bootstrapped SNP files for 
                       parameter std. dev. estimation using the GIM method, as well as std. dev. estimation using 
                       the FIM method (orig. data only).
dropRandomHap          This function randomly drops one phased haplotype (allele) per individual in each of n 
                       PHYLIP gene alignments in current working directory, starting from a 'taxon names' file.
dropTaxa               Shell script automating removal of taxa from sequential, multi-individual FASTA or PHYLIP
                       DNA sequence alignments, starting from a list of taxa to remove.
ExaBayesPostProc       Function automating reading and conducting post-processing analyses on phylogenetic results 
                       output from ExaBayes.
FASTA2PHYLIP           Function that automates converting one or multiple sequential FASTA DNA sequence alignment 
                       files (with sequences either unwrapped or hard-wrapped across multiple lines) to PHYLIP 
                       format (Felsenstein 2002).
FASTA2VCF              Shell script function automating conversion of single multiple sequence FASTA alignment to 
                       variant call format (VCF) v4.1, with or without subsampling SNPs per partition/locus.
FASTAsummary           Summarizes characteristics (numbers of characters and tip taxa) in one or multiple FASTA 
                       DNA sequence alignment files in current working directory, and saves to file.
fastSTRUCTURE          Interactive function that automates running fastSTRUCTURE (Raj et al. 2014) on biallelic 
                       SNP datasets.
geneCounter            Shell script function that counts and summarizes the number of gene copies per tip species
                       in a set of gene trees in Newick format (concatenated into a single trees file), given a 
                       taxon-species assignment file.
getBootTrees           Function that automates organizing bootstrap trees output by RAxML runs conducted in 
                       current working directory using the MAGNET program within PIrANHA.
getDropTaxa            Function to create drop taxon list given lists of a) all taxa and b) a subset of taxa to
                       keep.
getTaxonNames          Utility function that extracts tip taxon names from sequences present in one or multiple 
                       PHYLIP DNA sequence alignments in current directory, using information on maximum taxon 
                       sampling level from user.
indexBAM               [In prep.]
iqtreePostProc         Function that automates post-processing of gene tree files and log files output during 
                       phylogenetic analyses in IQ-TREE v1 or v2 (Nguyen et al. 2015; Minh et al. 2020).
list                   Function that prints a tabulated list of PIrANHA functions and their descriptions.
MAGNET                 Shell pipeline for automating estimation of a maximum-likelihood (ML) gene tree in RAxML 
                       for each of many loci in a RAD-seq, UCE, or other multilocus dataset. Also contains other 
                       tools.
makePartitions         Function using PHYLIP DNA sequence alignments in current directory to make partitions/
                       charsets files in RAxML, PartitionFinder, and NEXUS formats, which are output to separate 
                       files.
Mega2PHYLIP            Automates converting one or more multiple sequence alignment files in Mega format (Mega v7+ 
                       or X; Kumar et al. 2016, 2018) to PHYLIP format (Felsenstein 2002), while saving (-k 1) or 
                       writing over (-k 0) the original Mega files.
mergeBAM               [In prep.]
MLEResultsProc         Automates post-processing of marginal likelihood estimation (MLE) results from running path 
                       sampling (ps) or stepping-stone (ss) sampling analyses on different models in BEAST.
MrBayesPostProc        Simple script for post-processing results of a MrBayes v3.2+ (Ronquist et al. 2012) run, 
                       whose output files are assumed to be in the current working directory.
NEXUS2MultiPHYLIP      Function that splits a sequential NEXUS alignment with charaset information into multiple 
                       PHYLIP-formatted alignments, one per gene/charset, and removes individuals with all missing 
                       data.
NEXUS2PHYLIP           Function that reads in a single NEXUS datafile and converts it to PHYLIP ('.phy') format 
                       (Felsenstein 2002). 
nQuireRunner           Function that automates running nQuire software (Weiß et al. 2018) to determine sample 
                       ploidy level from next-generation sequencing (NGS) reads for one or multiple samples,
                       starting from BAM file(s) for the sample(s)
PFSubsetSum            Calculates summary statistics for DNA subsets within the optimum partitioning scheme 
                       identified for the data by PartitionFinder v1 or v2 (Lanfear et al. 2012, 2014).
phaseAlleles           Automates phasing alleles of HTS data from targeted sequence capture experiments (or similar), 
                       including optionally transferring indel gaps from reference to the final phased FASTAs of 
                       consensus sequences, by masking
PHYLIP2FASTA           Automates converting each of one or multiple PHYLIP DNA sequence alignments into FASTA 
                       format.
phylip2fasta.pl        Nayoki Takebayashi utility Perl script for converting from PHYLIP to FASTA format.
PHYLIP2Mega            Utility script for converting one or multiple PHYLIP DNA sequence alignments into Mega 
                       format.
PHYLIP2NEXUS           Converts one or multiple PHYLIP-formatted multiple sequence alignments into NEXUS format, 
                       with or without pasting in a user-specified set of partitions (various formats).
PHYLIP2PFSubsets       Automates construction of Y multiple sequence alignments corresponding to PartitionFinder-
                       inferred subsets, starting from n PHYLIP, per-locus sequence alignments and a PartitionFinder 
                       results file (usually 'best_scheme.txt').
PHYLIPcleaner          Function that cleans one or more PHYLIP alignments in current dir by removing individuals 
                       with all (or mostly) undetermined sites.
PHYLIPsubsampler       Automates subsampling each of one to multiple PHYLIP DNA sequence alignment files down to 
                       one (random) sequence per species, e.g. for species tree analyses.
PHYLIPsummary          Summarizes characteristics (numbers of characters and tip taxa) in one or multiple PHYLIP 
                       DNA sequence alignment files in current working directory, and saves to file.
PhyloMapperNullProc    Script for post-processing results of a PhyloMapper null model randomization analysis.
phyNcharSumm           Utility function that summarizes the number of characters in each PHYLIP DNA sequence 
                       alignment in current working directory.
pyRAD2PartitionFinder  Automates running PartitionFinder (Lanfear et al. 2012, 2014) 'out-of-the-box' starting 
                       from the PHYLIP DNA sequence alignment file ('.phy') and partitions ('.partitions') file 
                       output by pyRAD (Eaton 2014) or ipyrad (Eaton and Overcast 2016).
pyRADLocusVarSites     Automates summarizing the numbers of variable sites and parsimony-informative sites (PIS) 
                       within RAD/GBS loci output by the programs pyRAD or ipyrad (Eaton 2014; Eaton and Overcast 
                       2016).
RAxMLRunChecker        Utility function that counts number of loci/partitions with completed RAxML runs, during 
                       or after a run of the MAGNET pipeline within PIrANHA, and summarizes run information.
RAxMLRunner            Script that automates moving and running RAxML input files on a remote supercomputing 
                       cluster (with passwordless ssh access; code for extraction of results coming in 2019??...).
renameForStarBeast2    Function that renames tip taxa (i.e. sequence names) in all PHYLIP or FASTA DNA sequence 
                       alignments in the current working directory, so that the taxon names are suitable for 
                       assigning species in BEAUti before running *BEAST or StarBEAST2 in BEAST.
renameTaxa             Automates renaming all tip taxa (samples) in genetic data files of type FASTA, PHYLIP, 
                       NEXUS, or VCF (variant call format) in current working directory.
RogueNaRokRunner       Function that automates reading in a Newick-formatted tree file (-i flag) and analyzing it
                       in RogueNaRok (Aberer et al. 2013).
RYcoder                New (June 2019) function that converts a PHYLIP or NEXUS DNA sequence alignment into 'RY' 
                       coding, a binary format with purines (A, G) coded as 0's and pyrimidines (C, T) recoded
                       as 1's.
SNAPPRunner            Function that automates running SNAPP (Bryant et al. 2012) on a remote supercomputing 
                       cluster (with passwordless ssh access set up by user).
SpeciesIdentifier      Runs the Taxon DNA software program SpeciesIdentifier, which implements methods in the well-
                       known Meier et al. (2006) DNA barcoding paper.
splitFASTA             Automates splitting a multi-individual FASTA DNA sequence alignment into one FASTA file per
                       sequence (tip taxon). Works with sequential FASTAs with no text wrapping across lines.
splitFile              Function that splits an input file into n parts (horizontally, by row) and optionally allows 
                       the user to specify the output basename for the resulting split files.
splitPHYLIP            Splits a sequential PHYLIP DNA sequence alignment into separate PHYLIP sequence alignments,
                       one per partition (read from a user-specified partition file).
taxonCompFilter        Function that loops through the multiple sequence alignments and keeps only those alignments 
                       meeting the user-specified taxonomic completeness threshold <taxCompThresh>; alignments that 
                       pass this filter are saved to an output subfolder of the current directory.
treeThinner            Function that conducts downsampling ('thinning') of trees in MrBayes .t files so that they 
                       contain every nth tree.
trimSeqs               Function that automates trimming one or multiple PHYLIP DNA sequence alignments using the 
                       program trimAl (Capella-Gutiérrez et al. 2009), with custom trimming options, and output to 
                       FASTA, PHYLIP, or NEXUS formats.
vcfSubsampler          Utility function that uses a list file to subsample a variant call format (VCF) file so that 
                       it only contains SNPs included in the list.

REFERENCES
Aberer, A., Krompass, D., Stamatakis, A. 2013. Pruning rogue taxa improves phylogenetic 
	accuracy: an efficient algorithm and webservice. Systematic Biology 62(1), 162–166.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T.G., Wu, C.H., Xie, D., Suchard, M.A., 
	Rambaut, A., Drummond, A.J. 2014. BEAST2: a software platform for Bayesian evolutionary 
	analysis. PLoS Computational Biology 10, e1003537.
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A. 2012. Inferring 
	species trees directly from biallelic genetic markers: bypassing gene trees in a full 
	coalescent analysis. Molecular Biology and Evolution 29, 1917–1932.
Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T. 2009. trimAl: a tool for automated
	alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15), 1972–1973.
Drummond, A.J., Suchard, M.A., Xie, D., Rambaut, A. 2012. Bayesian phylogenetics with BEAUti 
	and the BEAST 1.7. Molecular Biology and Evolution 29, 1969–1973.
Eaton, D.A. 2014. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. 
	Bioinformatics 30, 1844–1849.
Eaton, D.A.R., Overcast, I. 2016. ipyrad: interactive assembly and analysis of RADseq data sets. 
	Available at: <http://ipyrad.readthedocs.io/>.
Felsenstein, J. 2002. PHYLIP (Phylogeny Inference Package) Version 3.6 a3.
	Available at: <http://evolution.genetics.washington.edu/phylip.html>.
Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S. 2012. Partitionfinder: combined selection of 
	partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology 
	and Evolution 29, 1695–1701. 
Lanfear, R., Calcott, B., Kainer, D., Mayer, C., Stamatakis, A. 2014. Selecting optimal 
	partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology 14, 82.
Meier, R., Shiyang, K., Vaidya, G., Ng, P.K. 2006. DNA barcoding and taxonomy in Diptera: 
	a tale of high intraspecific variability and low identification success. Systematic 
	Biology 55(5), 715–728.
Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., Von Haeseler, A., 
	Lanfear, R. 2020. IQ-TREE 2: New models and efficient methods for phylogenetic inference 
	in the genomic era. Molecular Biology and Evolution 37(5), 1530–1534.
Nguyen, L.T., Schmidt, H.A., Von Haeseler, A., Minh, B.Q. 2015. IQ-TREE: a fast and effective
	stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and 
	Evolution 32(1), 268–274.
Weiß, C.L., Pais, M., Cano, L.M., Kamoun, S., Burbano, H.A. 2018. nQuire: a statistical 
	framework for ploidy estimation using next generation sequencing. BMC Bioinformatics 
	19(1), 122.

----------------------------------------------------------------------------------------------------------------
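As a small illustration of the 'RY' coding described for RYcoder in the function list above, the recoding step itself can be sketched with the standard `tr` utility. This is not the code RYcoder uses: the function name `ry_code` is hypothetical, and the sketch recodes a bare sequence string rather than a full PHYLIP or NEXUS file. Passing gaps and ambiguity codes through unchanged is an assumption made here for simplicity.

```shell
#!/usr/bin/env bash
# ry_code: hypothetical helper for illustration only (not part of PIrANHA).
# Purines (A, G) become 0 and pyrimidines (C, T) become 1, in either case.
# Any other character (gaps, ambiguity codes) is left unchanged (assumption).
ry_code() {
    printf '%s\n' "$1" | tr 'AGagCTct' '00001111'
}

ry_code "ACGTTGCA"   # prints 01011010
```

To recode real alignments, use `piranha -f RYcoder -h` to see the options the function actually supports.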

Passwordless SSH Access

Part of what PIrANHA does is allow users with access to a remote supercomputing cluster to take advantage of that resource in an automated fashion. Thus, a handful of PIrANHA functions implicitly assume that the user has set up passwordless ssh access to a supercomputer account.

✋ If you have not done this, or are unsure whether you have, then you should set up passwordless access before using PIrANHA by creating and organizing appropriate and secure public and private ssh keys on your machine and the remote supercomputer. By "secure," I mean that, during this process, you should close write privileges on the authorized_keys file by typing "chmod u-w authorized_keys" after setting things up using ssh-keygen.
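The setup described above can be sketched with standard OpenSSH tools (`ssh-keygen`, `ssh-copy-id`). This is a minimal sketch, not an official PIrANHA procedure: the `REMOTE` variable (something like `user@cluster.example.edu`), the function name, and the choice of an ed25519 key at `~/.ssh/id_ed25519` are all assumptions for illustration. The remote steps run only if `REMOTE` is set; the final lines show the effect of `chmod u-w` on a local scratch file.

```shell
#!/usr/bin/env bash
# REMOTE is a hypothetical placeholder, e.g. REMOTE=user@cluster.example.edu
REMOTE="${REMOTE:-}"

# setup_passwordless_ssh: hypothetical helper, not part of PIrANHA.
setup_passwordless_ssh() {
    local remote="$1"
    local key="$HOME/.ssh/id_ed25519"   # assumed key location and type
    # 1. Create a key pair if none exists yet (ssh-keygen prompts interactively).
    [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"
    # 2. Append the public key to ~/.ssh/authorized_keys on the remote machine.
    ssh-copy-id -i "$key.pub" "$remote"
    # 3. Close write privileges on the remote authorized_keys file.
    ssh "$remote" 'chmod u-w ~/.ssh/authorized_keys'
    # 4. Confirm that login works without any password prompt.
    ssh -o BatchMode=yes "$remote" 'echo "passwordless ssh OK"'
}

# Only contact a remote machine when the user has set REMOTE.
if [ -n "$REMOTE" ]; then
    setup_passwordless_ssh "$REMOTE"
fi

# Local illustration of what "chmod u-w" does, using a scratch file.
DEMO_DIR="$(mktemp -d)"
touch "$DEMO_DIR/authorized_keys"
chmod 600 "$DEMO_DIR/authorized_keys"   # owner read/write: -rw-------
chmod u-w "$DEMO_DIR/authorized_keys"   # owner read only:  -r--------
DEMO_PERMS="$(ls -l "$DEMO_DIR/authorized_keys" | cut -c1-10)"
echo "$DEMO_PERMS"
```

Note that a key protected by a passphrase will still prompt for that passphrase unless it is loaded into `ssh-agent`, so the step 4 check fails in that case.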

❗ Setting up passwordless SSH access is VERY IMPORTANT for running the BEASTRunner, RAxMLRunner, and SNAPPRunner functions of PIrANHA, which run pipelines that will not work if passwordless ssh is not set up. The following links provide a list of useful tutorials/discussions that can help users set up passwordless SSH access:

Input and Output File Formats

📄 PIrANHA functions accept a number of different input file types, which are listed in Table 1 below. These files can be generated by hand or are output by specific upstream software programs. As for output, PIrANHA produces various text files, PDFs, and other kinds of graphical output from the software linked through PIrANHA pipelines.

| Input file types | Software (from) |
| --- | --- |
| .partitions | pyRAD / ipyrad |
| .phy | pyRAD / ipyrad / by hand |
| .str | pyRAD / ipyrad |
| .gphocs | pyRAD / ipyrad / MAGNET (NEXUS2gphocs.sh) |
| .loci | pyRAD / ipyrad |
| .nex | pyRAD / ipyrad / by hand |
| .trees | BEAST |
| .species.trees | BEAST |
| .log | BEAST |
| .mle.log | BEAST |
| .xml | BEAUti |
| .sfs | easySFS |
| Exabayes_topologies.* | ExaBayes |
| Exabayes_parameters.* | ExaBayes |
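
To check which of the input types in the table above a set of files corresponds to, the mapping can be written as a shell `case` statement. This is a hypothetical helper, not a PIrANHA function: the name `input_source` and the "unrecognized" fallback are assumptions, and matching is by file name only, so it says nothing about whether a file's contents are valid.

```shell
#!/usr/bin/env bash
# input_source: hypothetical helper (not part of PIrANHA) that maps a file
# name to the upstream software listed in the input file types table.
input_source() {
    local name="${1##*/}"   # strip any leading directory path
    case "$name" in
        Exabayes_topologies.*|Exabayes_parameters.*) echo "ExaBayes" ;;
        *.partitions|*.str|*.loci) echo "pyRAD / ipyrad" ;;
        *.phy|*.nex)               echo "pyRAD / ipyrad / by hand" ;;
        *.gphocs)                  echo "pyRAD / ipyrad / MAGNET (NEXUS2gphocs.sh)" ;;
        *.trees|*.log)             echo "BEAST" ;;   # also covers .species.trees and .mle.log
        *.xml)                     echo "BEAUti" ;;
        *.sfs)                     echo "easySFS" ;;
        *)                         echo "unrecognized" ;;
    esac
}

# Report the likely source of each file given on the command line.
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(input_source "$f")"
done
```

Saved under a name of your choosing, the script could be run on a working directory as `bash <script> *` to get a quick tab-separated inventory before choosing a PIrANHA function.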

December 26, 2020 - Justin C. Bagley, Jacksonville, AL, USA

<< Previous (Distribution Structure) | Next (Workflows) >>