Module: SeedSequence
Niema Moshiri edited this page Aug 15, 2018 · 35 revisions
The SeedSequence module generates initial infection sequence(s) and infection time(s) for a given seed node. See the source code to see what is defined by the abstract class.
- A migration coalescent tree is simulated where leaves are seed individuals labeled by their community, coalescence can only occur between individuals of the same community, and migration occurs across the communities at user-specified rates
- The result is a single phylogenetic history for the seed sequences in which coalescence between seeds in the same community is favored (coalescence across communities is still possible)
- Then, a single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit and is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
  - DendroPy
  - HMMER
  - msms
  - Seq-Gen
  - Must use a ContactNetworkGenerator module that creates communities
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `msms_path`: The path to your `msms` executable (or simply `"msms"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `community_seed_populations`: The desired (constant) population size for each community's coalescent process (a list of positive integers)
    - See the msms manual for more information
  - `community_seed_migration_rates`: The desired migration rates across communities, represented as a 2-dimensional dictionary where keys are integers representing the communities (0-based), e.g.:

        {
            0: {
                1: 0.3, # migration rate from 0 to 1
                2: 0.5, # migration rate from 0 to 2
                ...
            },
            1: {
                0: 0.1, # migration rate from 1 to 0
                2: 0.6, # migration rate from 1 to 2
                ...
            },
            ...
        }

    - See the msms manual for more information
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
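
The nested dictionary used for `community_seed_migration_rates` can be easier to check once it is flattened into a square matrix. The following is a minimal sketch, not FAVITES code: the helper name `migration_matrix` and its `num_communities` argument are hypothetical, and pairs that are not listed are assumed to have a rate of 0.

```python
def migration_matrix(rates, num_communities):
    """Flatten {source: {destination: rate}} into a square list of lists."""
    matrix = [[0.0] * num_communities for _ in range(num_communities)]
    for src, row in rates.items():
        for dst, rate in row.items():
            # Community labels are 0-based integers, as in the config example
            if not (0 <= src < num_communities and 0 <= dst < num_communities):
                raise ValueError("community index out of range: %r -> %r" % (src, dst))
            if src == dst:
                raise ValueError("unexpected self-migration rate for community %d" % src)
            if rate < 0:
                raise ValueError("migration rates must be non-negative")
            matrix[src][dst] = float(rate)
    return matrix


# Hypothetical rates for three communities
rates = {0: {1: 0.3, 2: 0.5}, 1: {0: 0.1, 2: 0.6}, 2: {0: 0.2, 1: 0.4}}
matrix = migration_matrix(rates, 3)
# matrix[0][1] is the migration rate from community 0 to community 1
```

Validating the dictionary this way catches out-of-range community labels before any external tool is invoked.
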
- The seeds in each community are coalesced via pure neutral Kingman Coalescence, and then the roots of the communities' trees are coalesced again
- The result is a single phylogenetic history for the seed sequences, but with the restriction that all seeds in a given community are in the same clade
- Then, a single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit and is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
  - DendroPy
  - HMMER
  - Seq-Gen
  - Must use a ContactNetworkGenerator module that creates communities
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `community_root_population`: The desired (constant) population size of the coalescent process of the roots of the communities' seed trees
  - `community_seed_populations`: The desired (constant) population size for each community's coalescent process (a list of floats)
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
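
To make the roles of the six `seqgen_*_to_*` rates and the four `seqgen_freq_*` frequencies concrete, here is a minimal sketch of the textbook GTR rate matrix, where the off-diagonal entry for a change from base i to base j is the shared rate for that pair times the frequency of j. The function name `gtr_rate_matrix` is hypothetical, the matrix is left unnormalized, and Seq-Gen's own internal scaling is not reproduced here.

```python
BASES = "ACGT"


def gtr_rate_matrix(config):
    """Build an unnormalized GTR rate matrix Q with Q[i][j] = r_ij * pi_j."""
    freqs = [config["seqgen_freq_" + b.lower()] for b in BASES]
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(i + 1, 4):
            key = "seqgen_%s_to_%s" % (BASES[i].lower(), BASES[j].lower())
            rate = config[key]  # one rate shared by both directions of the pair
            q[i][j] = rate * freqs[j]
            q[j][i] = rate * freqs[i]
    for i in range(4):
        # Each row of a rate matrix sums to zero
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


# Hypothetical config values: equal rates and equal frequencies
example_config = {
    "seqgen_a_to_c": 1.0, "seqgen_a_to_g": 1.0, "seqgen_a_to_t": 1.0,
    "seqgen_c_to_g": 1.0, "seqgen_c_to_t": 1.0, "seqgen_g_to_t": 1.0,
    "seqgen_freq_a": 0.25, "seqgen_freq_c": 0.25,
    "seqgen_freq_g": 0.25, "seqgen_freq_t": 0.25,
}
Q = gtr_rate_matrix(example_config)
```

Because each pair shares one rate, only six rate parameters are needed even though the matrix has twelve off-diagonal entries.
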
- A profile HMM is created from a user-specified multiple sequence alignment using `hmmbuild`, and seed sequences are randomly generated from this profile HMM using the `hmmemit` tool in the HMMER toolkit
- Requirements:
- Config Parameters:
  - `hmmbuild_path`: The path to your `hmmbuild` executable (or simply `"hmmbuild"` if it is in your `PATH` variable)
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `hmmbuild_msafile`: File containing the multiple sequence alignment from which to create a profile HMM
  - `hmmbuild_options`: The command-line options with which to run `hmmbuild` (just the options, not `hmmfile_out` nor `msafile`)
  - `hmmemit_options`: The command-line options with which to run `hmmemit` (just the options, not `hmmfile`: you will specify `hmmfile` in the `hmmemit_hmmfile` config parameter)
    - Do not use the `-o` argument, as we use standard output to parse the `hmmemit` output
- Seed sequences are randomly generated from a profile HMM using the `hmmemit` tool in the HMMER toolkit
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `hmmemit_hmmfile`: File containing profile HMM(s) from which to generate seed sequences
    - If a file is specified containing multiple profile HMMs, a sequence will be generated from each and randomly selected with a uniform distribution
  - `hmmemit_options`: The command-line options with which to run `hmmemit` (just the options, not `hmmfile`: you will specify `hmmfile` in the `hmmemit_hmmfile` config parameter)
    - Do not use the `-o` argument, as we use standard output to parse the `hmmemit` output
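
The restriction on `-o` follows from reading sequences off standard output. The sketch below illustrates the idea under stated assumptions: the helper names `build_hmmemit_command` and `parse_fasta` are hypothetical, the output is assumed to be FASTA text, and this is not the parsing code FAVITES itself uses.

```python
import shlex


def build_hmmemit_command(hmmemit_path, hmmemit_options, hmmemit_hmmfile):
    """Assemble an argument list: executable, user options, then the HMM file."""
    options = shlex.split(hmmemit_options)
    if "-o" in options:
        # This check only rejects a standalone -o token
        raise ValueError("do not pass -o: sequences are read from standard output")
    return [hmmemit_path] + options + [hmmemit_hmmfile]


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


command = build_hmmemit_command("hmmemit", "-N 1", "profile.hmm")
# The command could then be run with, for example:
#   subprocess.run(command, capture_output=True, text=True, check=True).stdout
# and the captured text passed to parse_fasta.
```

Here `profile.hmm` is a placeholder file name, and `-N 1` is only an example option string.
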
- No sequences are generated
- Requirements:
  - Must use SequenceEvolution_NoSeqs module
  - Must use Sequencing_NoSeqs module
- Config Parameters:
  - None
- Wrapper for PANGEA.HIV.sim
- This is not supported in the Docker/Singularity images from FAVITES 1.1.11 onward
- Requirements:
  - R
  - PANGEA.HIV.sim
  - Must use ContactNetwork_PANGEA module
  - Must use ContactNetworkGenerator_PANGEA module
  - Must use EndCriteria_Instant module
  - Must use NodeEvolution_PANGEA module
  - Must use NodeSample_PANGEA module
  - Must use NumBranchSample_All module
  - Must use NumTimeSample_PANGEA module
  - Must use PostValidation_Dummy module
  - Must use SeedSelection_PANGEA module
  - Must use SeedSequence_PANGEA module
  - Must use SequenceEvolution_PANGEA module
  - Must use SourceSample_PANGEA module
  - Must use TimeSample_PANGEA module
  - Must use TransmissionNodeSample_PANGEA module
  - Must use TransmissionTimeSample_PANGEA module
  - Must use TreeUnit_Same module
- Config Parameters:
  - `Rscript_path`: The path to your `Rscript` executable (or simply `"Rscript"` if it is in your `PATH` variable)
  - All `pangea_` parameters, which correspond to PANGEA.HIV.sim parameters (see the entry in FAVITES_ModuleList.json for the complete list, and see the PANGEA.HIV.sim help for details)
    - Use `""` for default
- Seed sequences are randomly generated from the DNA alphabet with equal probability for each nucleotide
- Requirements:
  - None
- Config Parameters:
  - `seed_sequence_length`: The desired length of the seed sequences
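
Uniform sampling from the DNA alphabet can be sketched in a few lines. The function name `random_dna` and the optional `rng` argument are hypothetical and only serve the illustration.

```python
import random


def random_dna(seed_sequence_length, rng=random):
    """Draw each position independently and uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(seed_sequence_length))


# A seeded generator makes the draw repeatable
sequence = random_dna(30, random.Random(1))
```
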
- Seed sequences are randomly generated from the 61 non-STOP codons with equal probability for each codon
- Requirements:
  - None
- Config Parameters:
  - `seed_sequence_codon_length`: The desired number of codons in the seed sequences (i.e., seeds will have 3 times this length)
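
The count of 61 comes from the 64 possible triplets minus the STOP codons. The sketch below assumes the standard genetic code, whose three STOP codons are TAA, TAG, and TGA; the names `SENSE_CODONS` and `random_codon_sequence` are hypothetical.

```python
import itertools
import random

# Assumption: standard genetic code STOP codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(triplet)
    for triplet in itertools.product("ACGT", repeat=3)
    if "".join(triplet) not in STOP_CODONS
]


def random_codon_sequence(seed_sequence_codon_length, rng=random):
    """Concatenate uniformly drawn non-STOP codons."""
    return "".join(rng.choice(SENSE_CODONS) for _ in range(seed_sequence_codon_length))


codon_sequence = random_codon_sequence(10, random.Random(1))
```

The resulting sequence has `3 * seed_sequence_codon_length` nucleotides and no in-frame STOP codon.
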
- Seed sequences are randomly generated from the DNA alphabet with user-specified nucleotide frequencies
- Users specify Pr(A), Pr(C), and Pr(G); Pr(T) is then computed as 1 - Pr(A) - Pr(C) - Pr(G)
- Requirements:
  - None
- Config Parameters:
  - `seed_sequence_length`: The desired length of the seed sequences
  - `seed_prob_A`: Probability of A
  - `seed_prob_C`: Probability of C
  - `seed_prob_G`: Probability of G
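
Since Pr(T) is derived rather than specified, the three given probabilities must sum to at most 1. A minimal sketch of that check and of the weighted draw follows; the function name `random_dna_with_freqs` and the small floating-point tolerance are assumptions made for the example.

```python
import random


def random_dna_with_freqs(seed_sequence_length, prob_a, prob_c, prob_g, rng=random):
    """Draw nucleotides with Pr(T) = 1 - Pr(A) - Pr(C) - Pr(G)."""
    prob_t = 1.0 - prob_a - prob_c - prob_g
    # Allow a tiny negative remainder caused by floating-point rounding
    if min(prob_a, prob_c, prob_g) < 0 or prob_t < -1e-9:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    weights = [prob_a, prob_c, prob_g, max(prob_t, 0.0)]
    return "".join(rng.choices("ACGT", weights=weights, k=seed_sequence_length))


at_rich = random_dna_with_freqs(40, 0.4, 0.1, 0.1, random.Random(1))
```
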
- Seed sequences specified by the user
- Requirements:
  - None
- Config Parameters:
  - `num_seeds`: The desired number of seed nodes
  - `seed_seqs`: A list containing the seed sequences (`seed_seqs` must contain exactly `num_seeds` sequences)
- Seed sequences are randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a Birth-Death tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_birth_rate`: The desired birth rate of the simulated Birth-Death tree
  - `seed_death_rate`: The desired death rate of the simulated Birth-Death tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_codon_site1_rate`: The desired rate of heterogeneity for site 1 of a codon
  - `seqgen_codon_site2_rate`: The desired rate of heterogeneity for site 2 of a codon
  - `seqgen_codon_site3_rate`: The desired rate of heterogeneity for site 3 of a codon
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a Birth-Death tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_birth_rate`: The desired birth rate of the simulated Birth-Death tree
  - `seed_death_rate`: The desired death rate of the simulated Birth-Death tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
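
The `seqgen_*` parameters line up with Seq-Gen's command-line options for the model (`-m`), the rate matrix (`-r`), the base frequencies (`-f`), the Gamma shape (`-a`), the number of Gamma categories (`-g`), and the codon position rates (`-c`). The sketch below shows one way such a command could be assembled. It is an assumption for illustration only: the helper `build_seqgen_command` is hypothetical, the exact invocation FAVITES uses is not shown on this page, and the token layout should be checked against the manual for your Seq-Gen version.

```python
RATE_KEYS = ["seqgen_a_to_c", "seqgen_a_to_g", "seqgen_a_to_t",
             "seqgen_c_to_g", "seqgen_c_to_t", "seqgen_g_to_t"]
FREQ_KEYS = ["seqgen_freq_a", "seqgen_freq_c", "seqgen_freq_g", "seqgen_freq_t"]
CODON_KEYS = ["seqgen_codon_site1_rate", "seqgen_codon_site2_rate",
              "seqgen_codon_site3_rate"]


def build_seqgen_command(config):
    """Assemble a Seq-Gen argument list from FAVITES-style config keys."""
    cmd = [config["seqgen_path"], "-mGTR"]
    cmd += ["-r"] + [str(config[key]) for key in RATE_KEYS]
    cmd += ["-f"] + [str(config[key]) for key in FREQ_KEYS]
    if "seqgen_gamma_shape" in config:  # GTR Gamma variants
        cmd += ["-a", str(config["seqgen_gamma_shape"])]
        num_cats = config.get("seqgen_num_gamma_rate_categories", "")
        if num_cats != "":  # "" means: leave Seq-Gen's default in place
            cmd += ["-g", str(num_cats)]
    if all(key in config for key in CODON_KEYS):  # GTR Codon variants
        cmd += ["-c"] + [str(config[key]) for key in CODON_KEYS]
    return cmd


# Hypothetical GTR Gamma configuration
gamma_config = {"seqgen_path": "seq-gen", "seqgen_gamma_shape": 0.5,
                "seqgen_num_gamma_rate_categories": ""}
gamma_config.update({key: 1.0 for key in RATE_KEYS})
gamma_config.update({key: 0.25 for key in FREQ_KEYS})
seqgen_command = build_seqgen_command(gamma_config)
```

Passing the command as a list (rather than one shell string) avoids quoting problems when the executable path contains spaces.
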
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a Kingman Coalescent tree with expected coalescent times and one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_population`: The desired (constant) population size of the simulated coalescent tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_codon_site1_rate`: The desired rate of heterogeneity for site 1 of a codon
  - `seqgen_codon_site2_rate`: The desired rate of heterogeneity for site 2 of a codon
  - `seqgen_codon_site3_rate`: The desired rate of heterogeneity for site 3 of a codon
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a Kingman Coalescent tree with expected coalescent times and one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_population`: The desired (constant) population size of the simulated coalescent tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
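
"Expected coalescent times" means each waiting time between coalescent events is set to its mean instead of being drawn at random. As a minimal sketch, assume a haploid population of constant size N with time measured in generations, so that k lineages coalesce at rate k(k-1)/(2N) and the expected wait is 2N/(k(k-1)). The time scaling used by FAVITES and DendroPy may differ, and the function name `expected_coalescent_intervals` is hypothetical.

```python
def expected_coalescent_intervals(num_leaves, seed_population):
    """Expected waiting time while k lineages remain, for k = n, n-1, ..., 2."""
    intervals = []
    for k in range(num_leaves, 1, -1):
        pairs = k * (k - 1) / 2.0  # number of lineage pairs that could coalesce
        intervals.append(seed_population / pairs)
    return intervals


intervals = expected_coalescent_intervals(4, 100.0)
tree_height = sum(intervals)  # expected time to the most recent common ancestor
```

Under these assumptions the expected tree height is 2N(1 - 1/n), so most of the height is contributed by the final interval with only two lineages left.
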
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a non-homogeneous Yule tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_height`: The desired height of the simulated non-homogeneous Yule tree
  - `seed_speciation_rate_func`: The desired speciation rate function of the simulated non-homogeneous Yule tree
    - E.g. `"x**2 + 5"`
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
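
The example value `"x**2 + 5"` reads as a Python expression in a single variable `x`. How FAVITES evaluates `seed_speciation_rate_func`, and what `x` measures, is not stated on this page, so the sketch below is only an assumption about how such a string could be turned into a callable. The helper `make_rate_function` is hypothetical, and evaluating expression strings from untrusted sources is unsafe even with builtins removed.

```python
def make_rate_function(expression):
    """Compile an expression in x (e.g. "x**2 + 5") into a function of x."""
    code = compile(expression, "<seed_speciation_rate_func>", "eval")

    def rate(x):
        # Only the name x is made available; builtins are withheld
        return eval(code, {"__builtins__": {}}, {"x": x})

    return rate


speciation_rate = make_rate_function("x**2 + 5")
rate_at_two = speciation_rate(2)
```

A quick sanity check before a run is to evaluate the function at a few points and confirm the rate stays positive.
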
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a pure neutral Kingman Coalescent tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_population`: The desired (constant) population size of the simulated coalescent tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_codon_site1_rate`: The desired rate of heterogeneity for site 1 of a codon
  - `seqgen_codon_site2_rate`: The desired rate of heterogeneity for site 2 of a codon
  - `seqgen_codon_site3_rate`: The desired rate of heterogeneity for site 3 of a codon
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a pure neutral Kingman Coalescent tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_population`: The desired (constant) population size of the simulated coalescent tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)
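
In a pure neutral Kingman coalescent the waiting times between coalescent events are random. As a minimal sketch, assume a haploid population of constant size N with time in generations, so the wait while k lineages remain is exponentially distributed with rate k(k-1)/(2N). The function name `sample_coalescent_intervals` is hypothetical, and the scaling used by FAVITES and DendroPy may differ.

```python
import random


def sample_coalescent_intervals(num_leaves, seed_population, rng=random):
    """Draw one exponential waiting time per coalescent event."""
    intervals = []
    for k in range(num_leaves, 1, -1):
        rate = k * (k - 1) / 2.0 / seed_population
        intervals.append(rng.expovariate(rate))
    return intervals


sampled = sample_coalescent_intervals(5, 100.0, random.Random(1))
sampled_height = sum(sampled)  # height of this particular random tree
```

Repeating the draw with different seeds gives trees of different heights, whose average approaches 2N(1 - 1/n) under the same assumptions.
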
- A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the `hmmemit` tool in the HMMER toolkit, a Yule tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
- Requirements:
- Config Parameters:
  - `hmmemit_path`: The path to your `hmmemit` executable (or simply `"hmmemit"` if it is in your `PATH` variable)
  - `seqgen_path`: The path to your `seq-gen` executable (or simply `"seq-gen"` if it is in your `PATH` variable)
  - `viral_sequence_type`: The desired type of viral sequence to generate
    - See the `URL` variable at the beginning of the SeedSequence_Virus source code for valid options
  - `seed_height`: The desired height of the simulated Yule tree
  - `seqgen_a_to_c`: The desired transition rate from A to C (and C to A)
  - `seqgen_a_to_g`: The desired transition rate from A to G (and G to A)
  - `seqgen_a_to_t`: The desired transition rate from A to T (and T to A)
  - `seqgen_c_to_g`: The desired transition rate from C to G (and G to C)
  - `seqgen_c_to_t`: The desired transition rate from C to T (and T to C)
  - `seqgen_g_to_t`: The desired transition rate from G to T (and T to G)
  - `seqgen_freq_a`: The desired frequency of A
  - `seqgen_freq_c`: The desired frequency of C
  - `seqgen_freq_g`: The desired frequency of G
  - `seqgen_freq_t`: The desired frequency of T
  - `seqgen_gamma_shape`: The desired shape parameter of the Gamma distribution for site rate heterogeneity
  - `seqgen_num_gamma_rate_categories`: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or `""` for the default value, which is 0)

Niema Moshiri & Siavash Mirarab 2016