Module: SeedSequence

Niema Moshiri edited this page Aug 15, 2018 · 35 revisions

The SeedSequence module generates the initial infection sequence(s) and infection time(s) for a given seed node. See the source code for what the abstract class defines.

List of Implementations

  • A coalescent tree with migration is simulated in which the leaves are seed individuals labeled by their community, coalescence can only occur between lineages that are currently in the same community, and lineages migrate between communities at user-specified rates
    • The result is a single phylogenetic history for the seed sequences in which seeds from the same community tend to coalesce with one another first, although seeds from different communities can still coalesce (once migration places their lineages in the same community)
  • Then, a single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit and is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • msms_path: The path to your msms executable (or simply "msms" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • community_seed_populations: The desired (constant) population size for each community's coalescent process (a list of positive integers)
    • community_seed_migration_rates: The desired migration rates across communities, represented as a 2-dimensional dictionary where keys are integers representing the communities (0-based), e.g.:
      {
          0: {
                 1: 0.3, # migration rate from 0 to 1
                 2: 0.5, # migration rate from 0 to 2
                 ...
             },
          1: {
                 0: 0.1, # migration rate from 1 to 0
                 2: 0.6, # migration rate from 1 to 2
                 ...
             },
             ...
      }
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
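The nested-dictionary layout of community_seed_migration_rates can be checked before a run. The following is a minimal sketch and not FAVITES code: the helper name `check_migration_rates`, the rules it enforces (no self-migration entries, non-negative rates), and the rates listed for community 2 are assumptions made for illustration.

```python
def check_migration_rates(rates, num_communities):
    """Validate a 2-D migration-rate dictionary (hypothetical helper).

    rates[i][j] is read as the migration rate from community i to
    community j, with communities numbered from 0.
    """
    valid = set(range(num_communities))
    for src, row in rates.items():
        if src not in valid:
            raise ValueError(f"unknown source community: {src}")
        for dst, rate in row.items():
            if dst not in valid:
                raise ValueError(f"unknown destination community: {dst}")
            if dst == src:
                raise ValueError(f"self-migration entry for community {src}")
            if rate < 0:
                raise ValueError(f"negative rate from {src} to {dst}: {rate}")
    return True


# Illustrative values only; the row for community 2 is made up.
example_rates = {
    0: {1: 0.3, 2: 0.5},
    1: {0: 0.1, 2: 0.6},
    2: {0: 0.2, 1: 0.4},
}
check_migration_rates(example_rates, num_communities=3)
```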
  • The seeds in each community are coalesced via pure neutral Kingman Coalescence, and then the roots of the communities' trees are coalesced again
    • The result is a single phylogenetic history for the seed sequences, but with the restriction that all seeds in a given community are in the same clade
  • Then, a single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit and is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • community_root_population: The desired (constant) population size of the coalescent process of the roots of the communities' seed trees
    • community_seed_populations: The desired (constant) population size for each community's coalescent process (a list of floats)
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
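The six seqgen_*_to_* rates and the four seqgen_freq_* frequencies together parameterize a GTR model. Below is a minimal sketch of the standard GTR rate matrix, in which the rate from base i to base j is the shared rate of the pair multiplied by the frequency of j. It is an illustration only: seq-gen builds and scales its own matrix internally, and the function name and example values here are hypothetical.

```python
def gtr_rate_matrix(rates, freqs):
    """Build an unnormalized GTR rate matrix as a nested dictionary.

    rates: six symmetric pair rates keyed by sorted base pair, e.g. "AC"
           (corresponding to seqgen_a_to_c, ..., seqgen_g_to_t).
    freqs: equilibrium base frequencies keyed by base
           (corresponding to seqgen_freq_a, ..., seqgen_freq_t).
    """
    bases = "ACGT"
    q = {i: {j: 0.0 for j in bases} for i in bases}
    for i in bases:
        for j in bases:
            if i != j:
                pair = "".join(sorted(i + j))
                q[i][j] = rates[pair] * freqs[j]
        # Each diagonal entry makes its row sum to zero.
        q[i][i] = -sum(q[i][j] for j in bases if j != i)
    return q


# Hypothetical example values.
example_rates = {"AC": 1.0, "AG": 2.0, "AT": 1.0, "CG": 1.0, "CT": 2.0, "GT": 1.0}
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = gtr_rate_matrix(example_rates, example_freqs)
```

Because each pair shares one rate in both directions, the matrix is time-reversible: freqs[i] * Q[i][j] equals freqs[j] * Q[j][i] for every pair of bases.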
  • A profile HMM is created from a user-specified multiple sequence alignment using hmmbuild, and seed sequences are randomly generated from this profile HMM using the hmmemit tool in the HMMER toolkit
  • Requirements:
  • Config Parameters:
    • hmmbuild_path: The path to your hmmbuild executable (or simply "hmmbuild" if it is in your PATH variable)
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • hmmbuild_msafile: File containing multiple sequence alignment from which to create a profile HMM
    • hmmbuild_options: The command-line options with which to run hmmbuild (just the options, not hmmfile_out nor msafile)
    • hmmemit_options: The command-line options with which to run hmmemit (just the options, not hmmfile: you will specify hmmfile in the hmmemit_hmmfile config parameter)
      • Do not use the -o argument, as we use standard output to parse the hmmemit output
  • Seed sequences are randomly generated from a profile HMM using the hmmemit tool in the HMMER toolkit
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • hmmemit_hmmfile: File containing profile HMM(s) from which to generate seed sequences
      • If the specified file contains multiple profile HMMs, a sequence is generated from each HMM, and one of those sequences is selected uniformly at random
    • hmmemit_options: The command-line options with which to run hmmemit (just the options, not hmmfile: you will specify hmmfile in the hmmemit_hmmfile config parameter)
      • Do not use the -o argument, as we use standard output to parse the hmmemit output
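The restriction on -o follows from how the output is consumed: the emitted sequences are read from standard output, so redirecting them to a file would leave nothing to parse. A minimal sketch of assembling the command from the three config parameters is shown below. The helper name `build_hmmemit_command` and the file name `example.hmm` are hypothetical, and this is not the FAVITES implementation.

```python
import shlex


def build_hmmemit_command(hmmemit_path, hmmemit_options, hmmemit_hmmfile):
    """Assemble the hmmemit argument list (hypothetical helper)."""
    options = shlex.split(hmmemit_options)
    if "-o" in options:
        raise ValueError("do not pass -o: the output is read from standard output")
    return [hmmemit_path] + options + [hmmemit_hmmfile]


# "-N 1" asks hmmemit for one sampled sequence; "example.hmm" is a placeholder.
command = build_hmmemit_command("hmmemit", "-N 1", "example.hmm")

# The list could then be executed with, for example,
#   subprocess.run(command, capture_output=True, text=True, check=True)
# and the FASTA text read from the result's stdout attribute.
```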
  • Seed sequences are randomly generated from the DNA alphabet with equal probability for each nucleotide
  • Requirements:
    • None
  • Config Parameters:
    • seed_sequence_length: The desired length of the seed sequences
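Drawing each position independently and uniformly from the four nucleotides can be sketched as follows. The function name `random_dna` is hypothetical, and the snippet is an illustration, not the module's actual code.

```python
import random


def random_dna(length, rng=random):
    """Return a DNA string with each base drawn uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# A seeded generator makes the example reproducible.
seed_sequence = random_dna(30, random.Random(0))
```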
  • Seed sequences are randomly generated from the 61 non-STOP codons with equal probability for each codon
  • Requirements:
    • None
  • Config Parameters:
    • seed_sequence_codon_length: The desired number of codons in the seed sequences (i.e., seeds will have 3 times this length)
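The count of 61 comes from the 64 possible nucleotide triplets minus the three STOP codons of the standard genetic code (TAA, TAG, TGA). A minimal sketch of uniform sampling over those codons, with hypothetical names, is:

```python
import itertools
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

# All 64 triplets minus the 3 STOP codons.
SENSE_CODONS = [
    "".join(triplet)
    for triplet in itertools.product("ACGT", repeat=3)
    if "".join(triplet) not in STOP_CODONS
]


def random_coding_sequence(codon_length, rng=random):
    """Concatenate codon_length codons, each drawn uniformly from SENSE_CODONS."""
    return "".join(rng.choice(SENSE_CODONS) for _ in range(codon_length))


sequence = random_coding_sequence(10, random.Random(0))
```

The resulting string has 3 times codon_length nucleotides and contains no in-frame STOP codon.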
  • Seed sequences are randomly generated from the DNA alphabet with user-specified nucleotide frequencies
  • Users specify Pr(A), Pr(C), and Pr(G); Pr(T) is then computed as 1 - Pr(A) - Pr(C) - Pr(G)
  • Requirements:
    • None
  • Config Parameters:
    • seed_sequence_length: The desired length of the seed sequences
    • seed_prob_A: Probability of A
    • seed_prob_C: Probability of C
    • seed_prob_G: Probability of G
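Since only three probabilities are configured, the fourth is implied, and the three given values must sum to at most 1. A minimal sketch of the sampling, under the assumption that each position is drawn independently, is shown below. The function name and the validation rules are hypothetical.

```python
import random


def random_dna_weighted(length, prob_a, prob_c, prob_g, rng=random):
    """Sample a DNA string with Pr(T) = 1 - Pr(A) - Pr(C) - Pr(G)."""
    prob_t = 1.0 - prob_a - prob_c - prob_g
    # Allow a tiny negative remainder caused by floating-point rounding.
    if min(prob_a, prob_c, prob_g) < 0 or prob_t < -1e-9:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    weights = [prob_a, prob_c, prob_g, max(prob_t, 0.0)]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


# Here Pr(T) is implied to be 0.1.
seed_sequence = random_dna_weighted(50, 0.4, 0.3, 0.2, random.Random(0))
```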
  • Seed sequences specified by the user
  • Requirements:
    • None
  • Config Parameters:
    • num_seeds: The desired number of seed nodes
    • seed_seqs: A list containing the seed sequences (seed_seqs must contain exactly num_seeds sequences)
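The constraint that seed_seqs contains exactly num_seeds sequences can be checked up front. The helper below is a hypothetical sketch of such a check, not part of FAVITES, and the sequences shown are placeholders.

```python
def check_seed_seqs(num_seeds, seed_seqs):
    """Confirm that one non-empty sequence is supplied per seed node."""
    if len(seed_seqs) != num_seeds:
        raise ValueError(
            f"expected {num_seeds} seed sequences, got {len(seed_seqs)}"
        )
    for index, sequence in enumerate(seed_seqs):
        if not sequence:
            raise ValueError(f"seed sequence {index} is empty")
    return list(seed_seqs)


# Placeholder sequences for illustration.
checked = check_seed_seqs(2, ["ACGTACGT", "ACGTTCGT"])
```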
  • Seed sequences are randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a Birth-Death tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_birth_rate: The desired birth rate of the simulated Birth-Death tree
    • seed_death_rate: The desired death rate of the simulated Birth-Death tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_codon_site1_rate: The desired relative substitution rate for codon position 1
    • seqgen_codon_site2_rate: The desired relative substitution rate for codon position 2
    • seqgen_codon_site3_rate: The desired relative substitution rate for codon position 3
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a Birth-Death tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_birth_rate: The desired birth rate of the simulated Birth-Death tree
    • seed_death_rate: The desired death rate of the simulated Birth-Death tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a Kingman Coalescent tree with expected coalescent times and one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_population: The desired (constant) population size of the simulated coalescent tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_codon_site1_rate: The desired relative substitution rate for codon position 1
    • seqgen_codon_site2_rate: The desired relative substitution rate for codon position 2
    • seqgen_codon_site3_rate: The desired relative substitution rate for codon position 3
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a Kingman Coalescent tree with expected coalescent times and one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_population: The desired (constant) population size of the simulated coalescent tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
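"Expected coalescent times" means each waiting time between coalescent events is set to its mean instead of being sampled. The sketch below computes those means under one common convention, stated here as an assumption: each pair of lineages coalesces at rate 1/population, so k lineages coalesce at total rate C(k, 2)/population. How FAVITES scales seed_population is not stated in this page, and the function name is hypothetical.

```python
from math import comb


def expected_coalescent_intervals(num_leaves, population):
    """Expected waiting time while k lineages remain, for k = n, n-1, ..., 2.

    Assumes a total coalescence rate of C(k, 2) / population, so each
    expected interval is population / C(k, 2).
    """
    return [population / comb(k, 2) for k in range(num_leaves, 1, -1)]


intervals = expected_coalescent_intervals(5, 100.0)
tree_height = sum(intervals)
```

Under this convention the intervals sum to 2 * population * (1 - 1/n), which gives a tree height of 160 for 5 leaves and a population of 100.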
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a non-homogeneous Yule tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_height: The desired height of the simulated non-homogeneous Yule tree
    • seed_speciation_rate_func: The desired speciation rate function of the simulated non-homogeneous Yule tree
      • E.g. "x**2 + 5"
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
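A speciation rate function such as "x**2 + 5" is an expression in a single variable x. One way to turn such a string into a callable is sketched below. This page does not state how FAVITES parses the string or what x denotes, so both the approach and the helper name `make_rate_function` are assumptions. Evaluating expression strings is only appropriate for trusted config files.

```python
def make_rate_function(expression):
    """Compile a string such as "x**2 + 5" into a function of x (hypothetical)."""
    code = compile(expression, "<seed_speciation_rate_func>", "eval")

    def rate(x):
        # Builtins are removed so that only arithmetic on x is available.
        return eval(code, {"__builtins__": {}}, {"x": x})

    return rate


speciation_rate = make_rate_function("x**2 + 5")
value_at_two = speciation_rate(2)  # 2**2 + 5 = 9
```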
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a pure neutral Kingman Coalescent tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Codon model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_population: The desired (constant) population size of the simulated coalescent tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_codon_site1_rate: The desired relative substitution rate for codon position 1
    • seqgen_codon_site2_rate: The desired relative substitution rate for codon position 2
    • seqgen_codon_site3_rate: The desired relative substitution rate for codon position 3
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a pure neutral Kingman Coalescent tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_population: The desired (constant) population size of the simulated coalescent tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)
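In a pure neutral Kingman coalescent, the waiting time while k lineages remain is exponentially distributed. A minimal sketch of sampling those waiting times follows. It assumes a total coalescence rate of C(k, 2)/population, which is one common scaling and may differ from the one FAVITES uses, and the function name is hypothetical.

```python
import random
from math import comb


def sample_coalescent_intervals(num_leaves, population, rng=random):
    """Sample the waiting times between coalescent events, leaves to root.

    While k lineages remain, the time to the next coalescence is drawn
    from an exponential distribution with rate C(k, 2) / population.
    """
    intervals = []
    for k in range(num_leaves, 1, -1):
        rate = comb(k, 2) / population
        intervals.append(rng.expovariate(rate))
    return intervals


intervals = sample_coalescent_intervals(5, 100.0, random.Random(0))
tree_height = sum(intervals)
```

A full simulation would also pick a uniformly random pair of lineages to merge at each event; only the timing is shown here.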
  • A single root sequence is randomly generated from a profile HMM prebuilt for a user-specified virus using the hmmemit tool in the HMMER toolkit, a Yule tree with one leaf per seed node is simulated, and the root sequence that was generated is evolved down the tree under a GTR Gamma model to generate a seed sequence for each seed node
  • Requirements:
  • Config Parameters:
    • hmmemit_path: The path to your hmmemit executable (or simply "hmmemit" if it is in your PATH variable)
    • seqgen_path: The path to your seq-gen executable (or simply "seq-gen" if it is in your PATH variable)
    • viral_sequence_type: The desired type of viral sequence to generate
    • seed_height: The desired height of the simulated Yule tree
    • seqgen_a_to_c: The desired transition rate from A to C (and C to A)
    • seqgen_a_to_g: The desired transition rate from A to G (and G to A)
    • seqgen_a_to_t: The desired transition rate from A to T (and T to A)
    • seqgen_c_to_g: The desired transition rate from C to G (and G to C)
    • seqgen_c_to_t: The desired transition rate from C to T (and T to C)
    • seqgen_g_to_t: The desired transition rate from G to T (and T to G)
    • seqgen_freq_a: The desired frequency of A
    • seqgen_freq_c: The desired frequency of C
    • seqgen_freq_g: The desired frequency of G
    • seqgen_freq_t: The desired frequency of T
    • seqgen_gamma_shape: The desired shape parameter of the Gamma distribution for site rate heterogeneity
    • seqgen_num_gamma_rate_categories: The desired number of rate categories of the Gamma distribution for site rate heterogeneity (or "" for the default value, which is 0)