MLCA Genotyping

Robert J. Gifford edited this page Nov 27, 2024 · 1 revision

This page details the methodology applied to define genogroups and genotypes within BTV-GLUE, a GLUE project focused on the comparative genomics of bluetongue virus (BTV).

1. Data Collection and Initial Sequence Processing


  • Sequence Collation: All BTV sequences available in GenBank as of 30 June 2018 were collated.

  • Exclusion of Sequences: Certain sequences were manually excluded from the analysis based on specific criteria. The accession numbers of these sequences and the reasons for their exclusion are documented in the following file:

    • BTV-GLUE/tabular/formatted/sequences_to_exclude.txt

2. Segment Assignment and Reference Selection


  • Segment Assignment Procedure:

    • A combination of GenBank annotations and the BLAST-based btvSegmentRecogniser module was used to assign each sequence to its corresponding segment.
  • Master Reference Sequences: The master reference sequences selected for each segment are as follows:

    • Segment 1: JX680457
    • Segment 2: JX680458
    • Segment 3: JX680459
    • Segment 4: JX680460
    • Segment 5: JX680461
    • Segment 6: JX680462
    • Segment 7: JX680463
    • Segment 8: JX680464
    • Segment 9: JX680465
    • Segment 10: JX680466
  • Outgroup Sequences:

    • Epizootic Hemorrhagic Disease Virus (EHDV): AM744977-AM744986
    • Palyam Virus (PATAV): JQ070386-JQ070395
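
The master references listed above occupy a contiguous block of accessions (JX680457 to JX680466, one per segment), and the outgroups are given as accession ranges. As a convenience, the following minimal sketch expands such ranges and builds the segment-to-reference mapping stated in the list. The function name `expand_accession_range` and the constant names are hypothetical and are not part of BTV-GLUE; no segment order is assumed for the outgroup accessions.

```python
import re

def expand_accession_range(spec):
    """Expand a range such as 'AM744977-AM744986' into individual accessions."""
    start, end = spec.split("-")
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a simple accession range: {spec!r}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep any zero padding in the numeric part
    first, last = int(m_start.group(2)), int(m_end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]

# Segment number -> master reference accession, exactly as listed above.
MASTER_REFERENCES = dict(
    enumerate(expand_accession_range("JX680457-JX680466"), start=1)
)

# Outgroup accessions (expanded only; no segment assignment is implied here).
EHDV_OUTGROUP = expand_accession_range("AM744977-AM744986")
PATAV_OUTGROUP = expand_accession_range("JQ070386-JQ070395")
```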

3. Sequence Length Criteria for Segment Inclusion


  • Segments 1-9: Sequences must have a length of at least 90% of the segment master reference length to ensure the coding region is adequately captured.

  • Segment 10: Because the coding region makes up a smaller proportion of this segment, sequences must have a length of at least 80% of the segment master reference length.
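
The length criteria above amount to a simple per-segment filter. The sketch below is illustrative only: the function name `passes_length_filter` is hypothetical, and the master reference length is passed in by the caller rather than taken from any real record. Integer arithmetic is used so that a sequence at exactly the threshold is accepted.

```python
# Minimum sequence length as a percentage of the master reference length.
MIN_LENGTH_PERCENT = {segment: 90 for segment in range(1, 10)}
MIN_LENGTH_PERCENT[10] = 80

def passes_length_filter(segment, seq_length, master_length):
    """Return True if the sequence is long enough to be included for this segment."""
    # Compare seq_length / master_length >= percent / 100 without floating point.
    return seq_length * 100 >= MIN_LENGTH_PERCENT[segment] * master_length
```

For example, against a hypothetical master reference of 1000 nt, a segment 10 sequence of 850 nt would pass, whereas a sequence of the same length for segment 9 would not.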

4. Alignment Construction


  • Nucleotide Alignment:

    • A nucleotide alignment (BTV_COMPL_SEG_NT) was constructed using MAFFT v7.299b (default settings) for each segment, incorporating the master reference sequence along with all non-excluded GenBank sequences.
  • Protein Translation and Outgroup Integration:

    • Each sequence in the nucleotide alignment was translated into an unaligned protein sequence of its coding region.
    • Protein translations of the relevant outgroup sequences were combined with these translations, and the combined set was aligned using MAFFT v7.299b.
  • Reverse Translation to Nucleotide Alignment:

    • The aligned protein sequences were reverse-translated back to nucleotide sequences using TBLASTN in GLUE's blastFastaAlignmentImporter pattern.
    • This produced BTV_OUTG_CODON alignments (outgroup codon alignments) that were inspected and manually curated.
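
To illustrate what a codon alignment derived from a protein alignment looks like, the sketch below threads the codons of a coding sequence onto its aligned protein sequence. This is a simplified stand-in and not the TBLASTN-based import that BTV-GLUE uses. The function name `thread_codons` is hypothetical, and the sketch assumes that the coding sequence is in frame, excludes the stop codon, and has exactly one codon per residue.

```python
def thread_codons(aligned_protein, coding_nt):
    """Map an aligned protein sequence back to a codon-aligned nucleotide sequence.

    Each residue is replaced by its codon and each protein gap '-' by '---'.
    """
    if len(coding_nt) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [coding_nt[i:i + 3] for i in range(0, len(coding_nt), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("number of residues does not match number of codons")
    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)
```

With this sketch, the aligned protein `M-K` and the coding sequence `ATGAAA` give `ATG---AAA`, so gaps in the nucleotide alignment always fall on codon boundaries.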

5. Phylogenetic Tree Construction


  • Building Phylogenetic Trees:

    • Phylogenetic trees were generated from these nucleotide alignments using RAxML-NG v0.8.1 with the GTR+G+I model and 500 bootstrap replicates.
    • The Transfer Bootstrap Estimate (TBE) method was employed for node support calculation (refer to https://doi.org/10.1038/s41586-018-0043-0).
  • Tree Rerooting and Outgroup Usage:

    • Outgroups were generally used for tree rerooting, after which they were removed, except for segment 6, where the EHDV sequence was retained.
    • For segments 5, 8, 9, and 10, midpoint rooting was applied instead, because outgroup rooting placed the root in an unrealistic position.

6. Genogroup and Genotype Demarcation


  • ClusterPicker Settings:

    • ClusterPicker 1.2.5 was used to define genogroups and genotypes based on the following parameters:
      • Transfer Bootstrap Threshold: 75%
      • Genetic Distance Strategy: "gap"
  • Genetic Distance Thresholds:

    • Segment 1 Genogroup: p = 0.2
    • Segment 2 Genogroup: p = 0.35
    • Segment 2 Genotype: p = 0.22
    • Segment 3 Genogroup: p = 0.2
    • Segment 4 Genogroup: p = 0.2
    • Segment 5 Genogroup: p = 0.2
    • Segment 6 Genogroup: p = 0.33
    • Segment 7 Genogroup: p = 0.2
    • Segment 8 Genogroup: p = 0.2
    • Segment 9 Genogroup: p = 0.2
    • Segment 10 Genogroup: p = 0.2
  • These thresholds were chosen to reflect groupings from the literature, so that, for segment 2, each previously characterized serotype corresponds to one or two genotypes.
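
The thresholds above can be read as a small lookup table combined with a support check. The following sketch records them and shows one way a candidate clade could be tested against them. It is not ClusterPicker's implementation: the names are hypothetical, the rule "support at or above the threshold and maximum pairwise p-distance at or below the threshold" is a simplification, and skipping any column where either sequence has a gap is an assumption that may differ from how ClusterPicker's "gap" strategy is defined.

```python
from itertools import combinations

# p-distance thresholds per segment, as listed above.
GENOGROUP_P = {1: 0.2, 2: 0.35, 3: 0.2, 4: 0.2, 5: 0.2,
               6: 0.33, 7: 0.2, 8: 0.2, 9: 0.2, 10: 0.2}
GENOTYPE_P = {2: 0.22}          # genotypes are defined for segment 2 only
SUPPORT_THRESHOLD = 0.75        # transfer bootstrap support, as a fraction

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def qualifies_as_genogroup(segment, clade_sequences, support):
    """Simplified check of a candidate clade against the segment's thresholds."""
    max_distance = max(
        (p_distance(a, b) for a, b in combinations(clade_sequences, 2)),
        default=0.0,
    )
    return support >= SUPPORT_THRESHOLD and max_distance <= GENOGROUP_P[segment]
```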

7. Reference Sequence Selection


  • Manual Selection of Reference Sequences:
    • References were manually chosen by Kiki to capture the genetic diversity within each genogroup and genotype.
  • Reference Sequence Files:
    • Detailed lists of reference sequences for each segment can be found in:
      • BTV-GLUE/tabular/formatted/Segment<X>RefList.txt (where <X> is the segment number).

8. Segment Reference Phylogenies


  • Genotyping Codon Alignments:

    • Genotyping codon alignments were created by retaining only the reference sequences from the outgroup codon alignments for each segment.
  • Tree Generation for Segment References:

    • RAxML 8.2.8 was used with the GTRGAMMAI model and 1000 bootstrap replicates to generate the segment reference phylogenies.
  • Rooting of Segment Reference Phylogenies:

    • The trees were rooted at internal nodes that separate specific genogroups, for example:
      • Segment 1: Rooted at ["C", "D", "E"]
      • Segment 2: Rooted at ["G", "J"]
      • Segment 3: Rooted at ["C", "D", "E"]
      • (Additional details for each segment as provided in the original text)

9. Automated Typing Process


  • Automated Genotype Assignment:
    • The genotyping codon alignments and segment reference phylogenies are the basis of the automated typing process available via the BTV-GLUE website.
  • Method Employed:
    • This process uses the Maximum Likelihood Clade Assignment (MLCA) method described in the GLUE paper published in BMC Bioinformatics (Singer et al., 2018).
  • Configuration Parameters:
    • Distance Scaling Exponent: -3.0
    • Distance Cutoff: Equal to the p-distance threshold set earlier for each segment.
    • Internal Distance Cutoff: Twice the distance cutoff value.
    • Clade Category Cutoff: 80%
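
The configuration parameters above are related to one another, and the sketch below derives them from a single p-distance threshold. The class and function names are hypothetical and do not correspond to GLUE's configuration format. The `neighbour_weight` helper rests on an assumption: that the distance scaling exponent is applied as a power of the evolutionary distance when weighting neighbouring reference sequences. The threshold is passed in by the caller, since for segment 2 the text does not say whether the genogroup or the genotype value applies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypingConfig:
    distance_scaling_exponent: float
    distance_cutoff: float
    internal_distance_cutoff: float
    clade_category_cutoff_percent: float

def typing_config(p_threshold):
    """Derive the typing parameters from a segment's p-distance threshold."""
    return TypingConfig(
        distance_scaling_exponent=-3.0,
        distance_cutoff=p_threshold,
        internal_distance_cutoff=2 * p_threshold,  # twice the distance cutoff
        clade_category_cutoff_percent=80.0,
    )

def neighbour_weight(distance, exponent=-3.0):
    """Assumed weighting: distance raised to the scaling exponent."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return distance ** exponent
```

Under that assumption, an exponent of -3.0 means that halving the distance to a neighbour increases its weight roughly eightfold, so the closest references dominate the assignment.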