MLCA Genotyping

This page details the methodology applied to define genogroups and genotypes within BTV-GLUE, a GLUE project focused on the comparative genomics of bluetongue virus (BTV).

1. Data Collection and Initial Sequence Processing

Sequence Collation: All BTV sequences available in GenBank as of 30-06-2018 were collated.
Exclusion of Sequences: Some sequences were manually excluded from the analysis. Their accession numbers and the reason for each exclusion are documented in the following file:
- BTV-GLUE/tabular/formatted/sequences_to_exclude.txt

2. Segment Assignment and Reference Selection

Segment Assignment Procedure:
- A combination of GenBank annotations and the BLAST-based btvSegmentRecogniser module was used to assign each sequence to its corresponding segment.
Master Reference Sequences: The master reference sequences selected for each segment are as follows:
- Segment 1: JX680457
- Segment 2: JX680458
- Segment 3: JX680459
- Segment 4: JX680460
- Segment 5: JX680461
- Segment 6: JX680462
- Segment 7: JX680463
- Segment 8: JX680464
- Segment 9: JX680465
- Segment 10: JX680466
Outgroup Sequences:
- Epizootic Hemorrhagic Disease Virus (EHDV): AM744977-AM744986
- Palyam Virus (PATAV): JQ070386-JQ070395

3. Sequence Length Criteria for Segment Inclusion

Segments 1-9: Sequences must have a length of at least 90% of the segment master reference length to ensure the coding region is adequately captured.
Segment 10: Because the coding region makes up a smaller proportion of this segment, sequences must have a length of at least 80% of the segment master reference length.
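
The length criteria above can be expressed as a small filter. This is a minimal sketch, not code from BTV-GLUE: the function names are hypothetical, and the reference length used in the usage note is a placeholder rather than the true length of any master reference.

```python
def min_length_percent(segment: int) -> int:
    """Minimum sequence length, as a percentage of the master reference length."""
    if not 1 <= segment <= 10:
        raise ValueError(f"BTV has segments 1-10, got {segment}")
    # Segment 10 uses the relaxed 80% criterion; segments 1-9 use 90%.
    return 80 if segment == 10 else 90


def passes_length_filter(seq_len: int, ref_len: int, segment: int) -> bool:
    """True if the sequence is long enough to be included for this segment."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return seq_len * 100 >= min_length_percent(segment) * ref_len
```

For example, with a placeholder reference length of 1000 nt, a 850 nt sequence would pass for segment 10 but fail for segment 9.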

4. Alignment Construction

Nucleotide Alignment:
- A nucleotide alignment (BTV_COMPL_SEG_NT) was constructed using MAFFT v7.299b (default settings) for each segment, incorporating the master reference sequence along with all non-excluded GenBank sequences.
Protein Translation and Outgroup Integration:
- Each sequence in the nucleotide alignment was translated into an unaligned protein sequence of its coding region.
- These translations were combined with protein translations of the relevant outgroup sequences and aligned using MAFFT v7.299b.
Reverse Translation to Nucleotide Alignment:
- The aligned protein sequences were reverse-translated to nucleotide sequences using TBLASTN, via GLUE's blastFastaAlignmentImporter.
- This produced BTV_OUTG_CODON alignments (outgroup codon alignments) that were inspected and manually curated.
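
To illustrate what a codon alignment derived from a protein alignment looks like, the sketch below threads each sequence's own codons onto its gapped protein row. This is a simplified stand-in and not the TBLASTN-based procedure that GLUE uses; the function name is hypothetical, and it assumes the coding sequence starts in frame and matches the protein residue for residue.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes the next codon from `cds`; each protein gap
    becomes a three-nucleotide gap.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) < 3 * len(residues):
        raise ValueError("coding sequence is too short for the protein row")
    codon_row = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codon_row.append("---")
        else:
            codon_row.append(cds[pos:pos + 3])
            pos += 3
    # Any trailing nucleotides (for example a stop codon) are ignored.
    return "".join(codon_row)
```

Because gaps are always inserted in multiples of three, the resulting nucleotide rows stay in frame, which is the property that makes a codon alignment suitable for manual inspection and curation.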

5. Phylogenetic Tree Construction

Building Phylogenetic Trees:
- Phylogenetic trees were generated from these nucleotide alignments using RAxML-NG v0.8.1 with the GTR+G+I model and 500 bootstrap replicates.
- The Transfer Bootstrap Estimate (TBE) method was employed for node support calculation (refer to https://doi.org/10.1038/s41586-018-0043-0).
Tree Rerooting and Outgroup Usage:
- Outgroups were generally used for tree rerooting, after which they were removed, except for segment 6, where the EHDV sequence was retained.
- For segments 5, 8, 9, and 10, midpoint rooting was applied instead, because outgroup rooting placed the root in an unrealistic position.

6. Genogroup and Genotype Demarcation

ClusterPicker Settings:
- ClusterPicker 1.2.5 was used to define genogroups and genotypes based on the following parameters:
  - Transfer Bootstrap Threshold: 75%
  - Genetic Distance Strategy: "gap"
Genetic Distance Thresholds:
- Segment 1 Genogroup: p = 0.2
- Segment 2 Genogroup: p = 0.35
- Segment 2 Genotype: p = 0.22
- Segment 3 Genogroup: p = 0.2
- Segment 4 Genogroup: p = 0.2
- Segment 5 Genogroup: p = 0.2
- Segment 6 Genogroup: p = 0.33
- Segment 7 Genogroup: p = 0.2
- Segment 8 Genogroup: p = 0.2
- Segment 9 Genogroup: p = 0.2
- Segment 10 Genogroup: p = 0.2
These thresholds were chosen to reflect groupings from the literature. For segment 2, they ensure that each previously characterized serotype corresponds to one or two genotypes.
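
The thresholds above are p-distances, i.e. the proportion of compared sites at which two aligned sequences differ. The following is a minimal sketch of checking a candidate clade against its segment's genogroup threshold, assuming the clade qualifies when its maximum pairwise p-distance does not exceed the threshold. Skipping gap-containing columns is an assumption made for illustration and may not match how ClusterPicker's "gap" strategy treats gaps; the function names are hypothetical.

```python
from itertools import combinations

# Genogroup thresholds per segment, taken from the list above.
GENOGROUP_THRESHOLDS = {1: 0.2, 2: 0.35, 3: 0.2, 4: 0.2, 5: 0.2,
                        6: 0.33, 7: 0.2, 8: 0.2, 9: 0.2, 10: 0.2}
# Segment 2 is the only segment with an additional genotype threshold.
SEGMENT2_GENOTYPE_THRESHOLD = 0.22


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are not compared
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def max_pairwise_distance(seqs: list[str]) -> float:
    """Largest p-distance between any two sequences in a candidate clade."""
    return max(p_distance(a, b) for a, b in combinations(seqs, 2))


def within_genogroup_threshold(seqs: list[str], segment: int) -> bool:
    """True if the clade's maximum pairwise distance is within the threshold."""
    return max_pairwise_distance(seqs) <= GENOGROUP_THRESHOLDS[segment]
```

In the actual workflow this distance test is applied together with the 75% transfer bootstrap support threshold, which this sketch does not model.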

7. Reference Sequence Selection

Manual Selection of Reference Sequences:
- References were manually chosen by Kiki to capture the genetic diversity within each genogroup and genotype.
Reference Sequence Files:
- Detailed lists of reference sequences for each segment can be found in:
  - BTV-GLUE/tabular/formatted/Segment<X>RefList.txt (where <X> is the segment number).

8. Segment Reference Phylogenies

Genotyping Codon Alignments:
- Genotyping codon alignments were created by retaining only the reference sequences from the outgroup codon alignments for each segment.
Tree Generation for Segment References:
- RAxML 8.2.8 was used with the GTRGAMMAI model and 1000 bootstrap replicates to generate the segment reference phylogenies.
Rooting of Segment Reference Phylogenies:
- The trees were rooted at internal nodes that separate specific genogroups, for example:
  - Segment 1: Rooted at ["C", "D", "E"]
  - Segment 2: Rooted at ["G", "J"]
  - Segment 3: Rooted at ["C", "D", "E"]
  - (Additional details for each segment as provided in the original text)

9. Automated Typing Process

Automated Genotype Assignment:
- The genotyping codon alignments and segment reference phylogenies are the basis of the automated typing process available via the BTV-GLUE website.
Method Employed:
- This process uses the Maximum Likelihood Clade Assignment method described in the GLUE Bioinformatics paper.
Configuration Parameters:
- Distance Scaling Exponent: -3.0
- Distance Cutoff: Equal to the p-distance threshold set earlier for each segment.
- Internal Distance Cutoff: Twice the distance cutoff value.
- Clade Category Cutoff: 80%
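
The relationship between these parameters and the p-distance thresholds can be sketched as follows. The class and field names are hypothetical and do not correspond to actual GLUE configuration properties; the sketch only shows how the four values listed above would be derived from a single threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypingConfig:
    distance_scaling_exponent: float
    distance_cutoff: float
    internal_distance_cutoff: float
    clade_category_cutoff_percent: float


def typing_config(p_distance_threshold: float) -> TypingConfig:
    """Derive the typing parameters from a clade category's p-distance threshold."""
    if p_distance_threshold <= 0:
        raise ValueError("threshold must be positive")
    return TypingConfig(
        distance_scaling_exponent=-3.0,
        distance_cutoff=p_distance_threshold,
        # The internal cutoff is defined as twice the distance cutoff.
        internal_distance_cutoff=2 * p_distance_threshold,
        clade_category_cutoff_percent=80.0,
    )
```

For instance, `typing_config(0.2)` would give a distance cutoff of 0.2 and an internal distance cutoff of 0.4, as for the genogroup level of most segments.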
