MLCA Genotyping
This page details the methodology applied to define genogroups and genotypes within BTV-GLUE, a GLUE project focused on the comparative genomics of bluetongue virus (BTV).
-
Sequence Collation: All BTV sequences available in GenBank as of 30-06-2018 were collated.
-
Exclusion of Sequences: Certain sequences were manually excluded from the analysis based on specific criteria. The accession numbers of these sequences and the reasons for their exclusion are documented in the following file:
BTV-GLUE/tabular/formatted/sequences_to_exclude.txt
-
Segment Assignment Procedure:
- A combination of GenBank annotations and the BLAST-based btvSegmentRecogniser module was used to assign each sequence to its corresponding segment.
-
Master Reference Sequences: The master reference sequences selected for each segment are as follows:
- Segment 1: JX680457
- Segment 2: JX680458
- Segment 3: JX680459
- Segment 4: JX680460
- Segment 5: JX680461
- Segment 6: JX680462
- Segment 7: JX680463
- Segment 8: JX680464
- Segment 9: JX680465
- Segment 10: JX680466
-
Outgroup Sequences:
- Epizootic Hemorrhagic Disease Virus (EHDV): AM744977-AM744986
- Palyam Virus (PATAV): JQ070386-JQ070395
-
Segments 1-9: Sequences must have a length of at least 90% of the segment master reference length to ensure the coding region is adequately captured.
-
Segment 10: Because the coding region makes up a smaller proportion of this segment, sequences need to be at least 80% of the segment master reference length.
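The two length rules above can be sketched as a single check. This is a minimal illustration, not the BTV-GLUE implementation: the function name and the way lengths are passed in are assumptions, and only the 90% and 80% figures come from the text.

```python
# Minimum length as a percentage of the segment master reference length.
# Segments 1-9 use 90%, segment 10 uses 80% (values from the text above).
MIN_LENGTH_PERCENT = {segment: 90 for segment in range(1, 10)}
MIN_LENGTH_PERCENT[10] = 80


def passes_length_filter(segment: int, seq_length: int, master_ref_length: int) -> bool:
    """Return True if the sequence is long enough relative to the master reference.

    Hypothetical helper: integer arithmetic is used so that a sequence at
    exactly the threshold is accepted without floating-point rounding issues.
    """
    if segment not in MIN_LENGTH_PERCENT:
        raise ValueError(f"Unknown segment: {segment}")
    return seq_length * 100 >= MIN_LENGTH_PERCENT[segment] * master_ref_length
```

For example, with an illustrative master reference length of 1000 nt, a 900 nt sequence would pass for segment 1, while an 850 nt sequence would pass only for segment 10.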
-
Nucleotide Alignment:
- A nucleotide alignment (BTV_COMPL_SEG_NT) was constructed using MAFFT v7.299b (default settings) for each segment, incorporating the master reference sequence along with all non-excluded GenBank sequences.
-
Protein Translation and Outgroup Integration:
- Each sequence in the nucleotide alignment was translated into an unaligned protein sequence of its coding region.
- Protein translations of the relevant outgroup sequences were combined and aligned using MAFFT v7.299b.
-
Reverse Translation to Nucleotide Alignment:
- The aligned protein sequences were reverse-translated back to nucleotide sequences using TBLASTN in GLUE's blastFastaAlignmentImporter pattern.
- This produced BTV_OUTG_CODON alignments (outgroup codon alignments) that were inspected and manually curated.
-
Building Phylogenetic Trees:
- Phylogenetic trees were generated from these nucleotide alignments using RAxML-NG v0.8.1 with the GTR+G+I model and 500 bootstrap replicates.
- The Transfer Bootstrap Estimate (TBE) method was employed for node support calculation (refer to https://doi.org/10.1038/s41586-018-0043-0).
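The exact command line used for tree building is not given here. As a rough sketch, the stated settings could map onto a RAxML-NG invocation like the one assembled below; the file path, output prefix, and the choice of the all-in-one `--all` mode are assumptions.

```python
def raxml_ng_command(msa_path: str, prefix: str) -> list[str]:
    """Build an argument list for a RAxML-NG run with the settings named above.

    Assumption: the all-in-one mode (--all: ML search, bootstrapping, and
    support mapping) is used; the source does not say how the run was split.
    """
    return [
        "raxml-ng",
        "--all",
        "--msa", msa_path,        # input alignment (hypothetical path)
        "--model", "GTR+G+I",     # substitution model from the text
        "--bs-trees", "500",      # number of bootstrap replicates
        "--bs-metric", "tbe",     # Transfer Bootstrap Estimate support values
        "--prefix", prefix,       # output file prefix (hypothetical)
    ]
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where `raxml-ng` is installed.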
-
Tree Rerooting and Outgroup Usage:
- Outgroups were generally used for tree rerooting, after which they were removed, except for segment 6, where the EHDV sequence was retained.
- For segments 5, 8, 9, and 10, midpoint rooting was applied because outgroup rooting placed the root in an unrealistic position.
-
ClusterPicker Settings:
- ClusterPicker 1.2.5 was used to define genogroups and genotypes based on the following parameters:
- Transfer Bootstrap Threshold: 75%
- Genetic Distance Strategy: "gap"
-
Genetic Distance Thresholds:
- Segment 1 Genogroup: p = 0.2
- Segment 2 Genogroup: p = 0.35
- Segment 2 Genotype: p = 0.22
- Segment 3 Genogroup: p = 0.2
- Segment 4 Genogroup: p = 0.2
- Segment 5 Genogroup: p = 0.2
- Segment 6 Genogroup: p = 0.33
- Segment 7 Genogroup: p = 0.2
- Segment 8 Genogroup: p = 0.2
- Segment 9 Genogroup: p = 0.2
- Segment 10 Genogroup: p = 0.2
-
These thresholds were chosen to reflect groupings from the literature, ensuring that previously characterized serotypes align with one or two genotypes for segment 2.
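To make the role of the thresholds concrete, the sketch below computes a p-distance (the proportion of differing sites) and checks whether a candidate cluster stays within a threshold by its maximum pairwise distance. It is an illustration only: skipping positions where either sequence has a gap is an assumption and may not match ClusterPicker's "gap" strategy, and all function names are hypothetical.

```python
from itertools import combinations

# Per-segment p-distance thresholds, copied from the list above.
GENOGROUP_THRESHOLD = {1: 0.2, 2: 0.35, 3: 0.2, 4: 0.2, 5: 0.2,
                       6: 0.33, 7: 0.2, 8: 0.2, 9: 0.2, 10: 0.2}
GENOTYPE_THRESHOLD = {2: 0.22}  # genotypes are only defined for segment 2


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("No comparable positions")
    return differing / compared


def within_threshold(aligned_seqs: list[str], threshold: float) -> bool:
    """True if the maximum pairwise p-distance does not exceed the threshold."""
    return all(p_distance(a, b) <= threshold
               for a, b in combinations(aligned_seqs, 2))
```

Under these assumptions, a candidate segment 2 clade would be tested against `GENOGROUP_THRESHOLD[2]` for genogroup membership and against `GENOTYPE_THRESHOLD[2]` for genotype membership, in addition to the bootstrap support requirement.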
-
Manual Selection of Reference Sequences:
- References were manually chosen by Kiki to capture the genetic diversity within each genogroup and genotype.
-
Reference Sequence Files:
- Detailed lists of reference sequences for each segment can be found in:
BTV-GLUE/tabular/formatted/Segment<X>RefList.txt (where <X> is the segment number).
-
Genotyping Codon Alignments:
- Genotyping codon alignments were created by retaining only the reference sequences from the outgroup codon alignments for each segment.
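The retention step can be pictured as a simple filter over an alignment. The sketch below assumes the alignment is held as a mapping from sequence ID to aligned sequence, which is an illustrative representation rather than how GLUE stores alignments; the function name is hypothetical.

```python
def retain_references(alignment: dict[str, str], reference_ids: set[str]) -> dict[str, str]:
    """Keep only the reference sequences from an outgroup codon alignment.

    Raises KeyError if a listed reference is absent, since a missing
    reference would silently change the resulting genotyping alignment.
    """
    missing = reference_ids - alignment.keys()
    if missing:
        raise KeyError(f"References not found in alignment: {sorted(missing)}")
    # Aligned rows are copied unchanged, so column positions are preserved.
    return {seq_id: seq for seq_id, seq in alignment.items() if seq_id in reference_ids}
```

Because rows are copied as-is, the filtered alignment keeps the codon-aware column structure of the source alignment.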
-
Tree Generation for Segment References:
- RAxML 8.2.8 was used with the GTRGAMMAI model and 1000 bootstrap replicates to generate the segment reference phylogenies.
-
Rooting of Segment Reference Phylogenies:
- The trees were rooted at internal nodes that separate specific genogroups, for example:
- Segment 1: Rooted at ["C", "D", "E"]
- Segment 2: Rooted at ["G", "J"]
- Segment 3: Rooted at ["C", "D", "E"]
- (Additional details for each segment as provided in the original text)
-
Automated Genotype Assignment:
- The genotyping codon alignments and segment reference phylogenies are the basis of the automated typing process available via the BTV-GLUE website.
-
Method Employed:
- This process uses the Maximum Likelihood Clade Assignment method described in the GLUE Bioinformatics paper.
-
Configuration Parameters:
- Distance Scaling Exponent: -3.0
- Distance Cutoff: Equal to the p-distance threshold set earlier for each segment.
- Internal Distance Cutoff: Twice the distance cutoff value.
- Clade Category Cutoff: 80%
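The relationship between these parameters and the earlier p-distance thresholds can be written out as a small helper. The field names below are hypothetical and are not GLUE's configuration property names; only the values and the "twice the distance cutoff" rule come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlcaConfig:
    """Illustrative container for the parameters listed above."""
    distance_scaling_exponent: float
    distance_cutoff: float
    internal_distance_cutoff: float
    clade_category_cutoff_percent: float


def mlca_config(p_distance_threshold: float) -> MlcaConfig:
    """Derive the parameter set from a segment's p-distance threshold."""
    return MlcaConfig(
        distance_scaling_exponent=-3.0,
        distance_cutoff=p_distance_threshold,               # same as the clustering threshold
        internal_distance_cutoff=2 * p_distance_threshold,  # twice the distance cutoff
        clade_category_cutoff_percent=80.0,
    )
```

For instance, the segment 2 genogroup threshold of 0.35 would give a distance cutoff of 0.35 and an internal distance cutoff of 0.7 under this reading.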