This guide has been prepared to assist researchers and users of the denvLineages classification system. It is especially useful for those who encounter clusters of sequences that meet the pre-established parameters outlined below and can therefore be designated as new lineages.
- A grouping of at least 10 samples.
- The branch must contain at least one amino acid mutation.
- Bootstrap or UFBoot value (confidence) equal to or greater than 90.
- The identified mutation must cover at least 90% of the samples in the branch and isolate it from the rest of the tree.
- Only nodes with a GRI value greater than 1 will be considered.
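The thresholds listed above can be checked mechanically once the values for a candidate branch are known. The following is a minimal sketch, not part of denvLineages or Autolin: the `CandidateBranch` class and its field names are hypothetical, and the values must be gathered by the user from the tree and the annotation output. It does not test whether the mutation isolates the branch from the rest of the tree, which still requires inspecting the tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateBranch:
    """Hypothetical container for the values a user collects for one branch."""
    n_samples: int            # number of samples in the branch
    n_aa_mutations: int       # amino acid mutations on the branch
    support: float            # bootstrap or UFBoot value (0-100)
    mutation_coverage: float  # fraction (0-1) of branch samples carrying the mutation
    gri: float                # GRI value of the node


def failed_criteria(branch: CandidateBranch) -> List[str]:
    """Return the list of criteria the branch does not meet (empty if all pass)."""
    failures = []
    if branch.n_samples < 10:
        failures.append("fewer than 10 samples")
    if branch.n_aa_mutations < 1:
        failures.append("no amino acid mutation on the branch")
    if branch.support < 90:
        failures.append("bootstrap/UFBoot below 90")
    if branch.mutation_coverage < 0.90:
        failures.append("mutation covers less than 90% of the samples")
    if branch.gri <= 1:
        failures.append("GRI not greater than 1")
    return failures


# Illustrative values only, not real data.
candidate = CandidateBranch(n_samples=12, n_aa_mutations=2, support=95.0,
                            mutation_coverage=0.92, gri=1.4)
print(failed_criteria(candidate) or "all numeric criteria met")
```

Returning the list of failed criteria, rather than a single boolean, makes it easier to see which requirement a borderline branch is missing.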
For the preparation of this guide, a set of fictional sequences was generated through random induction of mutations in a phylogenetic tree sequence, simulating evolutionary diversification events. The corresponding file can be downloaded for users to proceed with the proposed analyses, allowing verification and validation of the methods presented.
- Ensure the sequence group contains at least 10 samples in the Nextclade-generated tree (use the tree tab of the web version, or visualize the "nextclade.auspice.json" file generated by the local version with the --output-tree option).
- Download the "nextclade.auspice.json" file from the "export" tab of the web version of Nextclade or locate it among the files generated by the local version of the application.
- Execute the Autolin (original version) script with the following command:
python annotate_json.py -i nextclade.auspice.json -o nextclade.auspice_annotated.json -m -f 1 -s 10 -d 1 -c 90
- -m: Considers only mutations that alter amino acids.
- -f: Considers only nodes with a GRI value greater than 1.
- -s: Sets a minimum of 10 samples to annotate a lineage.
- -d: Requires at least one mutation relative to the ancestral annotation.
- -c: Requires that at least 90% of the samples be annotated at each level.
An alternative is to use the web version of Autolin with the parameters above, as demonstrated below:
After annotation by Autolin, check if a new label has been assigned to the branch where the target sequences were inserted. If annotation is successful, this indicates that most of the pre-established parameters for proposing a lineage are present in this branch.
Example of an annotation made by Autolin on a group of fictional sequences inserted into the trees:
- Use the "Click + Shift" key combination on the branch containing the annotation to view the information associated with the annotated branch:
The user should monitor the presence of homoplasic and unique mutations. Homoplasic mutations have a reduced ability to isolate the branch from the rest of the tree; however, when associated with inherited mutations, they can enable isolation, especially when involving two or more mutations. On the other hand, unique mutations are capable of efficiently isolating lineages.
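As a rough aid for this check, the sketch below counts on how many branches each amino acid mutation appears in an Auspice-style tree and labels the mutations of one branch accordingly. It assumes the usual Auspice JSON layout, in which each node stores its mutations under `branch_attrs.mutations`, keyed by gene, with nucleotide changes under `nuc`. The function names are hypothetical, and an identical change appearing on more than one branch is used here as a simple proxy for homoplasy.

```python
from collections import Counter


def iter_nodes(root):
    """Yield every node of an Auspice-style tree, depth-first."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def aa_mutations(node):
    """Return the amino acid mutations of one branch as 'gene:change' strings."""
    mutations = node.get("branch_attrs", {}).get("mutations", {})
    return [f"{gene}:{change}"
            for gene, changes in mutations.items() if gene != "nuc"
            for change in changes]


def classify_branch_mutations(root, branch_name):
    """Label each amino acid mutation of the named branch as unique or homoplasic."""
    counts = Counter(m for node in iter_nodes(root) for m in aa_mutations(node))
    for node in iter_nodes(root):
        if node.get("name") == branch_name:
            return {m: "unique" if counts[m] == 1 else "homoplasic"
                    for m in aa_mutations(node)}
    raise KeyError(f"branch {branch_name!r} not found")


# Toy tree with made-up mutations, for illustration only.
toy_tree = {"name": "root", "children": [
    {"name": "NODE_1",
     "branch_attrs": {"mutations": {"nuc": ["A10G"], "E": ["K204R", "T51A"]}},
     "children": [{"name": "s1"}, {"name": "s2"}]},
    {"name": "NODE_2",
     "branch_attrs": {"mutations": {"E": ["K204R"]}},
     "children": [{"name": "s3"}]},
]}
print(classify_branch_mutations(toy_tree, "NODE_1"))
```

With a real file, the tree would typically be loaded with `json.load` and read from the top-level `"tree"` key, assuming that is where the annotated JSON keeps it.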
After the initial identification step, if the target branch displays the necessary characteristics for lineage designation, it is highly recommended to perform a phylogenetic reconstruction to retrieve UFBoot values associated with parental branch relationships.
Given that running the phylogenetic tree with all samples requires significant computational capacity, which may not be accessible to most users, we recommend that the analysis be conducted exclusively with the sequences that make up the immediately preceding lineage.
Considering that, in the previous example, the fictional sequences were assigned to subgenotype 4II.A, the analysis was performed exclusively with the sequences that make up this subgenotype:
- Download the "nextclade.nwk" file from the web version of Nextclade or locate it among the files generated by the local execution of the tool (using the --output-tree-nwk option).
- Use FigTree to select all Taxa from the annotated ancestral branch (4II.A) and copy the list with the sequence identifiers, which should be downloaded from the database (for DENV-1, 2 and 3, use GISAID EpiArbo; for DENV-4 use GenBank):
- Perform phylogenetic reconstruction using the execPhyloDenv.sh script with the original branch sequences in combination with the target sequences. Note that line 72 of the script activates a conda environment containing augur v24.1.0; modify it to suit your execution environment.
- Execute the customized Autolin script with the same parameters:
python annotate_json.py -i nextclade.auspice.json -o nextclade.auspice_annotated.json -m -f 1 -s 10 -d 1 -c 90
- Check if an annotation has been made on the branch previously identified as a possible new lineage.
- If the annotation is present, check the amino acid mutations assigned to evaluate their ability to isolate the branch.
- With these steps, you should be able to identify and gather evidence to designate a possible new lineage of the dengue virus according to the established criteria.
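The taxa-selection step described above is done in FigTree, but the list of identifiers could also be pulled from the Auspice JSON. The sketch below is one possible approach under stated assumptions: it assumes each tip stores its assignment under `node_attrs[attr]["value"]`, and the default key `clade_membership` is a guess that must be checked against the actual file. The function name is hypothetical. Only exact label matches are returned, so tips assigned to descendant labels are not included.

```python
def tips_with_label(root, label, attr="clade_membership"):
    """Return the sorted names of tips whose node_attrs[attr]['value'] equals label."""
    names = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            # Internal node: keep descending, only tips are collected.
            stack.extend(children)
            continue
        value = node.get("node_attrs", {}).get(attr, {}).get("value")
        if value == label:
            names.append(node["name"])
    return sorted(names)


# Toy tree with made-up tip names, for illustration only.
toy_tree = {"name": "root", "children": [
    {"name": "NODE_1", "children": [
        {"name": "seqA", "node_attrs": {"clade_membership": {"value": "4II.A"}}},
        {"name": "seqB", "node_attrs": {"clade_membership": {"value": "4II.A"}}},
    ]},
    {"name": "seqC", "node_attrs": {"clade_membership": {"value": "4II.B"}}},
]}
print("\n".join(tips_with_label(toy_tree, "4II.A")))
```

The printed list can be saved to a text file and used to retrieve the sequences from GISAID EpiArbo or GenBank.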
After identifying a possible new circulating lineage of the dengue virus, we request that the user create an "issue" on the denvLineages GitHub page, providing the following information gathered during the lineage verification process:
- Identifiers of the genomic sequences used (GenBank or GISAID EpiArbo).
- Highlight collection date and isolation location.
- The ancestral annotation of the lineage to be established; in the example above, this would be 4II.A.
- Amino acid mutations found in the branch.
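To keep submissions consistent, the items above can be gathered into a plain-text block before pasting them into the issue. This is only an illustrative sketch: denvLineages does not prescribe this layout, the function `format_issue_body` is hypothetical, and every value in the example is a placeholder rather than real data.

```python
def format_issue_body(parent_lineage, aa_mutations, samples):
    """Format the requested information as text for a GitHub issue.

    samples is a list of (identifier, collection_date, location) tuples.
    """
    lines = [
        f"Ancestral annotation: {parent_lineage}",
        "Amino acid mutations: " + ", ".join(aa_mutations),
        "",
        "Sequences (identifier | collection date | isolation location):",
    ]
    for identifier, collection_date, location in samples:
        lines.append(f"- {identifier} | {collection_date} | {location}")
    return "\n".join(lines)


# Placeholder values only.
body = format_issue_body(
    parent_lineage="4II.A",
    aa_mutations=["E:K204R"],
    samples=[("EXAMPLE_0001", "2024-01-15", "Example City"),
             ("EXAMPLE_0002", "2024-02-03", "Example City")],
)
print(body)
```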
The classifier will be updated regularly, at a frequency yet to be determined, so that it remains representative of the currently circulating lineages.