Multiple Sequence Alignments

Robert J. Gifford edited this page Nov 27, 2024 · 6 revisions

In GLUE, a 'constrained MSA' is a multiple sequence alignment (MSA) in which the coordinate space is defined by a selected reference sequence. Where alignment members contain insertions relative to the reference sequence, the inserted sequences are recorded and stored (i.e. sequence data is never deleted).
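The idea of a reference-defined coordinate space can be illustrated with a minimal Python sketch. This is not GLUE's own code or command syntax: the `Segment` class, its field names, and both helper functions are hypothetical, and the sequences are toy data. Each member is described by ungapped segments that map stretches of the member onto reference coordinates; bases that fall between segments have no reference column, but they stay in the stored member sequence and can be recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical ungapped mapping of a member stretch onto the reference."""
    ref_start: int     # 1-based, inclusive, reference coordinates
    ref_end: int
    member_start: int  # 1-based, inclusive, member coordinates
    member_end: int


def project_onto_reference(member_seq, segments, ref_length):
    """Return the member as a row in reference coordinates ('-' where unaligned)."""
    row = ["-"] * ref_length
    for seg in segments:
        for offset in range(seg.ref_end - seg.ref_start + 1):
            row[seg.ref_start - 1 + offset] = member_seq[seg.member_start - 1 + offset]
    return "".join(row)


def insertions(member_seq, segments):
    """Return (reference position, inserted bases) for member bases that sit
    between two segments that are adjacent in reference coordinates."""
    ordered = sorted(segments, key=lambda s: s.member_start)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        skipped_member_bases = right.member_start > left.member_end + 1
        adjacent_on_reference = right.ref_start == left.ref_end + 1
        if skipped_member_bases and adjacent_on_reference:
            inserted = member_seq[left.member_end:right.member_start - 1]
            found.append((left.ref_end, inserted))
    return found


# Toy data (assumed for illustration): the member carries "TT" after reference position 4.
reference = "ACGTACGT"
member = "ACGTTTACGT"
segments = [Segment(1, 4, 1, 4), Segment(5, 8, 7, 10)]

row = project_onto_reference(member, segments, len(reference))
extra = insertions(member, segments)
```

In this sketch the projected row has the same length as the reference, while the inserted bases are reported separately, keyed by the reference position they follow. Nothing is removed from the member sequence itself.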

GLUE projects have the option of using a data structure called an alignment tree to link constrained MSAs representing different taxonomic levels, and we've used this approach in Flavivirus-GLUE.

[Figure: Alignment tree concept]

The schematic figure above shows the 'alignment tree' data structure currently implemented in Flavivirus-GLUE. For the highest taxonomic levels (i.e. at the root) we aligned only the most conserved regions of the genome, whereas for the lower taxonomic levels (i.e. within and below genus level) we aligned complete coding sequences. We used an alignment tree data structure to link these alignments, via a set of common reference sequences. The root alignment contains reference sequences for major clades, whereas all children of the root inherit at least one reference from their immediate parent. Thus, all alignments are linked to one another via our chosen set of master reference sequences.
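The linking rule described above can be sketched as a small tree structure in Python. This is an illustrative model only, not GLUE's implementation: the `AlignmentNode` class, its fields, and all alignment and sequence names (`ROOT`, `GENUS_A`, `REF_A`, and so on) are hypothetical placeholders. The sketch assumes that a child alignment may be attached only if its constraining reference is already a member of the parent alignment, which is what ties every alignment back to the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class AlignmentNode:
    """Hypothetical node in an alignment tree."""
    name: str
    reference: str                       # sequence that defines this node's coordinate space
    members: Set[str] = field(default_factory=set)
    parent: Optional["AlignmentNode"] = None
    children: List["AlignmentNode"] = field(default_factory=list)

    def add_child(self, child: "AlignmentNode") -> "AlignmentNode":
        # Assumed linking rule: the child's reference must be a member of this alignment.
        if child.reference not in self.members:
            raise ValueError(
                f"{child.reference} is not a member of {self.name}; cannot link {child.name}"
            )
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Optional[AlignmentNode]) -> List[str]:
    """Return alignment names from the given node up to the root."""
    path = []
    while node is not None:
        path.append(node.name)
        node = node.parent
    return path


# Placeholder names, assumed for illustration only.
root = AlignmentNode("ROOT", "REF_MASTER", {"REF_A", "REF_B"})
genus = root.add_child(AlignmentNode("GENUS_A", "REF_A", {"REF_A1", "REF_A2"}))
subgenus = genus.add_child(AlignmentNode("SUBGENUS_A1", "REF_A1", {"SEQ_1"}))

lineage = path_to_root(subgenus)
```

Because each child reference is also a member of its parent, a position in a low-level alignment can in principle be traced upward one shared reference at a time. Attempting to attach a child whose reference is absent from the parent raises an error in this sketch.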

Alignments in the project include:

  1. A ‘root’ alignment (i.e. family-level) constructed to represent homology between the two largest subgroupings in the Flaviviridae.

  2. ‘Major-lineage’ alignments constructed to represent proposed homologies between representative members of major Flaviviridae lineages.

  3. ‘Minor-lineage’ alignments constructed to represent proposed homologies between representative members of 'minor' Flaviviridae lineages.

  4. ‘Genus-level’ alignments constructed to represent proposed homologies between the genomes of representative members of specific flavivirid genera.

  5. ‘Subgenus-level’ alignments constructed to represent proposed homologies between the genomes of representative members of specific flavivirid subgenera.