04 VCF Normalization
Following variant clustering, variants are optionally realigned by selecting the --realign-query and/or --realign-truth options. The --realign-only flag can be used to skip downstream evaluations.
As initially introduced in this manuscript and further explored in our work, best alignment normalization can be used to select between several possible variant representations when complex variants are involved. Affine-gap Smith-Waterman alignment is used to select the "best" variant representation, as defined by a given set of alignment parameters: match score (m), mismatch penalty (x), gap-opening penalty (o), and gap-extension penalty (e). The design space for these parameters is shown below, with many common alignment tools plotted, along with four example alignments (A, B, C, D) and their resulting variant representations. By default, vcfdist selects the representation at Point C.
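To make the role of the alignment parameters concrete, here is a minimal Python sketch that scores two candidate representations of the same complex variant under affine gap penalties and picks the cheaper one. It is not vcfdist's implementation. It assumes that x, o, and e are the mismatch, gap-opening, and gap-extension penalties, that a gap of length L costs o + L*e, and that matches cost nothing. The function names and the parameter values are hypothetical and are not vcfdist defaults.

```python
from typing import Dict, List, Tuple

# A variant is (REF allele, ALT allele) with no padding base:
# ("A", "G") is a SNP, ("", "GG") an insertion, ("A", "") a deletion.
Variant = Tuple[str, str]


def affine_penalty(variants: List[Variant], x: int, o: int, e: int) -> int:
    """Total penalty of a representation (assumed gap cost: o + L*e)."""
    total = 0
    for ref, alt in variants:
        if ref and alt:
            if len(ref) != len(alt):
                raise ValueError("decompose complex variants first")
            # Substitution: pay x for every mismatching base.
            total += x * sum(r != a for r, a in zip(ref, alt))
        else:
            # Pure insertion or deletion of length L.
            total += o + e * max(len(ref), len(alt))
    return total


def best_representation(candidates: Dict[str, List[Variant]],
                        x: int, o: int, e: int) -> str:
    """Name of the candidate with the lowest total penalty."""
    return min(candidates, key=lambda k: affine_penalty(candidates[k], x, o, e))


# Two ways to turn reference TAC into TGGC:
candidates = {
    "snp+ins": [("A", "G"), ("", "G")],  # substitute A->G, then insert G
    "del+ins": [("A", ""), ("", "GG")],  # delete A, then insert GG
}

print(best_representation(candidates, x=5, o=6, e=2))   # substitutions cheap
print(best_representation(candidates, x=10, o=2, e=1))  # gaps cheap
```

With a low mismatch penalty the SNP-based representation wins, and with cheap gaps the two-INDEL representation wins. This is the sense in which a point in the parameter design space determines the variant representation.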
The traditional method of variant normalization involves decomposing complex variants, trimming unnecessary bases from the variant representation, and then left-aligning INDELs.
This procedure is sufficient to create a unique canonical representation for a single variant, but not when multiple or complex variants are involved.
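The trimming and left-alignment steps of the traditional procedure can be sketched for a single variant as follows. This is an illustrative Python sketch of the standard trim-and-left-shift approach, not code from vcfdist. The function name, the 0-based position convention, and the toy reference are assumptions, and decomposition of complex variants is not covered.

```python
from typing import Tuple


def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim and left-align one variant.

    ref_seq is the reference sequence, pos the 0-based start of the REF allele.
    """
    while True:
        if not ref or not alt:
            # An allele became empty: extend both alleles one base to the left.
            if pos == 0:
                raise ValueError("cannot left-extend at the start of the reference")
            pos -= 1
            ref = ref_seq[pos] + ref
            alt = ref_seq[pos] + alt
        elif ref[-1] == alt[-1]:
            # Shared trailing base: trim it (this is what shifts INDELs left).
            ref, alt = ref[:-1], alt[:-1]
        else:
            break
    # Trim shared leading bases, keeping at least one base in each allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "TGCACACAG"
# A "CA" deletion written at the right end of the CA repeat...
print(normalize(reference, 5, "ACA", "A"))
# ...and a SNP padded with unnecessary flanking bases.
print(normalize(reference, 2, "CAC", "CTC"))
```

In the first call the deletion is shifted to the leftmost position of the repeat, and in the second the padding bases are trimmed. Each variant is normalized on its own, which is why this procedure cannot choose between alternative representations that span several variants.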