-
Notifications
You must be signed in to change notification settings - Fork 7
03 Variant Filtering
The --bed FILENAME
option can be used to select a region to evaluate. Variants outside this region are discarded from the truth and query VCFs.
Variants on the border of BED regions are excluded. This was done to be consistent with Truvari and how most benchmarking truth BEDs were generated (discussion here). In the example below, the BED region is ref 10 20
. As you can see, Truvari and vcfdist exclude deletions overlapping the border (including if the preceding reference base in the VCF overlaps) whereas vcfeval does not. All three tools exclude insertions on the BED region border.
If a variant call is present within a spanning deletion (denoted by an ALT
field of *
), it is discarded.
If two variants overlap or two insertions occur at the exact same position (on the same haplotype), the second variant (and third, etc. if applicable) which would conflict and create an ambiguous or invalid sequence is discarded.
Variants that are reference calls (0|0
) or have unknown alleles (.|.
) are discarded. All other variant genotypes are converted to 1|0
or 0|1
(splitting variants such as 1|1
or 1|2
into two variants and selecting the correct alleles).
In VCFs, a pipe separator is used for phased variants (0|1
) and a forward slash is used for unphased variants (0/1
). All unphased variants are discarded, unless the alleles are the same (1/1
or 2/2
).
If PS
is not defined in the VCF header, then all variants are assumed to be globally phased. If a variant is missing a PS
tag, it is considered to be in the same phase set as adjacent variants missing phase tags (if present).
The --filter FILTER1,FILTER2
option can be used to select variants passing FILTER1 or FILTER2. By default, if this option is not selected, all variants will pass this stage.
The -s SIZE
and -l SIZE
options can be used to select the smallest and largest variant sizes to be evaluated (inclusive). Other variants are discarded.
The --min-qual QUAL
and --max-qual QUAL
options can be used to select the range of variant qualities to evaluate. Variants with lower quality are discarded. Variants with higher quality scores are kept, but their quality score is set to max_qual
.
Complex variants are left and right-trimmed, and then converted to an insertion plus a deletion (or substitution if that is the case).