01 Overview
vcfdist evaluates the correctness of a set of phased variant calls (query VCF) relative to a set of phased ground truth variant calls (truth VCF) for a subset (regions BED) of the desired genome (reference FASTA). vcfdist was designed to evaluate human genomes, but should work on other monoploid and diploid species. It can evaluate variants of any type, including STRs (simple tandem repeats) and CNVs (copy number variants), but during evaluation it classifies every variant as one of three categories: SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), or SVs (structural variants). Evaluating variants larger than 10,000 bases is not currently recommended, as it requires large amounts of memory (over 50 GB of RAM).

Below is a diagrammatic overview of vcfdist. Inputs are shown in red, internal steps in yellow, and optional steps in gray.
- Parameters and Usage
- Variant Filtering
- VCF Normalization
- Variant Clustering
- Precision and Recall
- Phasing Analysis
- Alignment Distance
- Outputs
- Variant Stratification
| Folder | Description |
|---|---|
| src | contains all C++ source code for vcfdist |
| demo | contains a simple self-contained vcfdist example script, including inputs and expected output |
| analysis | contains analysis scripts for "vcfdist: accurately benchmarking phased small variant calls" |
| analysis-v2 | contains analysis scripts for "Jointly benchmarking phased small and structural variant calls with vcfdist" |
| docs | contains old wiki documentation |
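
The three-way classification described in the overview (SNP, INDEL, SV) can be illustrated with a minimal Python sketch. This is not vcfdist's C++ implementation: the function names are hypothetical, the 50-base boundary between INDELs and SVs is an assumption that this page does not state, and sizing a multi-base substitution by its allele length is a simplification. Only the 10,000-base limit comes from the text above.

```python
# Illustrative sketch only; not taken from the vcfdist source code.

SV_THRESHOLD = 50              # ASSUMPTION: INDEL/SV boundary, not stated in this overview
MAX_RECOMMENDED_SIZE = 10_000  # from the overview: larger variants are not recommended


def variant_size(ref: str, alt: str) -> int:
    """Return a simple size estimate for a variant from its REF and ALT alleles."""
    if not ref or not alt:
        raise ValueError("REF and ALT alleles must be non-empty")
    if len(ref) == len(alt):
        # Substitution: size is the allele length (1 for a SNP).
        return len(ref)
    # Insertion or deletion: size is the length difference between alleles.
    return abs(len(ref) - len(alt))


def classify(ref: str, alt: str, sv_threshold: int = SV_THRESHOLD) -> str:
    """Classify a variant as 'SNP', 'INDEL', or 'SV' (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "SV" if variant_size(ref, alt) >= sv_threshold else "INDEL"


def exceeds_recommended_size(ref: str, alt: str) -> bool:
    """Flag variants larger than the 10,000-base limit mentioned in the overview."""
    return variant_size(ref, alt) > MAX_RECOMMENDED_SIZE
```

For example, `classify("A", "G")` falls into the SNP category and `classify("A", "ACGT")` into the INDEL category, while an insertion of 60 bases is treated as an SV under the assumed threshold.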