Releases: TimD1/vcfdist
v2.5.3
v2.5.2
v2.5.1
v2.5.0
Major Changes
- New definition of "sync groups" (complex variants) when attributing credit to variants. The new definition will break dependencies if the selected (rather than all possible) backtracking path(s) pass(es) through the reference diagonal. As a result, there should be more, smaller sync groups and fewer partial credit calls.
- Precision-recall backtracking algorithm now maximizes TP calls
- Removed the `-s, --smallest-variant` option. It offers no runtime benefits and will negatively impact performance (since small variants are prematurely filtered, they cannot be found equivalent to remaining variants). Instead, stratify variants after benchmarking or adjust the `--sv-threshold` and `-l, --largest-variant` parameters to evaluate the desired variants.
Minor bugfixes
- Fixed an erroneous `return` instead of `break` statement that caused segfaults in v2.4.0 when using `--cluster gap` or `--cluster size`.
- Fixed a logical error that caused `left_reach` and `right_reach` to not be calculated for the first and last clusters on a contig, resulting in incorrect superclustering.
v2.4.0
Major changes
- changed handling of BED regions (see wiki) to exclude variants on border, necessary to be consistent with Truvari and how ground truth BEDs were generated
Minor updates
- added `-lm` and `-lstdc++` during linking, which should allow `clang++` compilation (working towards bioconda release)
- removed `libstdc++fs` dependency (further increasing compatibility)
v2.3.4
v2.3.3
Minor updates
- started the vcfdist wiki, which is currently a work-in-progress
- added `THRESHOLD` column to `precision-recall-summary.tsv`, containing either `NONE` or `BEST`
- added `make install` command
- added new size-based clustering heuristic, explained in wiki
- added evaluation of `ALL` variants to `stdout` and `precision-recall.tsv`
- added `RD` and `QD` tags to `summary.vcf`, listing reference and query distances from truth sequence
- added `REF_DIST` and `QUERY_DIST` columns to `query.tsv` and `truth.tsv` containing the same info
Minor bugfixes
- fixed `precision-recall-summary.tsv` extra tab
- added `Makefile` comment that `libstdc++fs` inclusion depends on GCC version
- fixed off-by-one error that miscounted `TRUTH_TP` and `TRUTH_FN` (at `g.max_qual` only)
- fixed segfault when no variants are present
- `FORMAT/BC` tag in `summary.vcf` is now `Float` (not `String`)
- fixed `credit` being set to `0.0` for all FP query variants below `--credit-threshold`
v2.3.2
Major updates to analysis-v2 scripts
- these scripts accompany the upcoming vcfdist-v2 paper
Minor updates
- added `--sv-threshold`, which adds a third precision/recall stratification
Minor bugfixes
- fixed divide-by-zero if several variants are equivalent to no variants
- fixed off-by-one error in phasing analysis logs
v2.3.1
v2.3.0
Phasing analysis updates
- added phasing threshold: superclusters are only considered phased if one phasing is an X% improvement over the other in terms of edit distance (this reduces false positive supercluster phasing flip errors that are actually variant calling errors), default `0.6`
- added phasing summary TSV (`phasing-summary.tsv`) that reports total flip errors, switch errors, phaseblock NG50, switch NGC50, and switchflip NGC50
- added switchflip TSV (`switchflips.tsv`) that reports flip range, type, supercluster, and phase block
- phase blocks are now computed from input phase sets, not backtracking, and per-phaseblock switch/flip errors were added to `phase-blocks.tsv`
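To illustrate the phasing threshold, here is a minimal Python sketch of one possible reading of it. It is not taken from the vcfdist source: the function name `is_phased`, its parameters, and the choice to measure "improvement" as the relative reduction in edit distance are all assumptions made for illustration, and the actual formula may differ.

```python
def is_phased(orig_dist: int, swap_dist: int, threshold: float = 0.6) -> bool:
    """Decide whether a supercluster counts as confidently phased.

    orig_dist / swap_dist: edit distances of the two candidate phasings
    (hypothetical names). Assumption: "improvement" is the relative
    reduction in edit distance, (worse - better) / worse.
    """
    worse = max(orig_dist, swap_dist)
    better = min(orig_dist, swap_dist)
    if worse == 0:
        # Both phasings match equally well; there is nothing to distinguish.
        return False
    return (worse - better) / worse >= threshold


if __name__ == "__main__":
    print(is_phased(10, 2))  # large improvement, treated as phased
    print(is_phased(10, 8))  # small improvement, left unphased
```

Under this reading, a supercluster whose two phasings differ only slightly in edit distance is left unphased rather than being reported as a flip error.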
Partial credit replaced with credit threshold
- partial credit calculation is less intuitive and complicates matters more than necessary; I replaced this with a partial credit threshold where passing variants are counted as TP, default `0.7`
- I think that counting mostly-correct calls with a user-defined credit threshold is better
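The credit threshold described above can be sketched roughly as follows. This is an illustrative Python snippet, not vcfdist's implementation: the function name `classify_call`, the `is_truth` flag, and the inclusive comparison (`>=`) are assumptions.

```python
def classify_call(credit: float, is_truth: bool, credit_threshold: float = 0.7) -> str:
    """Turn a per-variant credit in [0, 1] into a categorical call.

    Assumption: a variant "passes" when credit >= credit_threshold.
    Passing variants count as TP; failing truth variants become FN and
    failing query variants become FP.
    """
    if credit >= credit_threshold:
        return "TP"
    return "FN" if is_truth else "FP"


if __name__ == "__main__":
    for credit in (1.0, 0.8, 0.5):
        print(credit, classify_call(credit, is_truth=False))
```

Raising the threshold makes the benchmark stricter about mostly-correct calls, while lowering it counts more of them as true positives.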
Runtime improvements: skip alignment distance and writing
- alignment distance calculation is now skipped by default (I now think stratifying precision-recall curves by INDEL size may be more useful) and can be turned on with `-d, --distance`
- original and realigned truth/query VCFs are only written if `--realign` is selected
Added new analyses
- added `analysis-v2` directory for upcoming paper figures
Minor fixes
- GA4GH output VCF no longer always outputs `gm`: now it uses `gm` for TP, `lm` for PP, and `.` for FP/FN
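The mapping in this fix can be written as a small lookup table. The Python sketch below is illustrative only; the names `MATCH_KIND` and `match_kind` are hypothetical and do not come from the vcfdist source.

```python
# Match-type value written to the GA4GH-style output VCF for each call
# category, as described in the release note above.
MATCH_KIND = {
    "TP": "gm",  # true positive
    "PP": "lm",  # partial positive
    "FP": ".",   # false positive: no match type
    "FN": ".",   # false negative: no match type
}


def match_kind(category: str) -> str:
    """Return the match-type string for a call category (hypothetical helper)."""
    return MATCH_KIND[category]


if __name__ == "__main__":
    for category in ("TP", "PP", "FP", "FN"):
        print(category, match_kind(category))
```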