Releases: cancerit/cgpPindel
v3.0.2: fix to example filters
Correct example rule files for *Fragment.lst files to use FF[[:digit:]]+ filter types
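As a rough illustration of what the FF[[:digit:]]+ pattern selects, the sketch below tests a few identifiers against it with grep -E. The identifiers and the helper name is_fragment_filter are made up for the example; the layout of the rule files themselves is not shown in these notes.

```shell
# Hypothetical helper: succeeds when the argument is "FF" followed by
# one or more decimal digits and nothing else.
is_fragment_filter() {
  printf '%s\n' "$1" | grep -Eq '^FF[[:digit:]]+$'
}

# Illustrative identifiers only, not taken from the real rule files.
for id in FF001 FF016 F009 FF FFx01; do
  if is_fragment_filter "$id"; then
    echo "$id: matches FF[[:digit:]]+"
  else
    echo "$id: no match"
  fi
done
```

[[:digit:]] is the POSIX character class for decimal digits, so a plain F-prefixed name such as F009 does not match the fragment pattern.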
v3.0.1 - tabix call change
3.0.1
- Update tabix calls to directly use query_full (solves GRCh38 contig name issues).
v3.0.0, reduced i/o
3.0.0
- Germline bed file is now merged for adjacent regions (#31)
- Ability to use fragment based counts for filters (#56)
- More compressed intermediate files (#55)
- Change to Const::Fast where appropriate (#41)
- Removed TG VG from genotype.
  - Readgroups are always variable, often 1 in data from last few years
  - Not used by our filters.
- Supports BAM/CRAM inputs
  - Output will be aligned with inputs
    - bam vs cram
    - bai vs csi
  - Although groundwork for csi input/output has been done, Bio::DB::HTS doesn't support csi indexed input yet.
    - Created our own fork at cancerit/Bio::DB::HTS so that this could be enabled.
    - You will need to install this manually or use one of our images for this functionality.
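For the germline bed merge (#31) listed above, the general idea of collapsing adjacent regions can be sketched with awk. This is not the cgpPindel implementation; it assumes a three-column BED stream already sorted by contig and start, and the function name merge_adjacent_bed is hypothetical.

```shell
# Hypothetical helper: merge BED intervals that touch or overlap.
# Assumes input on stdin is sorted by contig, then by start.
merge_adjacent_bed() {
  awk 'BEGIN { OFS = "\t" }
    # Same contig and the next start is at or before the current end: extend.
    NR > 1 && $1 == cur_chr && ($2 + 0) <= cur_end {
      if (($3 + 0) > cur_end) cur_end = $3 + 0
      next
    }
    # Otherwise flush the interval collected so far.
    NR > 1 { print cur_chr, cur_start, cur_end }
    # Start a new interval from the current line.
    { cur_chr = $1; cur_start = $2 + 0; cur_end = $3 + 0 }
    END { if (NR > 0) print cur_chr, cur_start, cur_end }'
}

# Toy input: the first two chr1 regions are adjacent (end 200 == start 200).
printf 'chr1\t100\t200\nchr1\t200\t250\nchr1\t300\t400\nchr2\t10\t20\n' |
  merge_adjacent_bed
```

BED coordinates are 0-based and half-open, so a region ending at 200 and one starting at 200 are adjacent and collapse to a single line.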
v2.2.4 - Internal record sorting
Sorting within a record's fields cleaned up; this primarily ensures the filter and info columns are consistently ordered.
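To show why a fixed ordering matters when comparing outputs, here is a minimal sketch that puts the semicolon-separated entries of an info-style string into a stable order. Alphabetical order under the C locale is an assumption for the example; the ordering cgpPindel actually applies is not stated here, and sort_info is a hypothetical name.

```shell
# Hypothetical helper: return the ';'-separated entries of $1 in a
# deterministic (byte-wise alphabetical) order.
sort_info() {
  printf '%s\n' "$1" | tr ';' '\n' | LC_ALL=C sort | paste -sd';' -
}

# Two records carrying the same entries in different orders...
a=$(sort_info 'PC=D;LEN=3;RS=100')
b=$(sort_info 'RS=100;PC=D;LEN=3')

# ...compare equal once each is sorted.
echo "$a"
[ "$a" = "$b" ] && echo "identical after sorting"
```

With a consistent order, a plain diff of two result files reports real differences rather than reordered fields.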
v2.2.3 - Fix to DI event collation in pindel core
Corrects read sorting during collection of DI events. The faulty sorting caused some events to be split into many and others to be missed (thanks to @liangkaiye for the patch).
Testing details:
For passed variants:
Comparing sites in VCF files...
Found 277 sites common to both files.
Found 0 sites only in main file.
Found 0 sites only in second file.
Found 0 non-matching overlapping sites.
After filtering, kept 277 out of a possible 912840 Sites
For all variants:
Comparing sites in VCF files...
Found 907254 sites common to both files.
Found 2294 sites only in main file.
Found 3698 sites only in second file.
Found 3292 non-matching overlapping sites.
If you then investigate the individual classes unfiltered:
Deletions:
$ zgrep -c 'PC=D;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740
Insertions:
$ zgrep -c 'PC=I;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638
Complex:
$ zgrep -c 'PC=DI;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:25462
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:26866
Overall, the total number of complex events increases (from 25462 to 26866). This is not surprising: the bad sorting of reads could just as easily prevent an event from reaching the required threshold for reporting as split one event into many.
Handle LD_PRELOAD problem
Reduces the amount of temporary space required and overall I/O
To process 40 million readpairs (40x Tumour + 40x Normal, chr21, 100bp reads):
Original time:
User time (seconds): 3553.88
System time (seconds): 63.92
Percent of CPU this job got: 159%
Elapsed (wall clock) time (h:mm:ss or m:ss): 37:51.63
File system inputs: 64
File system outputs: 1782080
New time:
User time (seconds): 3572.21
System time (seconds): 74.06
Percent of CPU this job got: 167%
Elapsed (wall clock) time (h:mm:ss or m:ss): 36:15.01
File system inputs: 0
File system outputs: 1139128
Original peak size: 650MB
New peak size: 291MB
~55% reduction in working space and about 36% fewer writes to the file system.
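The reductions can be recomputed from the figures reported above (peak size in MB and the "File system outputs" counts). The helper name pct_reduction is made up for this sketch.

```shell
# Hypothetical helper: percentage reduction from an old value to a new one.
pct_reduction() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_reduction 650 291          # peak working space in MB -> 55.2
pct_reduction 1782080 1139128  # file system outputs      -> 36.1
```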
Exactly the same results:
$ diff old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.germline.bed new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.germline.bed
$ diff_bams -a old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_wt.bam -b new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_wt.bam
Reference sequence count passed
Reference sequence order passed
Matching records: 194543
$ diff_bams -a old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_mt.bam -b new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_mt.bam
Reference sequence count passed
Reference sequence order passed
Matching records: 239737
$ /software/CGP/canpipe/live/bin/canpipe_live vcftools --gzvcf old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.flagged.vcf.gz --gzdiff new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.flagged.vcf.gz
...
Comparing individuals in VCF files...
N_combined_individuals: 2
N_individuals_common_to_both_files: 2
N_individuals_unique_to_file1: 0
N_individuals_unique_to_file2: 0
Comparing sites in VCF files...
Found 15321 SNPs common to both files.
Found 0 SNPs only in main file.
Found 0 SNPs only in second file.
After filtering, kept 16309 out of a possible 16309 Sites
Run Time = 6.00 seconds
Legacy v1 support fix: v1.5.7
Pulls this fix back into the legacy 1.x codebase.
v2.0.8 - Bugfix for WXS filter F009
The F009 filter was always passing. This filter is only applied to WXS data.
v2.0.7: bugfix for newer perl versions
Fixes #46, apparent use of experimental features and/or typos causing issues in perl-5.20