Releases: cancerit/cgpPindel

v3.0.2: fix to example filters

21 Sep 13:10
27c0916

Corrects the example rule files for *Fragment.lst files so that they use FF[[:digit:]]+ filter types.
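As a rough illustration of what the FF[[:digit:]]+ pattern selects, here is a minimal Python sketch. Python's `re` module has no POSIX character classes, so `[[:digit:]]` is written as `[0-9]`. The filter names tested are illustrative only, and whole-name matching is an assumption of this sketch rather than something these notes state.

```python
import re

# POSIX "FF[[:digit:]]+" rewritten for Python's re module, which does not
# support [[:digit:]]; [0-9] is the equivalent class.
FRAGMENT_FILTER = re.compile(r"FF[0-9]+")


def is_fragment_filter(name: str) -> bool:
    """Return True if the whole name looks like a fragment-based filter type.

    Anchoring on the full name is an assumption made for this sketch.
    """
    return FRAGMENT_FILTER.fullmatch(name) is not None


# Illustrative names only; they are not taken from a real rule file.
for name in ("FF001", "FF016", "F001", "FF"):
    print(name, is_fragment_filter(name))
```

Under that assumption, names such as `F001` (single `F`) and a bare `FF` with no digits are rejected.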

v3.0.1 - tabix call change

26 Mar 20:32

3.0.1

  • Update tabix calls to directly use query_full (solves GRCh38 contig name issues).

v3.0.0, reduced i/o

02 Mar 15:40
c40f376

3.0.0

  • Germline bed file is now merged for adjacent regions (#31)
  • Ability to use fragment based counts for filters (#56)
  • More compressed intermediate files (#55)
  • Change to Const::Fast where appropriate (#41)
  • Removed TG VG from genotype.
    • The number of read groups varies between datasets and is often 1 in data from the last few years.
    • Not used by our filters.
  • Supports BAM/CRAM inputs
  • Output will be aligned with inputs
    • bam vs cram
    • bai vs csi
  • Although the groundwork for csi input/output has been done, Bio::DB::HTS doesn't support csi-indexed input yet.
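
The germline bed merge mentioned in the list above (#31) can be pictured with a short sketch. It is not the cgpPindel implementation. It assumes BED-style half-open intervals and treats "adjacent" as abutting or overlapping on the same contig; the function name and example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def merge_adjacent(regions: Iterable[Region]) -> List[Region]:
    """Merge regions on the same contig that abut or overlap.

    Assumption: "adjacent" means next.start <= previous.end.
    """
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the previous region instead of emitting a new one.
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical coordinates: the first two regions abut and collapse into one.
example = [("21", 100, 150), ("21", 150, 200), ("21", 300, 310), ("22", 310, 320)]
print(merge_adjacent(example))
```

Regions on different contigs are never merged, even when their coordinates touch.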

v2.2.4 - Internal record sorting

30 Nov 11:33
ebebbc9

Sorting within a record's fields has been cleaned up; this primarily ensures that the filter and info columns are consistently ordered.

v2.2.3 - Fix to DI event collation in pindel core

25 May 11:51
e36eeb5

Corrects read sorting during collection of DI events. The incorrect sorting caused some events to be split into many and others to be missed (thanks to @liangkaiye for the patch).

Testing details:

For passed variants:

Comparing sites in VCF files...
Found 277 sites common to both files.
Found 0 sites only in main file.
Found 0 sites only in second file.
Found 0 non-matching overlapping sites.
After filtering, kept 277 out of a possible 912840 Sites

For all variants:

Comparing sites in VCF files...
Found 907254 sites common to both files.
Found 2294 sites only in main file.
Found 3698 sites only in second file.
Found 3292 non-matching overlapping sites.

If you then investigate the individual classes unfiltered:

Deletions:

$ zgrep -c 'PC=D;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:463740

Insertions:

$ zgrep -c 'PC=I;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:423638

Complex:

$ zgrep -c 'PC=DI;' pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz
pre-fix/TUMOUR_vs_NORMAL.flagged.vcf.gz:25462
postfix/TUMOUR_vs_NORMAL.flagged.vcf.gz:26866
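
The three zgrep calls above could also be done in a single pass per file. The sketch below is one hedged way to do that in Python: it tallies the PC value from the INFO column of each record. The demo input is synthetic (the `RS` key and all coordinates are made up). Unlike `zgrep -c 'PC=D;'`, it also counts records where PC is the last INFO entry.

```python
import gzip
import os
import re
import tempfile
from collections import Counter

# Capture the value of the PC key wherever it sits in the INFO column.
PC_RE = re.compile(r"(?:^|;)PC=([^;]+)")


def count_pindel_classes(vcf_gz_path: str) -> Counter:
    """Tally PC values (e.g. D, I, DI) over all records in a gzipped VCF."""
    counts: Counter = Counter()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete VCF record
            match = PC_RE.search(fields[7])
            if match:
                counts[match.group(1)] += 1
    return counts


# Synthetic demo data, not real cgpPindel output.
demo_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "21\t100\t.\tAT\tA\t.\tPASS\tPC=D;RS=100\n",
    "21\t200\t.\tA\tAT\t.\tPASS\tPC=I;RS=200\n",
    "21\t300\t.\tATG\tAC\t.\tF001\tPC=DI;RS=300\n",
    "21\t400\t.\tAT\tA\t.\tF001\tPC=D;RS=400\n",
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.flagged.vcf.gz")
    with gzip.open(path, "wt") as out:
        out.writelines(demo_lines)
    counts = count_pindel_classes(path)

print(dict(counts))
```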

Overall, the total number of complex events increases (25462 to 26866). This is not that surprising: the bad sorting of the reads could just as easily prevent an event from reaching the required threshold for reporting as split it into many.
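
The figures above can be cross-checked with a little arithmetic, sketched here in Python using only numbers quoted in these notes. Reading the "main" file as the pre-fix run is an inference from the totals, not something the notes state.

```python
# Per-class counts from the zgrep output above.
pre_fix = {"D": 463740, "I": 423638, "DI": 25462}
post_fix = {"D": 463740, "I": 423638, "DI": 26866}

# Site comparison for all variants, as reported above.
common, only_main, only_second, non_matching = 907254, 2294, 3698, 3292

pre_total = sum(pre_fix.values())
post_total = sum(post_fix.values())

# Each file's sites split into common, unique and non-matching overlapping.
assert common + only_main + non_matching == pre_total
assert common + only_second + non_matching == post_total

# The net gain in sites is entirely accounted for by the DI class.
di_gain = post_fix["DI"] - pre_fix["DI"]
assert di_gain == only_second - only_main == post_total - pre_total

print(pre_total, post_total, di_gain)  # 912840 914244 1404
```

The pre-fix total of 912840 also matches the "possible" site count reported in the passed-variant comparison, and deletions and insertions are unchanged, so the net difference of 1404 sites comes from complex events alone.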

Handle LD_PRELOAD problem

18 Mar 09:38
2c42f9e
  • Fixed an incorrect loop in file handling for filter_pindel_reads.cpp (#49).
  • Moved some messages in cpp code to stderr.

No change to usage or output; this release specifically fixes a problem encountered during profiling by Ellexus.

Reduces the amount of temporary space required and overall I/O

09 Mar 15:03
728f8ec

To process 40 million readpairs (40x Tumour + 40x Normal, chr21, 100bp reads):

Original time:

User time (seconds): 3553.88
System time (seconds): 63.92
Percent of CPU this job got: 159%
Elapsed (wall clock) time (h:mm:ss or m:ss): 37:51.63
File system inputs: 64
File system outputs: 1782080

New time:

User time (seconds): 3572.21
System time (seconds): 74.06
Percent of CPU this job got: 167%
Elapsed (wall clock) time (h:mm:ss or m:ss): 36:15.01
File system inputs: 0
File system outputs: 1139128

Original peak size: 650MB
New peak size: 291MB

~55% reduction in peak working space (650MB to 291MB) and about 36% fewer writes to the file system (1782080 to 1139128 file system outputs).
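
The percentages can be recomputed from the timing output listed above. The Python sketch below uses only the quoted figures; the helper names are hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def to_seconds(clock: str) -> float:
    """Convert an m:ss.ss wall-clock string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


# Figures copied from the original and new runs above.
space = percent_reduction(650, 291)            # peak size in MB
writes = percent_reduction(1782080, 1139128)   # file system outputs
wall = percent_reduction(to_seconds("37:51.63"), to_seconds("36:15.01"))

print(f"working space: {space:.1f}% smaller")
print(f"file system outputs: {writes:.1f}% fewer")
print(f"wall clock: {wall:.1f}% shorter")
```

On these figures, working space falls by about 55%, file system outputs by about 36%, and elapsed time by roughly 4%. User CPU time is essentially unchanged.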

Exactly the same results:

$ diff old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.germline.bed new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.germline.bed

$ diff_bams -a old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_wt.bam -b new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_wt.bam
Reference sequence count passed
Reference sequence order passed
Matching records: 194543

$ diff_bams -a old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_mt.bam -b new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9_mt.bam
Reference sequence count passed
Reference sequence order passed
Matching records: 239737

$ /software/CGP/canpipe/live/bin/canpipe_live vcftools --gzvcf old/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.flagged.vcf.gz --gzdiff new/f9c3bc8e-dbc4-1ed0-e040-11ac0d4803a9_vs_f9c3bc8e-dbc1-1ed0-e040-11ac0d4803a9.flagged.vcf.gz
...
Comparing individuals in VCF files...
N_combined_individuals:	2
N_individuals_common_to_both_files:	2
N_individuals_unique_to_file1:	0
N_individuals_unique_to_file2:	0
Comparing sites in VCF files...
Found 15321 SNPs common to both files.
Found 0 SNPs only in main file.
Found 0 SNPs only in second file.
After filtering, kept 16309 out of a possible 16309 Sites
Run Time = 6.00 seconds

Legacy v1 support fix: v1.5.7

20 Oct 08:25

Pulls this fix back into the legacy 1.x codebase.

v2.0.8 - Bugfix for WXS filter F009

08 Sep 14:51

The F009 filter was always passing. This filter is only applied to WXS data.

v2.0.7: bugfix for newer perl versions

16 Jun 12:50

Fixes #46: apparent use of experimental features and/or typos was causing issues in perl-5.20.