This repository has been archived by the owner on Jan 31, 2020. It is now read-only.

Duplicate entries in Pindel output #109

Open
stevekm opened this issue Jul 5, 2019 · 0 comments


stevekm commented Jul 5, 2019

I ran into problems annotating the .vcf output from Pindel because the file contains duplicate entries. For example:


#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NORMAL | TUMOR
chr2 | 113983582 | . | T | TGGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT | . | PASS | END=113983582;HOMLEN=0;SVLEN=78;SVTYPE=INS | GT:AD | 0/0:1083,1 | 0/0:1115,0
chr2 | 113983582 | . | T | TGGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT | . | PASS | END=113983582;HOMLEN=0;SVLEN=78;SVTYPE=INS | GT:AD | 0/0:1083,1 | 0/0:1115,0

There are many such entries in the .vcf file produced.

I thought this might be an issue with the .vcf conversion from the original data format, but the duplicates also appear in the raw output:

$ grep 113983582 pindel_output/*
pindel_output/_SI:530	I 78	NT 78 "GGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT"	ChrID chr2	BP 113983582	113983583	BP_range 113983581	113983583	Supports 1	1	+ 1	1	- 0	0	S1 2	SUM_MS 60	2	NumSupSamples 1	1	NORMAL 1083 1071 1 1 0 0	TUMOR 1115 1105 0 0 0 0
pindel_output/_SI:552	I 78	NT 78 "GGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT"	ChrID chr2	BP 113983582	113983583	BP_range 113983581	113983583	Supports 1	1	+ 1	1	- 0	0	S1 2	SUM_MS 60	2	NumSupSamples 1	1	NORMAL 1083 1071 1 1 0 0	TUMOR 1115 1105 0 0 0 0
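The two records above appear to differ only in their leading index (530 vs. 552). To check how widespread this is, a rough sketch like the following could group the summary lines of a raw output file by everything after that index. It assumes each summary line starts with an integer index followed by a tab, as in the grep output above; the function name `find_duplicate_records` is made up for illustration and is not part of Pindel.

```python
from collections import defaultdict


def find_duplicate_records(lines):
    """Group summary lines that are identical apart from the leading index.

    Assumption: a summary line looks like "<index>\t<rest of record>".
    Lines that do not match this shape (separators, read lines) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        index, sep, rest = line.rstrip("\n").partition("\t")
        if not sep or not index.isdigit():
            continue
        groups[rest].append(int(index))
    # Keep only record bodies that occur under more than one index.
    return {rest: idxs for rest, idxs in groups.items() if len(idxs) > 1}
```

Calling it on an open file handle, for example `find_duplicate_records(open("pindel_output/_SI"))`, would return a dict mapping each repeated record body to the list of indices under which it appears.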

Why are duplicate entries being reported? And is it safe to remove them? What is the recommended removal method?

I am using Pindel version 0.2.5b9.
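For reference, this is the kind of removal I have in mind for the .vcf: a minimal sketch that drops only byte-identical record lines, keeps the first occurrence, and leaves header lines untouched. Whether that is actually safe for Pindel output is exactly the open question, so this is not a confirmed recommendation. The function name `dedupe_vcf_lines` is hypothetical.

```python
def dedupe_vcf_lines(lines):
    """Yield VCF lines, skipping records that exactly repeat an earlier one.

    Header lines (starting with '#') are always passed through.
    Only records identical in every column are treated as duplicates.
    """
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        key = line.rstrip("\n")
        if key in seen:
            continue  # exact repeat of an earlier record
        seen.add(key)
        yield line
```

It could be used as `out.writelines(dedupe_vcf_lines(inp))` with `inp` and `out` being open handles for the input and output .vcf files. Because it compares whole lines, records that share a position but differ in any column are kept.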
