Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pileup_core and negative length vector #19

Open
ArmandLacombe opened this issue Jun 15, 2018 · 14 comments
Open

Pileup_core and negative length vector #19

ArmandLacombe opened this issue Jun 15, 2018 · 14 comments

Comments

@ArmandLacombe
Copy link

ArmandLacombe commented Jun 15, 2018

Hi,

Despite the fact that the example given in the manual works well, I can't manage to use getcounts with my own files. My command is :

./epicseg.R getcounts --mark DNase1:/homes/kidney_a8w_DNase1_ENCODE86_bwa_samse_BR2TR1.bam --mark H3K27ac:/homes/kidney_a8w_H3K27ac_ENCODE86_bwa_samse_BR2TR1.bam --regions /homes/contigs.bed --binsize 200 --target /homes/counts.txt

where contigs is a bed file I wrote imitating the example file. The error message I get is :

loading epicseg
getting counts
Processing /homes/lacombe/nbu/data/kidney_a8w_DNase1_ENCODE86_bwa_samse_BR2TR1.bam: tHe lEgEnD said, hE cOuLd PiLe uP fAsTeR thAn LiGht
Error in pileup_core(bampath, gr, tlenFilter(tlenFilter, pe), mapqual, : negative length vectors are not allowed
Calls: <Anonymous> ... bamProfile -> bamProfile -> .local -> pileup_core -> .Call
Execution halted

I get figure out what doesn't work since every vectors seems to be ok. I guess it might have something to do with bamsignals yet I can't fix it.

Do you have an idea ? Thanks a lot !

Armand

EDIT : lamortenera wrote something concerning a similar issue a few years ago there https://github.com/lamortenera/epicseg/issues/7 so I tried to follow the same steps ut it didn't worked. On top of that my data aren't that heavy, since I have only two tracks and about ~10000 bp only are selected by the contigs file.

@lamortenera
Copy link
Owner

hey Armand, sorry for the slow reply I somehow missed your email. My first guess is that there is something wrong with your regions, can you print it out or send it to me?

@ArmandLacombe
Copy link
Author

ArmandLacombe commented Jul 5, 2018

The regions file I use is the following :
contigs3.txt (which is a bed file on my computer, I just changed the extension so as to upload it).

1 310000 320000

@lamortenera
Copy link
Owner

are you sure the string '1' matches the way chromosome 1 is called in the bam file? if you do samtools view -H how is chromosome 1 called? usually its something like 'chr1'.

@ArmandLacombe
Copy link
Author

I get

@sq SN:1 LN:195471971

The .bam file I use comes from ENSEMBL rather than UCSC, I think that the chromosomes names don't start with 'chr'.

@lamortenera
Copy link
Owner

lamortenera commented Jul 8, 2018

The problem seems unrelated to epicseg but rather to its dependent package 'bamsignals'. How big are your BAM files and how many @sq header lines do they have?
Can you check what happens when using some small BAM files with the same regions that you are using now?

@lamortenera
Copy link
Owner

Is /homes/lacombe/nbu/data/kidney_a8w_DNase1_ENCODE86_bwa_samse_BR2TR1.bam a publicly available file? It may also be that the file is somehow corrupted, can you try to download it again? What if you try to get the count matrix only with the other BAM file, does that work?

If the issue with that BAM file persist, it should be possible to reproduce it with the R code:

library(bamsignals)
library(GenomicRanges)
gr <- GenomicRanges(seqnames='1', ranges=IRanges(start=310000, end=320000))
path <- '/homes/lacombe/nbu/data/kidney_a8w_DNase1_ENCODE86_bwa_samse_BR2TR1.bam'
counts <- bamProfile(path, gr, binsize=200, shift=75, mapq=0, paired.end=FALSE)

Warning, I haven't run that code so you may need to fix some small things.
Let me know if you can reproduce the problem with the code above. In that case we'll need to file a bug about bamsignals.

@lamortenera
Copy link
Owner

let me know if there is an issue with bamsignals, I am one of the maintainers and we should be able to fix that relatively quickly.

@ArmandLacombe
Copy link
Author

ArmandLacombe commented Jul 10, 2018

The .bam file I'm using weights 129 MB. I'm not sure what is a @sq header line, but when I run the command samtools view -H I get 96 lines beginning with @sq.

I don't think that my file is corrupted since it comes directly from ENSEMBL and I have no problem using it on other segmentation algorithms. However, the manual usage example works without any issue.

I also ran the code you gave me and got :

counts <- bamProfile(path, gr, binsize=200, shift=75, mapq=0, paired.end=FALSE)
Processing /homes/data/kidney_a8w_DNase1_ENCODE86_bwa_samse_BR2TR1.bam: fOr brOoMmHiLdA!!!
Error in match.arg(paired.end) : 'arg' must be NULL or a character vector
In addition: Warning message:
In .local(bampath, gr, ...) :
some ranges' widths are not a multiple of the selected
binsize, some bins will correspond to less than binsize basepairs

@lamortenera
Copy link
Owner

lamortenera commented Jul 17, 2018

Hey Armand, can you send me this BAM fie or point me to it? I'll have a look.
Sorry for the delay in answering but there must be something broken with the github notifications that I get in my inbox. I guess if you edit a message I don't get notified.

@ArmandLacombe
Copy link
Author

Here it is : https://we.tl/SzP05vCmYi . Thank you very much !

@lamortenera
Copy link
Owner

I just ran:

library(bamsignals)
library(GenomicRanges)
gr <- GRanges(seqnames='1', ranges=IRanges(start=310000, end=319999))
path <- 'kidney_a8w_H3K27ac_ENCODE86_bwa_samse_BR2TR1.bam'
counts <- bamProfile(path, gr, binsize=200, shift=75, mapq=0, paired.end="ignore")

Couldn't find any issues with that... I'll try with epicseg directly now.

@lamortenera
Copy link
Owner

I cant reproduce your problems, I just ran:

./epicseg.R getcounts --mark H3K27ac:kidney_a8w_H3K27ac_ENCODE86_bwa_samse_BR2TR1.bam --binsize 200 --target counts.txt --regions contigs.txt

And everything worked fine.

What system are you using? R version? Bioconductor version? OS? G++ version?
Can you try to uninstall and reinstall the R packages Rcpp, epicseg, kfoots and bamsignals?

@lamortenera
Copy link
Owner

Can you print the output of 'session_info()' and 'sessionInfo()' at the interactive R console ?

@ArmandLacombe
Copy link
Author

'session_info()' doesn't work but 'sessionInfo()' gets

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

Matrix products: default
BLAS: /homes/lacombe/anaconda2/lib/R/lib/libRblas.so
LAPACK: /homes/lacombe/anaconda2/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RevoUtils_10.0.8

loaded via a namespace (and not attached):
[1] compiler_3.4.3

I'm using Bioconductor version 3.6 (BiocInstaller 1.28.0) and gcc version 7.2.0 (crosstool-NG fa8859cb).

I'll try to uninstall and reinstall the R packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants