Genomic Information Not Retrieved for Any Junction #11

Open
DarioS opened this issue Feb 24, 2020 · 8 comments

@DarioS

DarioS commented Feb 24, 2020

No matter which junction I click on in Gene_View, when I click on the Junction_View tab I get the pop-up error "Cannot retrieve genomic information for this gene". Below the load database button in Setup tab, I see Hsapiens.UCSC.hg38, so it seems the annotations have successfully been loaded.

@davhum
Collaborator

davhum commented Feb 28, 2020

Hi Dario,

Can you confirm whether this is also the case for the TwoSzabo data set? If the TwoSzabo data set works, can you let me know whether you are analysing STAR, CIRI or CircExplorer output? If you are NOT using STAR, can you copy and paste an example of the gene name, as it might be a lookup issue.

Thanks,
D

@DarioS
Author

DarioS commented Feb 28, 2020

Ah, I figured it out now. It happens when Annotate With Parental Gene is not checked. Perhaps the pop-up error should state this as a possible cause. Currently, it only advises checking that a database was loaded in the Setup tab, which it was.

I notice a couple of other issues. There's always a red Shiny warning message at the bottom of output:

[image: screenshot of the red Shiny warning message]

Also, one gene, MUC7, has a circular RNA of length infinity. I have attached input files to reproduce this: test.zip

@davhum
Collaborator

davhum commented Mar 4, 2020

Good suggestions.

I have started to look at the test data set. The MUC7 example doesn't look like a BSJ (back-splice junction), but rather a forward splice junction. I need to work out why it ends up in the chimeric output. I will have a solution implemented in the next day or so.

@davhum
Collaborator

davhum commented Mar 5, 2020

Any chance you could attach a couple of sequences from the FASTQ file? This would be a useful sanity check.
For example, the reads with the following IDs would be useful:

A00121:71:HFFY2DSXX:2:1106:11776:28745
A00121:71:HFFY2DSXX:2:1124:22236:36495
A00121:71:HFFY2DSXX:2:1146:19090:8844
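
For anyone wanting to pull these reads out themselves, here is a minimal sketch of one way to do it. It is not part of Ularcirc or STAR. It assumes plain four-line FASTQ records and that the read ID is the header text after `@` up to the first whitespace. The names `extract_reads` and `WANTED_IDS` are made up for illustration, and the sequence in the demo is synthetic.

```python
import io

# Read IDs requested in this thread.
WANTED_IDS = {
    "A00121:71:HFFY2DSXX:2:1106:11776:28745",
    "A00121:71:HFFY2DSXX:2:1124:22236:36495",
    "A00121:71:HFFY2DSXX:2:1146:19090:8844",
}


def extract_reads(fastq_handle, wanted_ids):
    """Yield (header, seq, plus, qual) for records whose read ID is wanted.

    Assumes every record is exactly four lines (no wrapped sequences).
    """
    wanted = set(wanted_ids)
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        # Read ID: header without the leading '@', up to the first whitespace.
        parts = header[1:].split()
        read_id = parts[0] if parts else ""
        if read_id in wanted:
            yield tuple(s.rstrip("\n") for s in (header, seq, plus, qual))


if __name__ == "__main__":
    # Synthetic two-record FASTQ, only to show the call pattern.
    demo = io.StringIO(
        "@A00121:71:HFFY2DSXX:2:1106:11776:28745 1:N:0:1\nACGT\n+\nFFFF\n"
        "@some:other:read 1:N:0:1\nTTTT\n+\nFFFF\n"
    )
    for record in extract_reads(demo, WANTED_IDS):
        print("\n".join(record))
```

For real data the handle would come from `open(path)` or `gzip.open(path, "rt")`, run once per mate file so both reads of each pair are recovered.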

@DarioS
Author

DarioS commented Mar 5, 2020

The pair of FASTQ files for testing is in the archive testReads.zip. The reads were trimmed using cutadapt.

@davhum
Collaborator

davhum commented Mar 5, 2020

When I BLAT each of those reads, it shows that they are indeed canonical junctions (see image below). The CIGAR strings in the chimeric junction output also suggest that the reads are not chimeric, which points to some sort of leaky reporting of non-chimeric reads in the chimeric output of the STAR aligner.

There are a couple of things I would now like to do:

  1. Modify Ularcirc to filter out these candidates.
  2. Chase up why the STAR aligner reports these reads as chimeric. On this note, are you OK if I pass on the testReads.zip you generated as part of an issue on the STAR aligner?

[image: MUC7_blat_fastq]

In the above image I have labelled each forward read as F1, F2, F3 and each paired-end mate as R1, R2, R3.
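
As a rough illustration of what filtering out such candidates (item 1) could look like, here is a sketch that is not Ularcirc's actual implementation. It assumes the first six tab-separated columns of STAR's Chimeric.out.junction are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand. It also assumes that a back-splice junction has both ends on the same chromosome and strand, with the donor downstream of the acceptor on `+` and upstream on `-`. The function names and the coordinates in the usage note are hypothetical.

```python
def is_backsplice(donor_chrom, donor_pos, donor_strand,
                  acceptor_chrom, acceptor_pos, acceptor_strand):
    """Return True if the junction has back-splice orientation.

    Assumption: same chromosome and strand, with the donor lying
    downstream of the acceptor in transcript orientation.
    """
    if donor_chrom != acceptor_chrom or donor_strand != acceptor_strand:
        return False
    if donor_strand == "+":
        return donor_pos > acceptor_pos
    if donor_strand == "-":
        return donor_pos < acceptor_pos
    return False  # unknown strand symbol


def filter_backsplice(lines):
    """Keep only Chimeric.out.junction-style lines in back-splice orientation."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # malformed or empty line
        try:
            donor_pos = int(fields[1])
            acceptor_pos = int(fields[4])
        except ValueError:
            continue  # header or comment line
        if is_backsplice(fields[0], donor_pos, fields[2],
                         fields[3], acceptor_pos, fields[5]):
            kept.append(line.rstrip("\n"))
    return kept
```

With made-up coordinates, a line such as `chrA  1000  +  chrA  500  +` would be kept, while `chrA  500  +  chrA  1000  +` (forward orientation, like the MUC7 reads described above) would be dropped.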

@DarioS
Author

DarioS commented Mar 5, 2020

Yes, please create a new issue for STAR aligner using this test data. I am happy if you avoid creating extra filtering in Ularcirc and just get it fixed at the origin. I plan to re-map the data set soon anyway, because the next version of STAR will support writing multi-mapping chimeras within the BAM file, which is useful for arriba and rearrangements involving immunoglobulin genes, so I am happy to wait a while.

@davhum
Collaborator

davhum commented Mar 9, 2020

Submitted issue to STAR.
