GFF3, no gene name in output - MissingGeneID #2209

Open
desmodus1984 opened this issue Sep 7, 2024 · 0 comments

Hi,
I am trying to assess mitochondrial gene expression. I downloaded the FASTA and the GFF3 from NCBI
(https://www.ncbi.nlm.nih.gov/nuccore/BK063639.1).
Since NCBI doesn't provide a GTF, I used the GFF3 file to build the index,
and STAR did not report any errors:

STAR --runThreadN 8 \
	--runMode genomeGenerate \
	--genomeDir /DataDrive/juaguila/BVos/RNA-fq/mito-gff3/ \
	--genomeFastaFiles /DataDrive/juaguila/BVos/RNA-fq/mito/mitoBvos.fasta \
	--genomeSAindexNbases 6 \
	--sjdbGTFfile /DataDrive/juaguila/BVos/RNA-fq/mito/MitoBVos.gff3 \
	--sjdbOverhang 49
/home/juaguila/miniconda3/envs/STAR/bin/STAR-avx2 --runThreadN 8 --runMode genomeGenerate --genomeDir /DataDrive/juaguila/BVos/RNA-fq/mito-gff3/ --genomeFastaFiles /DataDrive/juaguila/BVos/RNA-fq/mito/mitoBvos.fasta --genomeSAindexNbases 6 --sjdbGTFfile /DataDrive/juaguila/BVos/RNA-fq/mito/MitoBVos.gff3 --sjdbOverhang 49
STAR version: 2.7.11b   compiled: 2024-07-03T14:39:20+0000 :/opt/conda/conda-bld/star_1720017372352/work/source

Sep 06 21:31:33 ..... started STAR run
Sep 06 21:31:33 ... starting to generate Genome files
Sep 06 21:31:33 ..... processing annotations GTF
Sep 06 21:31:33 ... starting to sort Suffix Array. This may take a long time...
Sep 06 21:31:33 ... sorting Suffix Array chunks and saving them to disk...
Sep 06 21:31:33 ... loading chunks from disk, packing SA...
Sep 06 21:31:33 ... finished generating suffix array
Sep 06 21:31:33 ... generating Suffix Array index
Sep 06 21:31:33 ... completed Suffix Array index
Sep 06 21:31:33 ... writing Genome to disk ...
Sep 06 21:31:33 ... writing Suffix Array to disk ...
Sep 06 21:31:33 ... writing SAindex to disk
Sep 06 21:31:33 ..... finished successfully

Next, I mapped the reads:


# Map each paired-end sample; quoting protects file names that contain spaces
for i in *_1_val_1.fq
do
	base=$(basename "$i" "_1_val_1.fq")

	STAR --runThreadN 12 \
		--genomeDir /DataDrive/juaguila/BVos/RNA-fq/mito-gff3/ \
		--readFilesIn "${base}_1_val_1.fq" "${base}_2_val_2.fq" \
		--outFileNamePrefix "${base}" \
		--outSAMtype BAM SortedByCoordinate \
		--outSAMattributes NH HI NM MD \
		--alignIntronMin 20 \
		--alignIntronMax 100 \
		--quantMode GeneCounts \
		--limitBAMsortRAM 1096479141
done

However, the GeneCounts output contains no genes, only a MissingGeneID row:
N_unmapped 11131325 11131325 11131325
N_multimapping 6011 6011 6011
N_noFeature 674973 7779465 1127893
N_ambiguous 0 0 0
MissingGeneID 7557418 452926 7104498
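
One way to narrow this down is to check which attribute keys the GFF3 actually carries on each feature type. As far as I understand the STAR manual, gene counting by default reads `exon` features and looks up their `gene_id` attribute (configurable through `--sjdbGTFfeatureExon` and `--sjdbGTFtagExonParentGene`), whereas GFF3 files usually use `ID=` and `Parent=` instead, which would fit the `MissingGeneID` row. Below is a minimal sketch, assuming a standard tab-separated GFF3 with `key=value;` attributes in column 9; the function name `gff3_attr_keys` is made up for illustration.

```shell
# Hypothetical helper: count which attribute keys appear on each feature type
# of a GFF3 file. Output columns: feature type, attribute key, number of lines.
gff3_attr_keys() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }              # skip comments, directives, short lines
        {
            n = split($9, kv, ";")           # column 9 holds key=value pairs
            for (i = 1; i <= n; i++) {
                key = kv[i]
                sub(/=.*/, "", key)          # keep only the text before "="
                if (key != "") seen[$3 "\t" key]++
            }
        }
        END { for (k in seen) print k "\t" seen[k] }
    ' "$1" | LC_ALL=C sort
}
```

Running `gff3_attr_keys MitoBVos.gff3` prints one line per feature type and key. If no `exon` line carries a `gene_id` key, that would be consistent with the counts above.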

I also converted the GFF3 to GTF using AGAT, but the resulting gene names are generic placeholders rather than real gene names:
N_unmapped 19624491 19624491 19624491
N_multimapping 1171 1171 1171
N_noFeature 1050 726463 192370
N_ambiguous 5143 433 4279
agat-gene-1 7 13 2
agat-gene-2 6 2 11
agat-gene-3 2 1 1
agat-gene-4 790 120 684
agat-gene-5 1 12 0
agat-gene-6 9 9 0
agat-gene-7 138981 133349 6023
agat-gene-8 15186 499 15078
agat-gene-9 0 0 0
agat-gene-10 9803 2182 7621
agat-gene-11 29 10 19
agat-gene-12 0 0 8
agat-gene-13 0 1 0
agat-gene-14 0 1 0
agat-gene-15 0 0 0
agat-gene-16 10698 1445 9253
agat-gene-17 38905 1874 37031
agat-gene-18 1 1 0
agat-gene-19 46999 10685 36314
agat-gene-20 4061 3537 549
agat-gene-21 0 4 2
agat-gene-22 585645 33359 552299
agat-gene-23 60897 4222 56675
agat-gene-24 15 6 9
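
To compare the three count columns at a glance, the per-gene rows can be summed per column. In STAR's `ReadsPerGene.out.tab`, column 2 holds the unstranded counts, column 3 the counts for read 1 aligned to the RNA strand, and column 4 the counts for read 2 aligned to the RNA strand. Below is a small sketch; `sum_gene_counts` is a hypothetical helper name, and it assumes that every row starting with `N_` is a summary row rather than a gene.

```shell
# Hypothetical helper: sum the per-gene counts in each column of a
# STAR ReadsPerGene.out.tab file, ignoring the N_* summary rows.
sum_gene_counts() {
    awk '
        $1 ~ /^N_/ { next }                  # skip N_unmapped, N_multimapping, ...
        { u += $2; f += $3; r += $4 }        # accumulate columns 2, 3 and 4
        END {
            printf "unstranded\t%d\n", u
            printf "read1_strand\t%d\n", f
            printf "read2_strand\t%d\n", r
        }
    ' "$1"
}
```

With the prefix used in the mapping loop, the call would be `sum_gene_counts "${base}ReadsPerGene.out.tab"`. Note that a `MissingGeneID` row, if present, is counted like any other gene row.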

Could you tell me why building the index with the GFF3 worked, but the mapping output then looks wrong?
