Disha/primary transc fix #358
Works as intended; minor nitpicks, as usual.
@@ -457,7 +457,7 @@ def normalize_mirna(self, gene: SeqFeature) -> List[SeqFeature]:
         """Returns gene representations from a miRNA gene that can be loaded in an Ensembl database.

         Change the representation from the form `gene[ primary_transcript[ exon, miRNA[ exon ] ] ]`
-        to `gene[ primary_transcript[ exon ] ]` and `gene[ miRNA[ exon ] ]`
+        to `gene[ miRNA_primary_transcript[ exon ] ]` and `gene[ miRNA[ exon ] ]`
-        to `gene[ miRNA_primary_transcript[ exon ] ]` and `gene[ miRNA[ exon ] ]`
+        to `ncRNA_gene[ miRNA_primary_transcript[ exon ] ]` and `gene[ miRNA[ exon ] ]`
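To make the target representation in the docstring concrete, here is a minimal sketch of the split. It is not the code from this PR: `Feature` is a hypothetical stand-in for the project's feature class, `split_mirna_gene` is an invented name, and the `_N` suffix on the new gene ids is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical stand-in for a GFF feature (not the project's SeqFeature)."""

    id: str
    type: str
    sub_features: List["Feature"] = field(default_factory=list)


def split_mirna_gene(gene: Feature) -> List[Feature]:
    """Split gene[primary[exon, miRNA[exon]]] into single-transcript genes."""
    primary = gene.sub_features[0]
    exons = [sub for sub in primary.sub_features if sub.type == "exon"]
    mirnas = [sub for sub in primary.sub_features if sub.type == "miRNA"]

    # First gene: the primary transcript keeps only its exons.
    primary_gene = Feature(
        id=gene.id,
        type="ncRNA_gene",
        sub_features=[Feature(primary.id, "miRNA_primary_transcript", exons)],
    )
    # One extra gene per mature miRNA; the "_N" id suffix is made up for this sketch.
    mirna_genes = [
        Feature(f"{gene.id}_{num}", "gene", [mirna])
        for num, mirna in enumerate(mirnas, start=1)
    ]
    return [primary_gene] + mirna_genes
```

With an input shaped like `gene[ primary_transcript[ exon, miRNA[ exon ] ] ]`, this returns one `ncRNA_gene` holding the `miRNA_primary_transcript` and one `gene` holding the `miRNA`, which matches the wording suggested above.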
         logging.debug(f"Formatting miRNA gene {gene.id}")

         new_genes = []
         new_primary_subfeatures = []
         num = 1
         for sub in primary.sub_features:
             if sub.type == "exon":
+                gene.type = "ncRNA_gene"
+                primary.type = "miRNA_primary_transcript"
                 new_primary_subfeatures.append(sub)
I would move the biotype change to the earlier stage, where we create the primary transcript and the gene (lines 472-473).
Oh yes, that makes sense! I forgot to make that change.
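A rough sketch of what that move could look like, assuming the two type assignments do not depend on the loop variable. The function name `retype_then_collect_exons` is hypothetical, and the exact placement relative to lines 472-473 is an assumption, since that part of the file is not shown in this conversation.

```python
from types import SimpleNamespace


def retype_then_collect_exons(gene, primary):
    """Set the biotypes once, then collect the exons of the primary transcript."""
    # Assumed placement: right where gene and primary become available,
    # instead of inside the loop body.
    gene.type = "ncRNA_gene"
    primary.type = "miRNA_primary_transcript"

    new_primary_subfeatures = []
    for sub in primary.sub_features:
        if sub.type == "exon":
            new_primary_subfeatures.append(sub)
    return new_primary_subfeatures


# Usage with throwaway objects standing in for real features.
gene = SimpleNamespace(type="gene")
primary = SimpleNamespace(
    type="primary_transcript",
    sub_features=[SimpleNamespace(type="exon"), SimpleNamespace(type="miRNA")],
)
exons = retype_then_collect_exons(gene, primary)
```

One behavioural difference to keep in mind: with the assignments inside the loop, the types only change when at least one exon is found, whereas in this sketch they change unconditionally.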
     does_not_raise(),
-    id="gene + primary_transcript + miRNA",
+    id="ncRNA_gene + miRNA_primary_transcript + miRNA",
 ),
 param(
     "mirna/pseudogene.gff",
     "mirna/pseudogene_simped.gff",
     does_not_raise(),
-    id="gene + primary_transcript - miRNA",
+    id="ncRNA_gene + miRNA_primary_transcript - miRNA",
 ),
 param(
     "mirna/nogene.gff",
     "mirna/nogene_simped.gff",
     does_not_raise(),
-    id="primary_transcript + miRNA",
+    id="miRNA_primary_transcript + miRNA",
 ),
 param(
     "mirna/pseudo_nogene.gff",
     "mirna/pseudo_nogene_simped.gff",
     does_not_raise(),
-    id="primary_transcript - miRNA",
+    id="miRNA_primary_transcript - miRNA",
 ),
 param(
     "mirna/unsupported_tr.gff",
     "",
     raises(GFFParserError, match="Unknown subtype"),
-    id="gene + primary_transcript + mRNA, not supported",
+    id="ncRNA_gene + miRNA_primary_transcript + mRNA, not supported",
 ),
 param(
     "mirna/two_primary.gff",
     "",
     raises(GFFParserError, match="too many sub_features"),
-    id="gene + 2x primary_transcript, not supported",
+    id="gene + 2x miRNA_primary_transcript, not supported",
 ),
Here I would keep the old names, because they correspond to the input biotypes, not the results (the input files have not changed).
I changed the biotype of the primary transcript from `primary_transcript` to `miRNA_primary_transcript`, and that of the associated gene from `gene` to `ncRNA_gene`, so that they load correctly.