ADD: cluster and reverse complement correction for resfinder #38
Vedanth-Ramji
commented
Apr 21, 2024
- Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
- Gene clusters were identified and were manually assigned ARO numbers.
- 40 gene clusters present.
- 9 genes in reverse complement form also present.
- RC genes were all manually curated.
- Delete get_data_path function
- Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
- Gene clusters were identified and manually assigned ARO numbers.
- A separate file with manual curation for gene clusters and RCs was created; their AROs were updated after concatenating the RGI results with the genes not in the RGI results.
- 40 gene clusters are present.
- 4 genes in reverse complement (RC) form are also present. blaBIM-1_1_CP016446 and mph(D)_1_AB048591 were not found in CARD and were given parent ARO mappings. RGI correctly assigned ARO numbers to the other two.
- Corrected the erm(X)_1_M36726 ARO mapping.
Thanks.
It seems, however, that this could have been done without special-casing resfinder? Why is this different from the other manual curation files?
The code for actually running rgi correctly for resfinder (and the output thereof) is still missing as well.
- Resfinder is composed of coding sequences. The data wasn't handled properly before, because contig mode was used when passing coding sequences to RGI. Now the amino acid (AA) version of Resfinder is run through RGI in protein mode.
- The Resfinder AA file is generated from the nucleotide file using Biopython (no AA file was found online).
- 9 RC genes were found (previously 4). All were manually curated, as the RC versions couldn't be translated properly into AA sequences.
- Documentation updated in the changelog to reflect the AA version being used for Resfinder and the gene cluster handling.
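To illustrate why reverse-complemented entries fail to translate cleanly, here is a minimal sketch using only the Python standard library. It is not the PR's `fna_to_faa` (which uses Biopython); the function names and the "starts with M, ends with a single stop" check are assumptions made for this example, and the check ignores alternative start codons such as GTG or TTG.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate in frame 0; unknown codons become 'X', trailing bases are dropped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def looks_like_cds(protein):
    """Heuristic (assumption): starts with M, ends with a stop, no internal stops."""
    return protein.startswith("M") and protein.endswith("*") and "*" not in protein[:-1]


def classify_orientation(seq):
    """Hypothetical helper: flag entries that only translate cleanly as RC."""
    if looks_like_cds(translate(seq)):
        return "forward"
    if looks_like_cds(translate(reverse_complement(seq))):
        return "reverse_complement"
    return "needs_manual_curation"


if __name__ == "__main__":
    print(classify_orientation("ATGGCTAAATAA"))  # toy forward CDS
    print(classify_orientation("TTATTTAGCCAT"))  # the same toy CDS, reverse-complemented
```

A check like this could only shortlist candidates; the PR states that the RC genes themselves were curated manually.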
I just added the code for using the AA version of resfinder.
The other manual curation files add genes that are not present in the mapping tables (i.e. RGI can't map them automatically). The new 'correction' files change mappings that are already present in the mapping tables (i.e. they correct RGI where it fails to detect a gene cluster or reverse complement properly and maps to a wrong ARO number).
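The distinction between adding a missing mapping and correcting an existing one can be sketched as follows. This is an illustrative example only: `apply_manual_curation`, the gene names, and the ARO identifiers are placeholders, not the project's actual code or data.

```python
def apply_manual_curation(rgi_mapping, curated):
    """Merge curated gene -> ARO entries into the RGI-derived mapping.

    Returns the merged mapping plus two lists: genes that RGI did not map
    at all ("added") and genes whose RGI mapping was overridden ("corrected").
    """
    merged = dict(rgi_mapping)  # copy, so the input is left untouched
    added, corrected = [], []
    for gene, aro in curated.items():
        if gene not in rgi_mapping:
            added.append(gene)
        elif rgi_mapping[gene] != aro:
            corrected.append(gene)
        merged[gene] = aro  # the curated value wins in both cases
    return merged, added, corrected


if __name__ == "__main__":
    # Placeholder identifiers, not real ARO numbers.
    rgi = {"geneA": "ARO:0000001", "clusterB": "ARO:0000002"}
    manual = {"clusterB": "ARO:0000009", "geneC": "ARO:0000003"}
    merged, added, corrected = apply_manual_curation(rgi, manual)
    print(merged, added, corrected)
```

In this sketch the two cases differ only in how they are reported, not in how they are applied: the same assignment handles both.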
I know this, but I don't see why it matters for my question. Why does the code need to special-case
- Merged gene cluster & RC annotation with other manual curation annotations.
- Moved notes for gene cluster & RC annotation to README.md in manual curation directory.
- Removed hardcoded path in fna_to_faa in crude_db_harmonisation.py and made it general.
I still want to remove the special casing of resfinder
- Note: the code that integrates manual curation now removes duplicate ARO mappings. This has corrected a MEGARes annotation (GMGC10.027_903_362.EMRE) which had a one-to-many ARO mapping. Better manual curation for MEGARes will follow in the version after v0.3.0, when MEGARes will be checked for CDSs, gene clusters and RC genes.
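One possible way to collapse one-to-many mappings is sketched below. The PR does not show how duplicates are removed, so the policy here (a curated row overrides automatic rows; anything still ambiguous is reported rather than silently resolved) is an assumption, as are the function name and the placeholder identifiers.

```python
def deduplicate_mappings(rows):
    """Collapse (gene, aro, is_curated) rows into one ARO per gene.

    Curated rows take precedence over automatic ones. Genes that still map
    to more than one ARO are returned separately for manual review.
    """
    by_gene = {}
    for gene, aro, is_curated in rows:
        by_gene.setdefault(gene, []).append((aro, is_curated))

    resolved, ambiguous = {}, {}
    for gene, candidates in by_gene.items():
        curated = [aro for aro, flag in candidates if flag]
        pool = curated or [aro for aro, _ in candidates]
        unique = list(dict.fromkeys(pool))  # drop repeats, keep first-seen order
        if len(unique) == 1:
            resolved[gene] = unique[0]
        else:
            ambiguous[gene] = unique
    return resolved, ambiguous


if __name__ == "__main__":
    # Placeholder identifiers, not real ARO numbers.
    rows = [
        ("geneA", "ARO:0000001", False),
        ("geneA", "ARO:0000002", False),
        ("geneA", "ARO:0000002", True),   # curated row settles geneA
        ("geneB", "ARO:0000003", False),
        ("geneB", "ARO:0000003", False),  # exact duplicate
    ]
    print(deduplicate_mappings(rows))
```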