ncbi taxonomy data cleaning #1014
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
fill in missing subsections
Great start to documenting the import!
- I filled out the README.md further, but there are still a couple of subsections that need to be added - please finish.
- Currently this does not run as intended by the prescribed import procedure. Please make sure that files are stored and callable properly in the scripts/, import/, and output/ subdirectories. At the moment everything has to be stored, called, and written in the same top-level directory.
- There are multiple formatting errors in the generated "ncbi_taxonomy_schem_enum.mcf" file - please update your script to fix this.
- The taxonRank is not processed as part of the nodes.dmp file - this is column 3 (rank) of the file. Please add the two prescribed lines of code to the function at line 514 to fix this.
- Please include the Java test tool in the test script, including downloading it as part of the script. The lines that need to be added to tests.sh are in the comments.
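For reference, a minimal sketch of reading the rank column from a nodes.dmp row. It assumes the standard NCBI taxdump layout, where fields are separated by `\t|\t` and each row ends with `\t|`. The function name `parse_nodes_line` and the returned keys are hypothetical; this is not the function at line 514 and not the two prescribed lines, only an illustration of where column 3 sits.

```python
from typing import Dict


def parse_nodes_line(line: str) -> Dict[str, str]:
    """Split one nodes.dmp row and return its first three columns.

    Hypothetical helper for illustration; the import script's own
    function and column names may differ.
    """
    row = line.rstrip("\n")
    # Each row ends with a "\t|" terminator; drop it before splitting.
    if row.endswith("\t|"):
        row = row[:-2]
    cols = [c.strip() for c in row.split("\t|\t")]
    # Column 1 = tax_id, column 2 = parent tax_id, column 3 = rank.
    return {"tax_id": cols[0], "parent_tax_id": cols[1], "rank": cols[2]}


# Shortened, illustrative row; real rows carry more columns after rank.
example = "9606\t|\t9605\t|\tspecies\t|\tHS\t|\n"
record = parse_nodes_line(example)
print(record["rank"])
```

The rank string returned here is what would feed the taxonRank property; how it maps to an enum value is left to the script.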
scripts/biomedical/NCBI_Taxonomy/schema_mfc/ncbi_taxonomy_schema.mcf
scripts/biomedical/NCBI_Taxonomy/test_data/division_enum_expected.mcf
scripts/biomedical/NCBI_Taxonomy/scripts/format_ncbi_taxonomy.py
add schema documentation on edges
update script
add json tool test
Add documentation around new schema added as part of this import
Fix typo in subdirectory name
add missing enumeration generated by script to the appropriate subsection of the New Schema section
add quotes around name for generated enum
revert back to original test script
Please address the issue with the expected enum test files, and make sure that the added tests work in general. Finally, please add a section on notes and caveats to the README.md.
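One way to check generated output against the expected test files is a plain text diff. The sketch below is only an assumption about how such a check could look; `diff_text` is a hypothetical helper, and the sample strings are made-up placeholders rather than content from the actual .mcf files.

```python
import difflib
from typing import List


def diff_text(expected: str, actual: str) -> List[str]:
    """Return unified-diff lines between two texts (empty list if equal)."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# Placeholder content, not taken from the real expected files.
expected_text = "line one\nline two\n"
actual_text = "line one\nline 2\n"

for diff_line in diff_text(expected_text, actual_text):
    print(diff_line)
```

An empty result means the generated file matches the expected one; any returned lines show exactly where they differ.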
scripts/biomedical/NCBI_Taxonomy/test_data/division_enum_expected.mcf
scripts/biomedical/NCBI_Taxonomy/test_data/host_enum_expected.mcf
scripts/biomedical/NCBI_Taxonomy/test_data/nodes_enum_expected.mcf
update the properties subsection to describe all properties included in the import
update notes and caveat subsection