Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ncbi taxonomy data cleaning #1014

Open
wants to merge 25 commits into
base: master
Choose a base branch
from

Conversation

krishnaswamypradeep
Copy link

No description provided.

Copy link

google-cla bot commented Apr 22, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Contributor

@spiekos spiekos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start to documenting the import!

  1. I filled out the README.md further, but there are still a couple of subsections that need to be added - please finish.
  2. Currently this does not run as intended by the prescribed import procedure. Please make sure that files are stored and callable properly in the scripts/, import/, and output/ subdirectories. Currently everything needs to be stored, called, and written to the same head directory
  3. There are multiple formatting errors in the generated "ncbi_taxonomy_schem_enum.mcf" file - please update your script to fix this
  4. The taxonRank is not processed as part of the nodes.dmp file - this is column 3 (rank) of the file. Please add the two prescribed two lines of code to the function at line 514 to fix this.
  5. Please include the java test tool in the test script including downloading it as part of the script. These lines that need to be added to tests.sh are in the comments.

spiekos and others added 8 commits June 25, 2024 11:18
add schema documentation on edges
add json tool test
Add documentation around new schema added as part of this import
add missing enumeration generated by script to the appropriate subsection of the New Schema section
add quotes around name for generated enum
revert back to original test script
Copy link
Contributor

@spiekos spiekos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address issue with the expected enum test files. Please make sure that the added tests work in general. Finally, please add section on notes and caveats to the README.md.

update the properties subsection to describe all properties included in the import
update notes and caveat subsection
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants