Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update nextclade rules in ingest [#21] #32

Merged
merged 4 commits into from
Jan 9, 2025

Conversation

genehack
Copy link
Contributor

Description of proposed changes

Updates usage of augur in ingest build to align with pathogen repo guide standards.

Closes #21

  • Checks pass

input:
nextclade="results/nextclade.tsv",
output:
nextclade_metadata=temp("results/nextclade_metadata.tsv"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to mark this as temp? Would it not be useful for debugging along with all the other intermediate files in results/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's marked as temp because the next rule (join_metadata_and_nextclade) merges this file and the other metadata (data/subset_metadata.tsv) together, so all the data ends up in that file anyway. Any debugging would most likely happen using that file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to debug augur curate rename in this rule and join_metadata_and_nextclade fails or we simply don't want to run it?

My argument is mostly based on consistency, since the same could be said about results/nextclade.tsv: why not make that temp because it eventually makes its way into results/metadata.tsv?

Consider this non-blocking since it works fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to debug augur curate rename in this rule and join_metadata_and_nextclade fails or we simply don't want to run it?

In such a situation, Snakemake's --notemp option could be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My argument is mostly based on consistency, since the same could be said about results/nextclade.tsv: why not make that temp because it eventually makes its way into results/metadata.tsv?

results/nextclade.tsv and results/nextclade_metadata.tsv are effectively isotopes of the same underlying data; we don't need to retain both of them, because if you have one, you can tell what had to have been in the other one. I marked the latter one as temp because it's the newer of the two, so it was less change.

@genehack genehack force-pushed the update-nextclade-rules-in-ingest-21 branch from c1c19ac to 6a4b959 Compare January 9, 2025 23:31
@genehack genehack merged commit b451c59 into main Jan 9, 2025
5 checks passed
@genehack genehack deleted the update-nextclade-rules-in-ingest-21 branch January 9, 2025 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Nextclade workflow to use augur merge/rename
3 participants