Move (nextstrain) alignment into separate profiles
This splits each of our main nextstrain profiles in two: one to perform alignment (etc.) and a second to run the phylogenetics. This is part of a wider effort to generate alignments as soon as new sequences are available.

Using GISAID as an example (this is also implemented for open; read the following but replace `gisaid` with `open`):

The first profile, `nextstrain_profiles/nextstrain-gisaid-preprocess`, takes sequences and metadata and produces three files which we upload to S3: `results/filtered_gisaid.fasta.xz`, `results/masked_gisaid.fasta.xz` and `results/aligned_gisaid.fasta.xz`. While this is a separate profile, the `builds.yaml` file is around a dozen lines of code. I chose this approach rather than using another `--configfile` in the existing profile because the way Snakemake overlays configs (or doesn't) isn't intuitive, and stubby profiles are easier to reason about.

The second profile, `nextstrain_profiles/nextstrain-gisaid`, is relatively unchanged, except that it now starts from `results/filtered_gisaid.fasta.xz` and thus should be much faster to run. Note that the `upload` rule here will no longer upload the files which are now handled by the previous profile. It is still possible to start this workflow from (unaligned) sequences, however there should be no reason to do so.

The sets of uploaded files are defined by `config["upload"]`. This allows a profile to be created which uploads both preprocessing files and build files, if desired.

An introduction to the different profiles, and the exact commands to run each profile, have been added to `docs/dev_docs.md`.

Note that both profiles will run `rule sanitize_metadata`, as each depends on its output, which is not uploaded as an intermediate file.

None of these changes should affect non-nextstrain-core builds / profiles.
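As a rough illustration of how `config["upload"]` could select sets of files, here is a minimal Python sketch. It is not the workflow's actual implementation: the `UPLOAD_SETS` mapping, the `files_to_upload` helper, the `build-files` set name and its placeholder path are all hypothetical. Only the `preprocessing-files` name and its three output paths come from this commit.

```python
# Hypothetical mapping from an upload-set name to path templates.
# Only "preprocessing-files" and its three paths appear in the commit;
# "build-files" and its path are placeholders for illustration.
UPLOAD_SETS = {
    "preprocessing-files": [
        "results/filtered_{origin}.fasta.xz",
        "results/masked_{origin}.fasta.xz",
        "results/aligned_{origin}.fasta.xz",
    ],
    "build-files": [
        "results/example_build_output_{origin}.json",
    ],
}


def files_to_upload(config, origin):
    """Return the files selected by config["upload"], in order, without duplicates."""
    selected = []
    for set_name in config.get("upload", []):
        if set_name not in UPLOAD_SETS:
            raise KeyError(f"Unknown upload set: {set_name!r}")
        for template in UPLOAD_SETS[set_name]:
            path = template.format(origin=origin)
            if path not in selected:
                selected.append(path)
    return selected


# A preprocess-style profile uploads only the preprocessing files.
preprocess_config = {"upload": ["preprocessing-files"]}
# A combined profile could list both sets.
combined_config = {"upload": ["preprocessing-files", "build-files"]}

print(files_to_upload(preprocess_config, "gisaid"))
print(files_to_upload(combined_config, "open"))
```

Keeping the selection in a list means a single profile can opt in to several sets without any change to the upload rule itself.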
1 parent 878fa8e, commit c4934d9
Showing 10 changed files with 160 additions and 25 deletions.
22 changes: 22 additions & 0 deletions
nextstrain_profiles/nextstrain-gisaid-preprocess/builds.yaml
@@ -0,0 +1,22 @@
custom_rules:
  - workflow/snakemake_rules/export_for_nextstrain.smk

# These parameters are only used by the `export_for_nextstrain` rule and shouldn't need to be modified.
# To modify the s3 _source_ bucket, specify this directly in the `inputs` section of the config.
# P.S. These are intentionally set as top-level keys as this allows command-line overrides.
S3_DST_BUCKET: "nextstrain-ncov-private"
S3_DST_COMPRESSION: "xz"
S3_DST_ORIGINS: ["gisaid"]

upload:
  - preprocessing-files

inputs:
  - name: gisaid
    metadata: "s3://nextstrain-ncov-private/metadata.tsv.gz"
    sequences: "s3://nextstrain-ncov-private/sequences.fasta.xz"

# Deploy and Slack options are related to Nextstrain live builds and don't need to be modified for local builds
deploy_url: s3://nextstrain-data
slack_token: ~
slack_channel: "#ncov-gisaid-updates"
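The comment in this file says the `S3_DST_*` keys are kept at the top level so that they can be overridden from the command line. The Python sketch below is a simplified model of that idea, not Snakemake's own code: `apply_overrides` is a hypothetical helper, and `my-test-bucket` is a made-up value. It treats each `KEY=VALUE` string as a replacement for one top-level key, which is straightforward precisely because the key is not nested.

```python
def apply_overrides(config, overrides):
    """Return a copy of config with KEY=VALUE strings applied to top-level keys."""
    merged = dict(config)  # shallow copy; the original config is left untouched
    for item in overrides:
        key, _, value = item.partition("=")
        merged[key] = value
    return merged


# Values taken from the builds.yaml in this commit.
config = {
    "S3_DST_BUCKET": "nextstrain-ncov-private",
    "S3_DST_COMPRESSION": "xz",
    "S3_DST_ORIGINS": ["gisaid"],
}

# Hypothetical override, in the style of `--config S3_DST_BUCKET=my-test-bucket`.
overridden = apply_overrides(config, ["S3_DST_BUCKET=my-test-bucket"])
print(overridden["S3_DST_BUCKET"])
```

Had the bucket been nested under another key, a flat `KEY=VALUE` override would have needed to restate or replace the whole enclosing mapping.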
10 changes: 10 additions & 0 deletions
nextstrain_profiles/nextstrain-gisaid-preprocess/config.yaml
@@ -0,0 +1,10 @@
configfile:
  - defaults/parameters.yaml
  - nextstrain_profiles/nextstrain-gisaid-preprocess/builds.yaml

keep-going: True
printshellcmds: True
show-failed-logs: True
restart-times: 1
reason: True
stats: stats.json
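Each key in a profile's `config.yaml` corresponds to a long-form Snakemake command-line option. To make that correspondence concrete, the sketch below expands the profile above into an equivalent argument list. `profile_to_args` is a hypothetical helper and a simplified model only; Snakemake's own profile handling may differ in detail.

```python
def profile_to_args(profile):
    """Expand a profile mapping into a flat list of command-line arguments."""
    args = []
    for key, value in profile.items():
        flag = f"--{key}"
        if value is True:
            # Boolean switches become a bare flag.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are omitted.
            continue
        elif isinstance(value, list):
            # List values follow the flag as separate arguments.
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            args.extend([flag, str(value)])
    return args


# The profile added in this commit, written as a Python dict.
profile = {
    "configfile": [
        "defaults/parameters.yaml",
        "nextstrain_profiles/nextstrain-gisaid-preprocess/builds.yaml",
    ],
    "keep-going": True,
    "printshellcmds": True,
    "show-failed-logs": True,
    "restart-times": 1,
    "reason": True,
    "stats": "stats.json",
}

args = profile_to_args(profile)
print(" ".join(["snakemake"] + args))
```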
22 changes: 22 additions & 0 deletions
nextstrain_profiles/nextstrain-open-preprocess/builds.yaml
@@ -0,0 +1,22 @@
custom_rules:
  - workflow/snakemake_rules/export_for_nextstrain.smk

# These parameters are only used by the `export_for_nextstrain` rule and shouldn't need to be modified.
# To modify the s3 _source_ bucket, specify this directly in the `inputs` section of the config.
# P.S. These are intentionally set as top-level keys as this allows command-line overrides.
S3_DST_BUCKET: "nextstrain-data/files/ncov/open"
S3_DST_COMPRESSION: "xz"
S3_DST_ORIGINS: ["open"]

upload:
  - preprocessing-files

inputs:
  - name: open
    metadata: "s3://nextstrain-data/files/ncov/open/metadata.tsv.gz"
    sequences: "s3://nextstrain-data/files/ncov/open/sequences.fasta.xz"

# Deploy and Slack options are related to Nextstrain live builds and don't need to be modified for local builds
deploy_url: s3://nextstrain-data
slack_token: ~
slack_channel: "#ncov-genbank-updates"
10 changes: 10 additions & 0 deletions
nextstrain_profiles/nextstrain-open-preprocess/config.yaml
@@ -0,0 +1,10 @@
configfile:
  - defaults/parameters.yaml
  - nextstrain_profiles/nextstrain-open-preprocess/builds.yaml

keep-going: True
printshellcmds: True
show-failed-logs: True
restart-times: 1
reason: True
stats: stats.json
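Both preprocess profiles list two config files, with `builds.yaml` overlaid on `defaults/parameters.yaml`. The commit message notes that this overlaying is not intuitive. The sketch below shows one plausible model, under the assumption that mappings are merged recursively while lists and scalars from the later file replace the earlier value wholesale. The `overlay` helper and the `filter` keys are hypothetical, and the behaviour should be checked against the Snakemake version in use.

```python
import copy
from collections.abc import Mapping


def overlay(base, override):
    """Merge override onto base: mappings recurse, everything else is replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents standing in for defaults/parameters.yaml.
defaults = {
    "upload": ["build-files"],
    "filter": {"min_length": 27000, "exclude_where": "division='USA'"},
}
# Hypothetical contents standing in for a profile's builds.yaml.
builds = {
    "upload": ["preprocessing-files"],
    "filter": {"min_length": 20000},
}

merged = overlay(defaults, builds)
# Under this model the list is replaced, not appended to,
# while the nested mapping keeps keys that were not overridden.
print(merged["upload"])
print(merged["filter"])
```

Under this model a later file cannot add one item to a list defined earlier, which is one reason a small standalone profile can be easier to reason about than an extra `--configfile`.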