-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling Download Errors #15
Conversation
|
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing work Eric. Thank you so much 😄
In addition to the comments below, could you also include an update to the README documentation here to describe this error report and when someone should expect to see it?
tests/workflows/fastq_download_prefetch_fasterqdump_sratools/main.nf.test
Show resolved
Hide resolved
tests/workflows/fastq_download_prefetch_fasterqdump_sratools/main.nf.test
Outdated
Show resolved
Hide resolved
tests/workflows/fastq_download_prefetch_fasterqdump_sratools/main.nf.test
Show resolved
Hide resolved
Updated the README: e31e41f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for all your changes. This looks great.
I do have one more comment. I'm wondering if we could add a test to make sure that if there is an error, the errored sample does not end up with references to any data in the iridanext.output.json.gz
file?
So, I'm wondering if you could add a sample that triggers an error to the samplesheet used in the end-to-end pipeline testing at https://github.com/phac-nml/fetchdatairidanext/blob/handle-errors/tests/data/samplesheet.csv
And then make sure there's no entry in the final iridanext.output.json.gz
file, which is already handled by comparison to the expected JSON file here: https://github.com/phac-nml/fetchdatairidanext/blob/handle-errors/tests/pipelines/fetchdatairidanext.nf.test#L19 (so there's no need for changes for this).
I was also trying to think if it's possible to have partially downloaded fastqs in the IRIDA Next JSON output during e.g., a network error, but I don't think that's possible. The only step that could have a network error is prefetch
, and that downloads a *.sra
file, which is converted to fastq in the fasterq-dump
step. Errors in the fasterq-dump
step will still trigger an error with the full pipeline, so it's not possible for the ignored error in prefetch
to ever lead to incomplete fastqs that wind up in IRIDA Next.
Does the above logic make sense to you?
I added another test as you suggested in ecca1f1 It uses the I also think there probably won't be partial reads in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. Thanks so much Eric 😄
Added the ability to handle errors when downloading data.
The pipeline will continue despite individual sample download errors.
The individual sample errors (if they exist) will be reported in
results/prefetch/failures_report.csv
.