Handling Download Errors #15

Merged
merged 23 commits into from
Apr 10, 2024

emarinier
Member

Added the ability to handle errors when downloading data.

The pipeline will continue despite individual sample download errors.
The individual sample errors (if they exist) will be reported in results/prefetch/failures_report.csv.
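The behaviour described above (keep going when one sample fails, then report the failures) can be illustrated with a small Python sketch. This is not the pipeline's code, which is written in Nextflow; the function names, the fake download step, and the `sample,error` column names are all assumptions made for illustration, and the real layout of failures_report.csv may differ.

```python
import csv
import io


def fetch_all(accessions, download):
    """Try every accession; collect failures instead of aborting."""
    downloaded, failures = {}, []
    for accession in accessions:
        try:
            downloaded[accession] = download(accession)
        except Exception as exc:  # one bad sample must not stop the rest
            failures.append((accession, str(exc)))
    return downloaded, failures


def write_failures_report(failures, handle):
    """Write one row per failed sample (column names are an assumption)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "error"])
    writer.writerows(failures)


def fake_download(accession):
    # Hypothetical stand-in for the real download step.
    if accession.startswith("INVALID"):
        raise RuntimeError("accession not found")
    return f"{accession}.sra"


downloaded, failures = fetch_all(
    ["SRR0000001", "INVALID1", "SRR0000002"], fake_download
)
report = io.StringIO()
write_failures_report(failures, report)
```

In this sketch the two valid accessions are still processed, and only the failing one appears in the report.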

github-actions bot commented Apr 8, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 36cec88

+| ✅ 118 tests passed       |+
#| ❔  31 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 1.1.0
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/fetchdatairidanext/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/phac-nml/fetchdatairidanext/main/nextflow_schema.json

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-fetchdatairidanext_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-fetchdatairidanext_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-fetchdatairidanext_logo_dark.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: lib/Utils.groovy
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowFetchdatairidanext.groovy
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File does not exist: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-fetchdatairidanext_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-fetchdatairidanext_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-fetchdatairidanext_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/fetchdatairidanext/fetchdatairidanext/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-04-10 15:17:54

This comment was marked as resolved.

emarinier requested a review from apetkau on April 8, 2024, 20:56
Member

apetkau left a comment

This is amazing work, Eric. Thank you so much 😄

In addition to the comments below, could you also include an update to the README documentation here to describe this error report and when someone should expect to see it?

@emarinier
Member Author

Updated the README: e31e41f

Member

apetkau left a comment

Thanks so much for all your changes. This looks great.

I do have one more comment. Could we add a test to make sure that, if a sample's download fails, that sample does not end up with references to any data in the iridanext.output.json.gz file?

So, I'm wondering if you could add a sample that triggers an error to the samplesheet used in the end-to-end pipeline testing at https://github.com/phac-nml/fetchdatairidanext/blob/handle-errors/tests/data/samplesheet.csv

And then make sure there's no entry in the final iridanext.output.json.gz file, which is already handled by comparison to the expected JSON file here: https://github.com/phac-nml/fetchdatairidanext/blob/handle-errors/tests/pipelines/fetchdatairidanext.nf.test#L19 (so there's no need for changes for this).
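The property being requested here can also be pictured as a direct check on the output file. The following is a minimal Python sketch, not the nf-test used by the pipeline; it assumes the IRIDA Next JSON keeps per-sample entries under `files.samples` and `metadata.samples`, and the sample IDs and file paths are made up.

```python
import gzip
import json
import os
import tempfile


def referenced_samples(path):
    """Return every sample ID that has file or metadata entries."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        output = json.load(handle)
    files = output.get("files", {}).get("samples", {})
    metadata = output.get("metadata", {}).get("samples", {})
    return set(files) | set(metadata)


# Build a small stand-in for iridanext.output.json.gz (contents are made up).
example = {
    "files": {
        "global": [],
        "samples": {"SAMPLE1": [{"path": "reads/SRR0000001_1.fastq.gz"}]},
    },
    "metadata": {"samples": {}},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iridanext.output.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(example, handle)
    found = referenced_samples(path)

failed_sample = "ERROR1"  # hypothetical ID of the sample whose download failed
assert failed_sample not in found
```

Comparing against a full expected JSON file, as the existing test does, is stricter than this check, since it also catches unexpected extra entries for samples that succeeded.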

I was also trying to think of whether partially downloaded fastqs could end up in the IRIDA Next JSON output after, e.g., a network error, but I don't think that's possible. The only step that could hit a network error is prefetch, which downloads a *.sra file that is then converted to fastq in the fasterq-dump step. Errors in the fasterq-dump step still fail the whole pipeline, so an ignored prefetch error can never lead to incomplete fastqs that wind up in IRIDA Next.

Does the above logic make sense to you?
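The reasoning in this comment can be sketched as a two-step loop in which only the first step's errors are tolerated. This is a simplified Python model under stated assumptions, not the pipeline's Nextflow implementation; `PrefetchError`, `run_pipeline`, and the fake step functions are hypothetical names.

```python
class PrefetchError(Exception):
    """Download failure (e.g. network error or invalid accession)."""


def run_pipeline(accessions, prefetch, fasterq_dump):
    """Return {accession: fastq} for samples that completed both steps."""
    outputs = {}
    for accession in accessions:
        try:
            sra_file = prefetch(accession)
        except PrefetchError:
            continue  # ignored: the sample is dropped before any fastq exists
        # Not wrapped in try/except: a conversion error aborts the whole run.
        outputs[accession] = fasterq_dump(sra_file)
    return outputs


def fake_prefetch(accession):
    # Hypothetical download step that fails for one accession.
    if accession == "BAD":
        raise PrefetchError(accession)
    return accession + ".sra"


def fake_fasterq_dump(sra_file):
    # Hypothetical conversion step that fails for one input file.
    if sra_file == "CORRUPT.sra":
        raise RuntimeError("conversion failed")
    return sra_file.replace(".sra", ".fastq")


ok = run_pipeline(["A", "BAD", "B"], fake_prefetch, fake_fasterq_dump)

try:
    run_pipeline(["A", "CORRUPT"], fake_prefetch, fake_fasterq_dump)
    aborted = False
except RuntimeError:
    aborted = True
```

In this model a sample either reaches the output with a fully converted fastq or is absent entirely; there is no path by which a tolerated download failure produces a partial entry.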

@emarinier
Member Author

I added another test as you suggested in ecca1f1

It uses errorsheet.csv and performs many of the same checks as the other integration test.

I also think there probably won't be partial reads in reads/ because it should fail in prefetch, as you suggested above.

Member

apetkau left a comment

This looks great to me. Thanks so much, Eric 😄

emarinier merged commit 03e156a into dev on Apr 10, 2024
6 checks passed