Commit

Merge pull request #5 from phac-nml/limitations/fasterq
Limitations/fasterq
apetkau authored Jan 25, 2024
2 parents 7e0732a + 429470c commit 8131a08
Showing 8 changed files with 44 additions and 24 deletions.
11 changes: 4 additions & 7 deletions README.md
@@ -52,10 +52,10 @@ Where the `samplesheet.csv` is structured as specified in the [Input](#input) se

## Read data

-The sequence reads will appear in the `results/sratools/reads` directory (assuming `--outdir results` is specified). For example:
+The sequence reads will appear in the `results/reads` directory (assuming `--outdir results` is specified). For example:

```
-results/sratools/reads/
+results/reads/
├── ERR1109373.fastq.gz
├── ERR1109373_1.fastq.gz
├── ERR1109373_2.fastq.gz
@@ -72,16 +72,13 @@ A JSON file for loading the data into IRIDA Next is output by this pipeline. The
"files": {
"global": [],
"samples": {
-"SampleA": [
-{ "path": "sratools/reads/SRR13191702_1.fastq.gz" },
-{ "path": "sratools/reads/SRR13191702_2.fastq.gz" }
-]
+"SampleA": [{ "path": "reads/SRR13191702_1.fastq.gz" }, { "path": "reads/SRR13191702_2.fastq.gz" }]
}
}
}
```

-Within the `files` section of this JSON file, all of the output paths are relative to the `--outdir results`. Therefore, `"path": "sratools/reads/SRR13191702_1.fastq.gz"` refers to a file located within `results/sratools/reads/SRR13191702_1.fastq.gz`.
+Within the `files` section of this JSON file, all of the output paths are relative to the `--outdir results`. Therefore, `"path": "reads/SRR13191702_1.fastq.gz"` refers to a file located within `sratools/reads/SRR13191702_1.fastq.gz`.

An additional example of this file can be found at [tests/data/test1_iridanext.output.json](tests/data/test1_iridanext.output.json).
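The README's rule that every `path` under `files` is relative to `--outdir` can be applied mechanically when consuming the JSON. Below is a minimal Python sketch, not part of the pipeline: the helper name `resolve_sample_paths` is hypothetical, and the input is a trimmed-down document holding only the `files` section from the README example.

```python
import json
from pathlib import PurePosixPath


def resolve_sample_paths(outdir: str, doc: dict) -> dict:
    """Join each sample's relative `path` entries onto the --outdir directory."""
    base = PurePosixPath(outdir)
    return {
        sample: [str(base / entry["path"]) for entry in entries]
        for sample, entries in doc["files"]["samples"].items()
    }


# Trimmed-down copy of the README example (only the "files" section).
example = json.loads("""
{
  "files": {
    "global": [],
    "samples": {
      "SampleA": [
        {"path": "reads/SRR13191702_1.fastq.gz"},
        {"path": "reads/SRR13191702_2.fastq.gz"}
      ]
    }
  }
}
""")

resolved = resolve_sample_paths("results", example)
print(resolved)
# {'SampleA': ['results/reads/SRR13191702_1.fastq.gz', 'results/reads/SRR13191702_2.fastq.gz']}
```

`PurePosixPath` is used so the joined paths keep forward slashes regardless of the operating system running the script.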

2 changes: 1 addition & 1 deletion conf/iridanext.config
@@ -6,7 +6,7 @@ iridanext {
validate = true
files {
idkey = "id"
-samples = ["**/sratools/reads/*.fastq.gz"]
+samples = ["**/reads/*.fastq.gz"]
}
}
}
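The `samples` glob in `conf/iridanext.config` now targets `**/reads/*.fastq.gz`. As a rough local approximation only, Python's `pathlib.Path.glob` treats `**` as "this directory and any subdirectories", so the sketch below shows which files such a pattern would pick up in a mock `--outdir`. Whether the IRIDA Next plugin applies exactly these matching rules is an assumption, and the directory layout (including `pipeline_info/params.json`) is made up for illustration.

```python
import tempfile
from pathlib import Path

PATTERN = "**/reads/*.fastq.gz"


def matching_files(outdir: Path) -> list:
    """Return files under outdir matching PATTERN as sorted, relative POSIX paths."""
    return sorted({p.relative_to(outdir).as_posix() for p in outdir.glob(PATTERN)})


with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp)
    # Hypothetical layout: reads published directly under outdir, SRA files under sratools/.
    for rel in [
        "reads/ERR1109373_1.fastq.gz",
        "reads/ERR1109373_2.fastq.gz",
        "sratools/ERR1109373/ERR1109373.sra",
        "pipeline_info/params.json",
    ]:
        target = outdir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = matching_files(outdir)

print(found)  # ['reads/ERR1109373_1.fastq.gz', 'reads/ERR1109373_2.fastq.gz']
```

Because `**` may match zero directories, the pattern picks up `reads/` at the top of the output directory as well as any nested `reads/` directory.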
12 changes: 12 additions & 0 deletions conf/modules.config
@@ -25,4 +25,16 @@ process {
pattern: '*_versions.yml'
]
}
+
+withName: SRATOOLS_PREFETCH {
+maxForks = params.max_jobs_with_network_connections
+}
+
+withName: SRATOOLS_FASTERQDUMP {
+publishDir = [
+path: { "${params.outdir}" },
+mode: params.publish_dir_mode,
+pattern: 'reads/*.fastq.gz'
+]
+}
}
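The `maxForks` directive caps how many instances of a process Nextflow runs in parallel, so with the default `max_jobs_with_network_connections = 1` the `SRATOOLS_PREFETCH` downloads run one at a time. The Python sketch below is only an analogy for that limit using a bounded semaphore; it does not reflect how Nextflow schedules tasks internally, and `run_limited` is a hypothetical helper.

```python
import threading
import time


def run_limited(n_jobs: int, max_forks: int) -> int:
    """Run n_jobs dummy jobs with at most max_forks active; return the peak concurrency seen."""
    slots = threading.BoundedSemaphore(max_forks)
    lock = threading.Lock()
    active = 0
    peak = 0

    def job() -> None:
        nonlocal active, peak
        with slots:  # blocks while max_forks jobs already hold a slot
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # stand-in for a network download
            with lock:
                active -= 1

    threads = [threading.Thread(target=job) for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(run_limited(8, 1))  # 1: jobs run strictly one at a time
```

Presumably, raising the hidden parameter (for example `--max_jobs_with_network_connections 2` on the command line) would allow that many prefetch jobs to overlap.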
5 changes: 3 additions & 2 deletions docs/output.md
@@ -7,7 +7,7 @@ This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

- `sratools`: Data from the SRA tools step (downloading sequence reads).
-- `sratools/reads`: The fastq files of downloaded reads.
+- `reads`: The fastq files of downloaded reads.
- `pipeline_info`: information about the pipeline's execution
- `custom`: information on detected/generated NCBI settings used for accessing certain databases (see <https://nf-co.re/modules/custom_sratoolsncbisettings>).

@@ -28,7 +28,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- `sratools/`
- Sequence data in SRA format: `INSDC_ACCESSION/INSDC_ACCESSION.sra`
-- Reads in fastq format: `reads/INSDC_ACCESSION.fastq.gz`
+- `reads/`
+- Reads in fastq format: `INSDC_ACCESSION.fastq.gz`

</details>

13 changes: 8 additions & 5 deletions nextflow.config
@@ -37,11 +37,14 @@ params {
max_time = '1.h'

// Schema validation default options
-validationFailUnrecognisedParams = false
-validationLenientMode = false
-validationSchemaIgnoreParams = 'genomes,igenomes_base'
-validationShowHiddenParams = false
-validate_params = true
+validationFailUnrecognisedParams = false
+validationLenientMode = false
+validationSchemaIgnoreParams = 'genomes,igenomes_base'
+validationShowHiddenParams = false
+validate_params = true
+
+// Options for limiting network activity
+max_jobs_with_network_connections = 1
}

// Load base.config by default for all pipelines
7 changes: 7 additions & 0 deletions nextflow_schema.json
@@ -190,6 +190,13 @@
"description": "Validation of parameters in lenient more.",
"hidden": true,
"help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)."
+},
+"max_jobs_with_network_connections": {
+"type": "integer",
+"default": 1,
+"minimum": 1,
+"description": "Maximum number of jobs with network connections allowed to run at once",
+"hidden": true
}
}
}
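The new schema entry declares `max_jobs_with_network_connections` as an `integer` with `minimum: 1`. A hand-rolled Python check mirroring just those two constraints might look like the sketch below; it is a simplification, not the pipeline's schema validator, and `check_max_jobs` is a hypothetical name. Booleans are rejected explicitly because Python treats `bool` as a subclass of `int`.

```python
def check_max_jobs(value) -> list:
    """Return a list of error messages; an empty list means the value is valid."""
    errors = []
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        errors.append(f"expected an integer, got {type(value).__name__}")
    elif value < 1:
        errors.append(f"must be >= 1 (schema minimum), got {value}")
    return errors


for candidate in [1, 4, 0, -2, "2", True]:
    print(repr(candidate), check_max_jobs(candidate) or "ok")
```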
10 changes: 5 additions & 5 deletions tests/data/test1_iridanext.output.json
@@ -6,21 +6,21 @@
"samples": {
"SAMPLE2": [
{
-"path": "sratools/reads/SRR13191702_2.fastq.gz"
+"path": "reads/SRR13191702_2.fastq.gz"
},
{
-"path": "sratools/reads/SRR13191702_1.fastq.gz"
+"path": "reads/SRR13191702_1.fastq.gz"
}
],
"SAMPLE1": [
{
-"path": "sratools/reads/ERR1109373_2.fastq.gz"
+"path": "reads/ERR1109373_2.fastq.gz"
},
{
-"path": "sratools/reads/ERR1109373_1.fastq.gz"
+"path": "reads/ERR1109373_1.fastq.gz"
},
{
-"path": "sratools/reads/ERR1109373.fastq.gz"
+"path": "reads/ERR1109373.fastq.gz"
}
]
}
8 changes: 4 additions & 4 deletions tests/pipelines/fetchdatairidanext.nf.test
@@ -19,10 +19,10 @@ nextflow_pipeline {
assert path("$launchDir/test1_out/iridanext.output.json").json == path("$baseDir/tests/data/test1_iridanext.output.json").json

// Output data
-assert path("$launchDir/test1_out/sratools/reads/ERR1109373_1.fastq.gz").linesGzip.size() == 512
-assert path("$launchDir/test1_out/sratools/reads/ERR1109373_2.fastq.gz").linesGzip.size() == 512
-assert path("$launchDir/test1_out/sratools/reads/SRR13191702_1.fastq.gz").linesGzip.size() == 364
-assert path("$launchDir/test1_out/sratools/reads/SRR13191702_2.fastq.gz").linesGzip.size() == 364
+assert path("$launchDir/test1_out/reads/ERR1109373_1.fastq.gz").linesGzip.size() == 512
+assert path("$launchDir/test1_out/reads/ERR1109373_2.fastq.gz").linesGzip.size() == 512
+assert path("$launchDir/test1_out/reads/SRR13191702_1.fastq.gz").linesGzip.size() == 364
+assert path("$launchDir/test1_out/reads/SRR13191702_2.fastq.gz").linesGzip.size() == 364
}
}
}
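The updated assertions compare `linesGzip.size()` against fixed line counts (512 and 364). Assuming standard four-line FASTQ records without sequence wrapping, those correspond to 128 and 91 reads per file. A Python sketch of the same kind of check, using a hypothetical helper `count_gzip_lines` and a tiny synthetic file in place of the real test data:

```python
import gzip
import os
import tempfile


def count_gzip_lines(path: str) -> int:
    """Count text lines in a gzip-compressed file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle)


# Two synthetic FASTQ records (4 lines each); not real test data.
record = "@read{n}\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_1.fastq.gz")
    with gzip.open(path, "wt") as handle:
        handle.write(record.format(n=1) + record.format(n=2))
    n_lines = count_gzip_lines(path)

print(n_lines, n_lines // 4)  # 8 2: eight lines, two reads
```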
