This Nextflow pipeline pulls samples from iRODS and converts them to FASTQ files.
- main.nf - the Nextflow pipeline that runs all workflows
- modules/metatable.nf - a collection of processes that help get iRODS metadata for the samples listed in the --findmeta <samples.csv> file
- modules/getfiles.nf - a collection of processes that help load the data (.cram or .bam files) from iRODS and convert them to .fastq.gz files
- modules/upload2ftp.nf - a collection of processes that help upload a list of .fastq.gz files to the FTP server (specified in nextflow.config)
- nextflow.config - the configuration script that controls the cluster scheduler, process and container
- bin/parser.py - script that parses metadata from imeta ls output and saves it in .json format
- bin/combine_meta.py - script that combines all metadata in .json format and saves it to a .tsv file
- examples/samples.csv - an example samples.csv file that contains one column with sample names (no header is required)
- examples/run.sh - an example run script that executes the pipeline
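The metadata steps handled by bin/parser.py and bin/combine_meta.py can be illustrated with a minimal sketch. It is not the code of those scripts: the function names, the example path and the attribute names are hypothetical, and it assumes the usual `imeta ls` layout of `attribute:` / `value:` / `units:` lines separated by `----`.

```python
import csv
import io
import json


def parse_imeta(text):
    """Turn `imeta ls` text into a dict of attribute -> value (assumed layout)."""
    meta = {}
    attribute = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as "----" carry no data
        key, value = key.strip(), value.strip()
        if key == "attribute":
            attribute = value
        elif key == "value" and attribute is not None:
            meta[attribute] = value
            attribute = None
    return meta


def combine_to_tsv(records):
    """Write a list of metadata dicts as TSV text with one shared header."""
    columns = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical example input, not taken from a real iRODS object.
example = """AVUs defined for dataObj /hypothetical/path/sample1.cram:
attribute: sample
value: SAMPLE_1
units:
----
attribute: library_type
value: Chromium single cell
units:
"""

record = parse_imeta(example)
print(json.dumps(record, indent=2))
print(combine_to_tsv([record]))
```

Sorting the column names keeps the .tsv header stable when different samples carry different attributes; missing values are left empty.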
- --findmeta: specify a .csv file with sample names to run a metadata search
- --cram2fastq: if specified, the pipeline converts the CRAM files found in the findmeta step
- --meta: specifies the .tsv file with CRAM files (potentially from the findmeta step) to run the cram2fastq conversion
- --publish_dir: path to put the output files of the pipeline (default 'results')
- --index_format: index-format formula for samtools, only if you really know what you're doing (default "i*i*")
- --toftp: upload the resulting files to the ArrayExpress FTP server (default false).
  - Use in combination with --ftp_credenials, --ftp_host and --ftp_path
  - Use in combination with --fastqfiles: this argument specifies the .fastq.gz files (potentially from the cram2fastq step) to upload to the ArrayExpress FTP server
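The way these arguments depend on each other can be summarised in a small sketch. It is an illustration only, not part of the pipeline: `check_params` is a hypothetical helper, and the rules are an assumed reading of the descriptions above (the FTP settings may equally come from nextflow.config).

```python
def check_params(params):
    """Return a list of problems found in a dict of pipeline parameters."""
    problems = []
    # Conversion needs CRAM locations, either from findmeta or from a .tsv file.
    if params.get("cram2fastq") and not (params.get("findmeta") or params.get("meta")):
        problems.append("--cram2fastq needs --findmeta or --meta")
    if params.get("toftp"):
        # The FTP settings are spelled as they appear in the argument list.
        for name in ("ftp_credenials", "ftp_host", "ftp_path"):
            if not params.get(name):
                problems.append(f"--toftp needs --{name}")
        # Upload needs FASTQ files, either freshly converted or given directly.
        if not (params.get("cram2fastq") or params.get("fastqfiles")):
            problems.append("--toftp needs --cram2fastq or --fastqfiles")
    return problems


print(check_params({"findmeta": "examples/samples.csv", "cram2fastq": True}))
print(check_params({"cram2fastq": True}))
```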
To run this pipeline you need to have the following enabled:
- iRODS
- Python
- Nextflow version 24.10.0 or higher
- Singularity
You can enable them on farm22 with the following commands:
module load cellgen/nextflow/24.10.0
module load cellgen/irods
module load cellgen/singularity
module load python-3.11.6
Additionally you need to set your LSF group:
export LSB_DEFAULT_USERGROUP=<YOURGROUP>
- Run a metadata search for a specified list of samples:
nextflow run main.nf --findmeta ./examples/samples.csv
- Download CRAM files (specified in metadata.tsv) from iRODS and convert them to FASTQ:
nextflow run main.nf --cram2fastq --meta metadata/metadata.tsv
- Upload FASTQ files to the FTP server (you need to set up the FTP server in nextflow.config):
nextflow run main.nf --toftp --fastqfiles ./results/
- Combine several steps to run them together:
nextflow run main.nf --findmeta ./examples/samples.csv --cram2fastq --toftp
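As a rough sketch of how a run could be prepared from a script, the snippet below writes a one-column, header-less samples file and assembles the combined command shown above. The helper names and the sample names are hypothetical, and the command is only printed, not executed.

```python
import shlex
import tempfile
from pathlib import Path


def write_samples(path, samples):
    """Write one sample name per line, with no header row."""
    Path(path).write_text("".join(f"{name}\n" for name in samples))


def build_command(samples_csv, cram2fastq=False, toftp=False):
    """Assemble the nextflow command line as a list of arguments."""
    cmd = ["nextflow", "run", "main.nf", "--findmeta", str(samples_csv)]
    if cram2fastq:
        cmd.append("--cram2fastq")
    if toftp:
        cmd.append("--toftp")
    return cmd


with tempfile.TemporaryDirectory() as tmp:
    samples_csv = Path(tmp) / "samples.csv"
    # Hypothetical sample names, for illustration only.
    write_samples(samples_csv, ["SAMPLE_1", "SAMPLE_2"])
    written = samples_csv.read_text()
    print(shlex.join(build_command(samples_csv, cram2fastq=True, toftp=True)))
```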
---
title: Nextflow pipeline for retrieving CRAM files stored in iRODS and converting them to FASTQ
---
flowchart TB
subgraph findmeta["Find CRAM metadata"]
direction LR
v0([findCrams])
v1([getMetadata])
v2([parseMetadata])
v3([combineMetadata])
end
subgraph downloadcrams["Convert CRAMs --> FASTQ"]
direction LR
v4([downloadCram])
v5([cramToFastq])
v6([calculateReadLength])
v7([checkATAC])
v8([renameATAC])
v9([saveMetaToJson])
v10([updateMetadata])
end
subgraph uploadtoftp["Upload data to FTP"]
direction LR
v11([concatFastqs])
v12([uploadFTP])
end
v0 --> v1 --> v2 --> v3
v4 --> v5 --> v6 --> v7{10X ATAC}
v11 --> v12
v7 --YES--> v8
v8 --> v9
v7 --NO--> v9
v9 --> v10
findmeta -.-> downloadcrams -.-> uploadtoftp