nf-irods-to-fastq

This Nextflow pipeline pulls samples from iRODS and converts them to FASTQ files.

Contents of Repo

main.nf - the Nextflow pipeline that runs all workflows
modules/metatable.nf - a collection of processes that help getting IRODS metadata for samples listed in --findmeta <samples.csv> file
modules/getfiles.nf - a collection of processes that help loading the data (.cram or .bam files) from IRODS and converting them to .fastq.gz files
modules/upload2ftp.nf - a collection of processes that help uploading a list of .fastq.gz files to FTP server (specified in nextflow.config)
nextflow.config - the configuration script that controls the cluster scheduler, process and container
bin/parser.py - script that parses metadata from imeta ls output and saves it in .json format
bin/combine_meta.py - script that combines all metadata in .json format and saves it to .tsv file
examples/samples.csv - an example samples.csv file, contains one colum with sample names (no header is required)
examples/run.sh - an example run script that executes the pipeline.

Pipeline Arguments

--findmeta: specify a .csv file with sample names to run a metadata search
--cram2fastq: if specified the script runs conversion of cram files that are found on findmeta step
--meta: this argument spicifies the .tsv with cram files (potentially from findmeta step) to run cram2fastq conversion
--publish_dir: path to put the output filess of the pipeline. (default 'results')
--index_format: index-format formula for samtools, only if you really know what you're doing (default "i*i*")
--toftp: upload the resulting files to the ArrayExpress FTP server (default false).
- Use in combination with --ftp_credenials, --ftp_host and --ftp_path
--fastqfiles: this argument spicifies the .fastq.gz files (potentially from cram2fastq step) to upload them to the ArrayExpress ftp server

Examples of use

Prepare your environment

To run this pipeline you need to have enabled:

IRODS
Python
Nextflow of version 24.10.0 or higher
Singularity

You can enable them on farm22 with the following commands:

module load cellgen/nextflow/24.10.0
module load cellgen/irods
module load cellgen/singularity
module load python-3.11.6

Additionally you need to set your LSF group:

export LSB_DEFAULT_USERGROUP=<YOURGROUP>

Run nextflow command

Run a metadata search for a specified list of samples:

nextflow run main.nf --findmeta ./examples/samples.csv

Download cram files (specified in metadata.tsv) from IRODS and convert them to fastq

nextflow run main.nf --cram2fastq --meta metadata/metadata.tsv

Upload fastq files to ftp server (you to set up the ftp server in nextflow.config):

nextflow run main.nf --toftp --fastqfiles ./results/

Combine several steps to run them together

nextflow run main.nf --findmeta ./examples/samples.csv --cram2fastq --toftp

Graph

---
title: Nextflow pipeline for retrieving CRAM files stored in IRODS and convert them to FASTQ
---
flowchart TB
    subgraph findmeta["Find CRAM metadata"]
    direction LR
      v0([findCrams])
      v1([getMetadata])
      v2([parseMetadata])
      v3([combineMetadata])
    end
    subgraph downloadcrams["Covert CRAMS --> FASTQ"]
    direction LR
        v4([downloadCram])
        v5([cramToFastq])
        v6([calculateReadLength])
        v7([checkATAC])
        v8([renameATAC])
        v9([saveMetaToJson])
        v10([updateMetadata])
    end
    subgraph uploadtoftp["Upload data to FTP"]
    direction LR
      v11([concatFastqs])
      v12([uploadFTP])
    end
    v0 --> v1 --> v2 --> v3
    v4 --> v5 --> v6 --> v7{10X ATAC}
    v11 --> v12
    v7 --YES--> v8
    v8 --> v9
    v7 --NO--> v9
    v9 --> v10
    findmeta -.-> downloadcrams -.-> uploadtoftp

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
bin		bin
docker		docker
examples		examples
modules		modules
LICENSE		LICENSE
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nf-irods-to-fastq

Contents of Repo

Pipeline Arguments

Examples of use

Prepare your environment

Run nextflow command

Graph

About

Releases 2

Packages

Contributors 5

Languages

License

cellgeni/nf-irods-to-fastq

Folders and files

Latest commit

History

Repository files navigation

nf-irods-to-fastq

Contents of Repo

Pipeline Arguments

Examples of use

Prepare your environment

Run nextflow command

Graph

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 5

Languages

Packages