Nextflow is a workflow manager for building portable, scalable, and reproducible pipelines.
Nextflow lets you keep the configuration and the logic of a pipeline in separate files.
Nextflow has strong support for different execution systems: you can test your pipeline locally, then deploy it on your cluster (using the installed scheduler, such as Slurm or PBS) or in the cloud.
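As a minimal sketch of how configuration stays separate from pipeline logic, a `nextflow.config` file can define one profile per execution system. The queue name and the resource values below are hypothetical placeholders, not settings taken from this pipeline.

```groovy
// nextflow.config (sketch; resource values and queue name are hypothetical)

// Defaults applied to every process, whichever executor runs it
process {
    cpus   = 2
    memory = '4 GB'
}

profiles {
    // Used when no -profile option is given: run on the local machine
    standard {
        process.executor = 'local'
    }

    // Submit each task as a job to the Slurm scheduler
    cluster {
        process.executor = 'slurm'
        process.queue    = 'short'   // hypothetical queue name
    }
}
```

With a file like this, `nextflow run main.nf` would run locally and `nextflow run main.nf -profile cluster` would submit the same, unchanged pipeline script to Slurm (`main.nf` is an assumed script name).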
A simple workflow that assembles prokaryotic genomes, annotates them with Prokka, and gathers statistics for a MultiQC report.
This typical workflow processes each sample separately in some steps (filtering, assembling, annotating), while other steps collect the outputs of all samples to produce a summary (QUAST for assembly statistics, the MultiQC report).
Blue blocks are executed once per sample, while gray blocks are collectors and are executed once per project.
graph TD;
style input fill:#ff9,stroke:#333,stroke-width:2px
classDef collapse fill:#EEE,stroke:#333,stroke-width:2px
classDef multi fill:#9FF,stroke:#333,stroke-width:2px
input(FASTQ INPUT) --> SUBSAMPLE:::multi;
SUBSAMPLE --> FASTP:::multi;
FASTP --> ASSEMBLY:::multi;
ASSEMBLY --> QUAST:::collapse;
ASSEMBLY --> PROKKA:::multi;
ASSEMBLY --> ABRICATE:::multi;
ASSEMBLY --> MLST:::multi;
ABRICATE --> SUMMARY:::collapse;
SUMMARY --> MULTIQC:::collapse;
MLST --> MULTIQC:::collapse;
QUAST --> MULTIQC;
PROKKA --> MULTIQC;
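The difference between per-sample and per-project steps can be sketched in Nextflow DSL2 as follows. This is an illustration under stated assumptions, not the actual pipeline: the input glob is hypothetical, and both script bodies are placeholders standing in for a real assembler and for QUAST. The key point is the `collect()` operator, which gathers every per-sample output into a single emission so that the downstream process runs only once.

```groovy
nextflow.enable.dsl = 2

// Hypothetical input layout: paired-end reads named <sample>_R1/_R2
params.reads = 'reads/*_R{1,2}.fastq.gz'

// Per-sample step: one task is launched for every sample
process ASSEMBLY {
    input:
    tuple val(sample_id), path(reads)

    output:
    path "${sample_id}.fasta"

    script:
    """
    # Placeholder: a real pipeline would call an assembler here
    touch ${sample_id}.fasta
    """
}

// Collector step: a single task receives all assemblies at once
process QUAST {
    input:
    path assemblies

    output:
    path 'assemblies.txt'

    script:
    """
    # Placeholder: a real pipeline would run QUAST on all files here
    ls ${assemblies} > assemblies.txt
    """
}

workflow {
    // Emits one (sample_id, [read1, read2]) tuple per sample
    reads_ch = Channel.fromFilePairs(params.reads)

    ASSEMBLY(reads_ch)

    // collect() turns N per-sample outputs into one list
    QUAST(ASSEMBLY.out.collect())
}
```

Without `collect()`, `QUAST` would be launched once per assembly, like the per-sample blocks in the diagram.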
- Learning Nextflow in 2022, blog post by Evan Floden & Alain Coletta
- Nextflow tutorial (Carpentries)
- nf-core community, a set of high-quality bioinformatics pipelines backed by a fantastic community: the pipelines, their YouTube channel, and their Slack channels are all worth checking
- Video: introduction to MultiQC, a great tool that makes any pipeline better and that Nextflow can make great use of