nf-core/denovotranscript is a bioinformatics pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq. It takes a samplesheet and FASTQ files as input and performs quality control (QC), trimming, assembly, redundancy reduction, pseudoalignment, and quantification. It outputs a transcriptome assembly FASTA file, a transcript abundance TSV file, and a MultiQC report with assembly quality and read QC metrics.
- Read QC of raw reads (FastQC)
- Adapter and quality trimming (fastp)
- Read QC of trimmed reads (FastQC)
- Remove rRNA or mitochondrial DNA (optional) (SortMeRNA)
- Transcriptome assembly using any combination of the following:
- Redundancy reduction with Evidential Gene tr2aacds. A transcript-to-gene mapping is produced from Evidential Gene's outputs using gawk.
- Assembly completeness QC (BUSCO)
- Other assembly quality metrics (rnaQUAST)
- Transcriptome quality assessment with TransRate, including the use of reads for assembly evaluation. This step is not performed if the profile is set to conda or mamba.
- Pseudo-alignment and quantification (Salmon)
- HTML report for raw reads, trimmed reads, BUSCO, and Salmon (MultiQC)
Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
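As a rough sketch of that setup check, the snippet below looks for Nextflow on the PATH and prints a smoke-test command. It assumes Docker as the container engine and uses `test_results` as a placeholder output directory; neither choice comes from the pipeline itself, so adjust both to your environment.

```shell
# Assumptions: Docker is the container engine (swap in singularity etc. as needed)
# and "test_results" is only a placeholder output directory.
test_cmd='nextflow run nf-core/denovotranscript -profile test,docker --outdir test_results'

# Only report here; launching the test run is left to the user.
if command -v nextflow >/dev/null 2>&1; then
  echo "Nextflow found. Smoke test: $test_cmd"
else
  echo "Nextflow not found on PATH. Install it first, then run: $test_cmd"
fi
```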
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents one pair of FASTQ files (paired-end reads).
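Before launching, it can help to check that a samplesheet follows this three-column layout. The function below is a minimal sketch, not part of the pipeline: `validate_samplesheet` is a hypothetical helper name, and the rule that read files end in `.fastq.gz` or `.fq.gz` is an assumption drawn from the example row. It does not check that the files exist.

```shell
# validate_samplesheet FILE  (hypothetical helper, not shipped with the pipeline)
# Returns non-zero if the header or any row departs from the layout shown above.
validate_samplesheet() {
  awk -F',' '
    NR == 1 {
      # The header must match the example exactly.
      if ($0 != "sample,fastq_1,fastq_2") { print "bad header: " $0; bad = 1 }
      next
    }
    /^[[:space:]]*$/ { next }   # tolerate blank lines
    NF != 3 { print "line " NR ": expected 3 columns, got " NF; bad = 1; next }
    $1 == "" { print "line " NR ": empty sample name"; bad = 1 }
    # Assumption: reads are gzipped FASTQ, as in the example row.
    $2 !~ /\.f(ast)?q\.gz$/ || $3 !~ /\.f(ast)?q\.gz$/ {
      print "line " NR ": reads should end in .fastq.gz or .fq.gz"; bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Demo: write the example samplesheet from above to a scratch file and check it.
printf '%s\n' \
  'sample,fastq_1,fastq_2' \
  'CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz' \
  > samplesheet_example.csv
validate_samplesheet samplesheet_example.csv && echo "samplesheet_example.csv looks OK"
```

The checks are deliberately limited to what the example shows; the pipeline's own input validation remains the authority on what is accepted.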
Now, you can run the pipeline using:
nextflow run nf-core/denovotranscript \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
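As an illustration of the -params-file route, the two parameters from the run command above can be written to a YAML file. This is a sketch under assumptions: the file name `params.yaml` and the output directory `results` are placeholders, and Docker is assumed as the profile in the launch command shown in the comment.

```shell
# Write the --input and --outdir parameters into a YAML params file.
# "params.yaml" and "results" are placeholder names, not pipeline defaults.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
EOF

# Show what was written.
cat params.yaml

# Launch with the params file instead of --input/--outdir (not executed here):
#   nextflow run nf-core/denovotranscript -profile docker -params-file params.yaml
```

Keeping parameters in a file like this leaves -c config files free for resource and executor settings, which matches the split described in the warning.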
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
nf-core/denovotranscript was written by Avani Bhojwani (@avani-bhojwani) and Timothy Little (@timslittle).
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #denovotranscript channel (you can join with this invite).
If you use nf-core/denovotranscript for your analysis, please cite it using the following doi: 10.5281/zenodo.13324371
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.