Skip to content

Latest commit

 

History

History
41 lines (29 loc) · 2.33 KB

output.md

File metadata and controls

41 lines (29 loc) · 2.33 KB

ICGC-FeatureCounts

This pipeline takes some BAM files from ICGC and runs featureCounts on them.

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. Documentation for the featureCounts part taken from nf-core/RNAseq, written by Rickard Hammarén and Phil Ewels.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • featureCounts - gene counts, biotype counts, rRNA estimation.
  • MultiQC - aggregate report, describing results of the whole pipeline

featureCounts

featureCounts from the subread package summarises the read distribution over genomic features such as genes, exons, promotors, gene bodies, genomic bins and chromosomal locations. RNA reads should mostly overlap genes, so be assigned.

featureCounts

We also use featureCounts to count overlaps with different classes of features. This gives a good idea of where aligned reads are ending up and can show potential problems such as rRNA contamination. biotypes

Output directory: results/featureCounts

  • Sample.bam_biotype_counts.txt
    • Read counts for the different gene biotypes that featureCounts distinguishes.
  • Sample.featureCounts.txt
    • Read the counts for each gene provided in the reference gtf file
  • Sample.featureCounts.txt.summary
    • Summary file, containing statistics about the counts

MultiQC

MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in within the report data directory.

Output directory: results/multiqc

  • Project_multiqc_report.html
    • MultiQC report - a standalone HTML file that can be viewed in your web browser
  • Project_multiqc_data/
    • Directory containing parsed statistics from the different tools used in the pipeline

For more information about how to use MultiQC reports, see http://multiqc.info