This is a Snakemake workflow that applies the protocol described in Pertea et al. 2016, with some modifications.
The idea is to produce a differential expression analysis from a set of FASTQ files.
Follow the contents of the .travis.yml file:
-
Install (ana|mini)conda
-
Installation:
git clone https://github.com/jlanga/smsk_tuxedo2.git smsk_tuxedo2
cd smsk_tuxedo2
snakemake --use-conda --create-envs-only
-
Execute the test pipeline:
snakemake --use-conda -j
-
Modify the following files:
features.yml with your reference genome and annotation,
samples.tsv with the paths and info of your samples,
src/config.yaml
-
Run the pipeline with your data:
snakemake --use-conda -j
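Before launching the pipeline on your own data, it can help to check that every FASTQ path listed in samples.tsv actually exists. The sketch below is a hypothetical helper, not part of the repository: the function name and the column names `sample`, `forward` and `reverse` are assumptions made for illustration and may not match the real header of samples.tsv, so adapt them to the file shipped with the project.

```python
import csv
from pathlib import Path


def missing_fastq_files(samples_tsv, path_columns=("forward", "reverse")):
    """Return (sample, path) pairs whose FASTQ file does not exist.

    The column names are assumptions; adjust them to the real header.
    """
    missing = []
    with open(samples_tsv, newline="") as handle:
        # The sample sheet is assumed to be tab-separated with a header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in path_columns:
                path = row[column]
                if not Path(path).is_file():
                    missing.append((row["sample"], path))
    return missing
```

Calling `missing_fastq_files("samples.tsv")` from the project root returns an empty list when all the listed files are in place.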
The folder hierarchy follows the one described in "Good enough practices in scientific computing":
smsk
├── bin/: external scripts/binaries
├── data/: test data.
├── doc/: documentation.
├── README.md
├── results:
│   ├── raw: links to your raw data.
│   ├── map: files from HISAT2 mapping: index and CRAM files.
│   ├── quant: files from StringTie assembly and quantification.
│   └── de: files from Ballgown: differential expression tables, RData objects for closer inspection.
├── Snakefile: driver script of the project.
├── environment.yml: packages to execute the analysis.
└── src: snakefiles, installers, config.yaml, R scripts.
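As a small illustration of this layout, the following sketch reports which of the `results/` subfolders listed above are not present yet, for example after a partial run. The helper name is hypothetical and the function is not part of the repository; only the four folder names are taken from the tree.

```python
from pathlib import Path

# Subfolders of results/ as listed in the tree above.
RESULT_SUBDIRS = ("raw", "map", "quant", "de")


def missing_result_dirs(project_root="."):
    """Return the results/ subfolders that do not exist yet."""
    results = Path(project_root) / "results"
    return [name for name in RESULT_SUBDIRS if not (results / name).is_dir()]
```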
-
The index is built from scratch.
-
Exons and splice sites are computed from the reference GTF file.
-
Paired reads are mapped with HISAT2. Results are compressed to CRAM on the fly.
-
The exact parameters from Pertea et al. 2016 are used.
-
CRAM files are converted to SAM on the fly.
-
Differential expression is performed with the R script provided in src/de_ballgown.R.
-
Visualization should be done interactively.
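The indexing, mapping and CRAM steps above can be sketched as plain command strings. This is a minimal illustration under stated assumptions, not the rules from the repository's Snakefile: the helper names, file names, thread counts and the exact choice of flags (for example `--dta` for HISAT2) are assumptions, and the real workflow may invoke the tools differently. The functions only build the commands and do not run anything.

```python
import shlex


def index_commands(genome_fa, annotation_gtf, prefix, threads=1):
    """Commands to extract splice sites and exons and build a HISAT2 index."""
    q = shlex.quote
    ss, exons = f"{prefix}.ss", f"{prefix}.exon"
    return [
        # Both helper scripts ship with HISAT2 and write to standard output.
        f"hisat2_extract_splice_sites.py {q(annotation_gtf)} > {q(ss)}",
        f"hisat2_extract_exons.py {q(annotation_gtf)} > {q(exons)}",
        f"hisat2-build -p {threads} --ss {q(ss)} --exon {q(exons)} "
        f"{q(genome_fa)} {q(prefix)}",
    ]


def map_to_cram_command(prefix, fq1, fq2, genome_fa, cram, threads=1):
    """Map paired reads and pipe the SAM stream into samtools to write CRAM."""
    q = shlex.quote
    return (
        f"hisat2 -p {threads} --dta -x {q(prefix)} -1 {q(fq1)} -2 {q(fq2)} "
        f"| samtools sort -O cram --reference {q(genome_fa)} -o {q(cram)} -"
    )


def cram_to_sam_command(cram, genome_fa):
    """Decode a CRAM file to SAM (header included) on standard output."""
    q = shlex.quote
    return f"samtools view -h -T {q(genome_fa)} {q(cram)}"
```

Because no intermediate SAM or BAM file is written in this sketch, only the compressed CRAM file stays on disk, and it is decoded again only when a downstream tool needs SAM input.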
-
Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Pertea et al. 2016.
-
The Sequence Alignment/Map format and SAMtools. Li et al.
-
HISAT: a fast spliced aligner with low memory requirements. Kim et al.
-
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Pertea et al.
-
Flexible isoform-level differential expression analysis with Ballgown. Frazee et al.
-
Snakemake - a scalable bioinformatics workflow engine. Köster et al.
-
smsk - a snakemake skeleton to jumpstart your projects. Langa.