A pipeline for analyzing public RNA-seq data, based on Python 2 and R. The basic workflow is to fetch the SRA files for a set of GSM IDs and convert them to FASTQ with fastq-dump from sratoolkit. salmon then quantifies expression by RefSeq ID, and differential expression analysis is done with DESeq2.
- Python 2.6 or 2.7
- R with DESeq2 installed
- salmon
git clone https://github.com/WChangson/pyRNAseq.git
cd pyRNAseq/source/
This directory contains a config.json file:
{
"fastq-dump": "fastq-dump",
"salmon": "/data5/home/changxin/miniconda2/envs/salmon/bin/salmon",
"r": "/data5/home/changxin/R-3.4.4/bin/Rscript",
"hg38": "/data5/home/changxin/genome_index/salmon_refseq_hg38",
"mm10": "/data5/home/changxin/genome_index/salmon_refseq_mm10"
}
Open the file and set the path to fastq-dump in the first entry, the path to salmon in the second entry, and the path to the Rscript executable you want to use in the third entry. If these programs are installed globally, you can keep the values as fastq-dump, salmon, and Rscript. The fourth and fifth entries are the paths to the salmon indexes for hg38 and mm10.
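Before installing, it can help to confirm that every path in config.json resolves. The snippet below is a minimal sketch and not part of pyRNAseq: `find_executable` and `check_config` are hypothetical helper names, and it assumes each salmon index is a directory, which is what `salmon index` normally produces.

```python
import json
import os


def find_executable(name):
    """Return the path of `name` if it is an executable file, else None."""
    if os.path.sep in name:
        # A value containing a path separator is checked directly.
        if os.path.isfile(name) and os.access(name, os.X_OK):
            return name
        return None
    # A bare command name (e.g. "fastq-dump") is searched for on PATH.
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_config(config):
    """Return a list of problems found in a parsed config.json dictionary."""
    problems = []
    # The three tool entries must point to runnable programs.
    for key in ("fastq-dump", "salmon", "r"):
        if key not in config:
            problems.append("missing key: " + key)
        elif find_executable(config[key]) is None:
            problems.append("executable not found for %s: %s" % (key, config[key]))
    # Assumption: each salmon index is stored as a directory.
    for key in ("hg38", "mm10"):
        if key not in config:
            problems.append("missing key: " + key)
        elif not os.path.isdir(config[key]):
            problems.append("index directory not found for %s: %s" % (key, config[key]))
    return problems


def check_config_file(path):
    """Load a config.json file and return the list of problems found in it."""
    with open(path) as handle:
        return check_config(json.load(handle))
```

Calling `check_config_file("config.json")` from pyRNAseq/source/ returns an empty list when all five entries resolve, and one message per unresolved entry otherwise.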
cd ..
python setup.py install
pyRNAseq -h
This lists all the arguments of pyRNAseq.
The design matrix, which should contain two columns delimited by a tab. The first column holds the GSM IDs of the RNA-seq samples. The second column holds the experimental design. Usually the control group should come first and the treatment group last. For example:
GSM10001 Control
GSM10002 Control
GSM10003 Treatment
GSM10004 Treatment
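A design file can be checked before a long run with a small parser like the one below. This is a minimal sketch and not pyRNAseq's own code: `parse_design` is a hypothetical helper, and the check that IDs start with "GSM" is an assumption based on the example above.

```python
def parse_design(text):
    """Parse tab-delimited design text.

    Returns (samples, groups): samples is a list of (gsm_id, group) pairs,
    groups is the list of distinct groups in order of first appearance.
    """
    samples = []
    groups = []
    for number, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 tab-separated columns, got %d"
                % (number, len(fields))
            )
        gsm_id, group = fields[0].strip(), fields[1].strip()
        # Assumption: every sample ID is a GEO sample accession (GSM...).
        if not gsm_id.startswith("GSM"):
            raise ValueError("line %d: %r is not a GSM ID" % (number, gsm_id))
        samples.append((gsm_id, group))
        if group not in groups:
            groups.append(group)
    return samples, groups


# The example design from above, with explicit tab separators.
example = (
    "GSM10001\tControl\n"
    "GSM10002\tControl\n"
    "GSM10003\tTreatment\n"
    "GSM10004\tTreatment\n"
)
samples, groups = parse_design(example)
print(groups)
```

A line whose columns are separated by spaces instead of a tab raises a `ValueError`, which catches the most common formatting mistake in hand-written design files.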
Output directory for pyRNAseq results, including the fastq files, read count files, and the differential expression file. If this directory does not exist, pyRNAseq will create it automatically.
The genome to use. Only hg38 and mm10 are supported for salmon's pseudo-alignment.
Whether to run salmon.
Whether to do differential expression analysis with DESeq2.