changxinw/pubRNAseq

pyRNAseq

A public RNA-seq data analysis pipeline based on Python 2 and R. The basic workflow is: fetch the SRA file for each GSM ID, convert the SRA files to FASTQ with fastq-dump from sratoolkit, quantify expression per RefSeq ID with salmon, and run differential expression analysis with DESeq2.
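
The conversion and quantification steps of this workflow can be sketched as two command builders. This is only an illustration and not pyRNAseq's actual code: the function names, the SRR accession, and the directory layout are hypothetical, and the GSM ID is assumed to have already been resolved to an SRR run accession. The flags used are standard options of fastq-dump (`--split-files`, `--gzip`, `--outdir`) and salmon quant (`-i`, `-l A`, `-r`, `-o`).

```python
import os


def fastq_dump_cmd(srr, outdir, fastq_dump="fastq-dump"):
    """Build a fastq-dump command that converts one SRA run to gzipped FASTQ."""
    return [fastq_dump, "--split-files", "--gzip", "--outdir", outdir, srr]


def salmon_quant_cmd(index, fastq, outdir, salmon="salmon"):
    """Build a single-end salmon quant command against a prebuilt index."""
    return [salmon, "quant", "-i", index, "-l", "A", "-r", fastq, "-o", outdir]


# Hypothetical accession and paths, for illustration only.
dump = fastq_dump_cmd("SRR0000001", "out/fastq")
quant = salmon_quant_cmd(
    "salmon_refseq_hg38",
    os.path.join("out/fastq", "SRR0000001_1.fastq.gz"),
    "out/salmon/SRR0000001",
)
print(" ".join(dump))
print(" ".join(quant))
```

Each list could be passed to `subprocess.check_call` to run the tool. Building the command as a list rather than a single string avoids shell quoting problems with paths.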

Requirements

  • Python 2 (2.6 or 2.7)
  • R with DESeq2 installed
  • salmon
  • sratoolkit (provides fastq-dump)

Installation and setup of pyRNAseq

git clone https://github.com/WChangson/pyRNAseq.git
cd pyRNAseq/source/

This directory contains a config.json file:

{
    "fastq-dump": "fastq-dump",
    "salmon": "/data5/home/changxin/miniconda2/envs/salmon/bin/salmon",
    "r": "/data5/home/changxin/R-3.4.4/bin/Rscript",
    "hg38": "/data5/home/changxin/genome_index/salmon_refseq_hg38",
    "mm10": "/data5/home/changxin/genome_index/salmon_refseq_mm10"
}

Open the file and set the path of fastq-dump in the "fastq-dump" entry, the path of salmon in the "salmon" entry, and the path of the Rscript executable you want to use in the "r" entry. If these tools are installed globally, you can keep the values as fastq-dump, salmon, and Rscript. The "hg38" and "mm10" entries are the paths of the salmon indexes for hg38 and mm10.
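
Before installing, it can help to check that the edited config.json still has all five entries. The following is a minimal sketch, not part of pyRNAseq: the helper name `missing_keys` and the placeholder index paths are assumptions. It is written to run on both Python 2 and Python 3.

```python
import json

# The five entries shown in config.json above.
REQUIRED_KEYS = ("fastq-dump", "salmon", "r", "hg38", "mm10")


def missing_keys(config):
    """Return the required keys that are absent or empty in a config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Example with globally installed tools; the index paths are placeholders.
text = """
{
    "fastq-dump": "fastq-dump",
    "salmon": "salmon",
    "r": "Rscript",
    "hg38": "/path/to/salmon_refseq_hg38",
    "mm10": "/path/to/salmon_refseq_mm10"
}
"""
config = json.loads(text)
print(missing_keys(config))  # prints []
```

To check the real file, replace `json.loads(text)` with `json.load(open("config.json"))`.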

cd ..
python setup.py install

Usage of pyRNAseq

pyRNAseq -h

This lists all the arguments of pyRNAseq.

Options

-d/--dmat

The design matrix, a tab-delimited file with two columns. The first column contains the GSM IDs of the RNA-seq samples. The second column gives the experimental condition of each sample. Usually the control group should be listed first and the treatment group last. For example:

GSM10001	Control    
GSM10002	Control    
GSM10003	Treatment  
GSM10004	Treatment  
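
A design matrix in the format above can be sanity-checked before a run. The sketch below is an illustration, not pyRNAseq's own parser: the function name `parse_design` and its validation rules (exactly two tab-separated columns, first column starting with "GSM") are assumptions based on the description.

```python
def parse_design(lines):
    """Parse tab-delimited 'GSM<TAB>condition' lines into (gsm, condition) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("expected 2 tab-separated columns: %r" % line)
        gsm, condition = fields[0].strip(), fields[1].strip()
        if not gsm.startswith("GSM"):
            raise ValueError("first column should be a GSM ID: %r" % gsm)
        samples.append((gsm, condition))
    return samples


example = [
    "GSM10001\tControl",
    "GSM10002\tControl",
    "GSM10003\tTreatment",
    "GSM10004\tTreatment",
]
samples = parse_design(example)

# Collect the conditions in order of first appearance (control first).
conditions = []
for _, cond in samples:
    if cond not in conditions:
        conditions.append(cond)
print(conditions)
```

For a file on disk, `parse_design(open("design.tsv"))` works the same way, since the function only needs an iterable of lines.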

-o/--output

Output directory for pyRNAseq results, including FASTQ files, read count files, and the differential expression file. If this directory does not exist, pyRNAseq creates it automatically.
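
The "create it if missing" behaviour described here is commonly written as below. This is a sketch of the general pattern, assuming nothing about how pyRNAseq implements it. The helper name `ensure_dir` and the subdirectory names are hypothetical.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents) if it does not already exist."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


# Demonstrate inside a throwaway temporary directory.
base = tempfile.mkdtemp()
target = ensure_dir(os.path.join(base, "results", "fastq"))
ensure_dir(target)  # a second call leaves the existing directory untouched
```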

-s/--species [hg38, mm10]

Only hg38 and mm10 are supported as references for salmon's pseudo-alignment.

-sal/--salmon

Whether to run salmon.

-de/--difexpr

Whether to do differential expression analysis with DESeq2.
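
The options above could be declared with argparse roughly as follows. This is a hypothetical reconstruction for illustration only, not the actual pyRNAseq source. In particular, it assumes that `-sal` and `-de` are boolean switches and that the other three options are required. The README does not state either point, so check `pyRNAseq -h` for the real behaviour.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (assumed semantics)."""
    parser = argparse.ArgumentParser(prog="pyRNAseq")
    parser.add_argument("-d", "--dmat", required=True,
                        help="tab-delimited design matrix (GSM ID, condition)")
    parser.add_argument("-o", "--output", required=True,
                        help="output directory")
    parser.add_argument("-s", "--species", required=True,
                        choices=["hg38", "mm10"],
                        help="reference used for the salmon index")
    # Assumption: both steps are on/off switches that take no value.
    parser.add_argument("-sal", "--salmon", action="store_true",
                        help="run salmon quantification")
    parser.add_argument("-de", "--difexpr", action="store_true",
                        help="run DESeq2 differential expression")
    return parser


# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(
    ["-d", "design.tsv", "-o", "results", "-s", "hg38", "-sal"]
)
print(args.dmat, args.output, args.species, args.salmon, args.difexpr)
```

Under these assumptions, the equivalent invocation would be `pyRNAseq -d design.tsv -o results -s hg38 -sal`, which runs quantification but skips differential expression.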
