This pipeline is built to explore batch effects in single-cell RNA-seq datasets and to simulate realistic batch effects from them. It also compares batch effects and other dataset features between simulated and real data.
The pipeline consists of 3 major steps:
- Characterize batch effects in real data
- Simulate single cell data with a corresponding batch effect
- Validate the simulation using countsimQC and batch characterization
We consider all kinds of unwanted variation to be batch effects. A batch effect is thus a signal that does not come from the biological signal of interest but interferes with it. We therefore need to adjust for and/or understand the batch effect in order to use the full potential of the biological signal. Under this definition, patient differences or media differences can be a batch effect in one study and the signal of interest in another. What counts as a batch effect is therefore variable and always depends on the question asked.
We analysed single-cell RNA-seq datasets with batch effects from different sources:
- Technical batch effects (e.g. different sequencing protocols)
- Biological batch effects (e.g. different patients)
- Conditional batch effects (e.g. different media)
View results here.
To set up this pipeline, follow these instructions (steps 1-2 explain one possible way to set up and run Snakemake):
- Set up and activate an Anaconda environment with Snakemake >= v5.6.0 (or something equivalent)
- Make sure your path to R is exported within Snakemake
  - e.g. by adding *export PATH="/your/prefered/R/bin:$PATH"* to your *~/.bashrc*
- Install all required R packages using packrat
- Clone this repository
- Caution: If you don't want to keep all the analyses that came with this repo, remove all files from the *docs* directory except *_site.yaml*
- Create **log** and **out** directories.
- Run *snakemake dir_setup* to set up the necessary directory structure to make all rules work.
- If you want to view or share your analysis as a website, activate GitHub Pages within your corresponding repo and specify */docs* as the source directory.
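
The shell-level parts of the setup above can be sketched as follows. This is a minimal sketch rather than a script shipped with the repository: the R location is the placeholder from the instructions, and both the guard around `snakemake` and the `--cores 1` flag (which newer Snakemake releases require) are assumptions. Run it from the root of the cloned repository.

```shell
# Placeholder from the instructions above; point this at your own R installation.
R_BIN_DIR="/your/prefered/R/bin"

# Put the chosen R first on PATH so that Snakemake rules pick it up.
export PATH="$R_BIN_DIR:$PATH"

# Create the log and out directories (no error if they already exist).
mkdir -p log out

# Build the remaining directory structure, but only if Snakemake and a
# Snakefile are actually available in the current directory (assumed guard).
if command -v snakemake >/dev/null 2>&1 && [ -f Snakefile ]; then
    snakemake --cores 1 dir_setup
else
    echo "snakemake or Snakefile not found; skipping dir_setup" >&2
fi
```

To make the `PATH` change permanent, add the `export` line to your *~/.bashrc* as described above.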
To run the entire pipeline:
- Copy your preprocessed *SingleCellExperiment* dataset into */src/data/*
- Generate a corresponding metadata file and save it in */src/meta_files/*
- Run snakemake
- Push the results to GitHub and refresh its web deployment.
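
Before starting a full run, it can help to confirm that both inputs from the steps above are in place. The following is a hedged sketch: the helper names `has_files` and `check_inputs` are hypothetical and not part of the pipeline, and the paths are assumed to be relative to the repository root.

```shell
# Succeed only if the directory exists and holds at least one non-hidden file.
has_files() {
    [ -d "$1" ] && [ -n "$(find "$1" -maxdepth 1 -type f ! -name '.*' | head -n 1)" ]
}

# Check the two input locations named in the steps above.
check_inputs() {
    has_files src/data || { echo "no dataset found in src/data/" >&2; return 1; }
    has_files src/meta_files || { echo "no metadata file found in src/meta_files/" >&2; return 1; }
}

# Example use (assumes a recent Snakemake, which needs an explicit --cores):
# check_inputs && snakemake --cores 1
```

The check only looks for the presence of files; it does not validate the dataset or the metadata format.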