Batch snakemake

General

Aim

This pipeline is built to explore batch effects in single-cell RNA-seq datasets and to simulate realistic batch effects based on them. It also compares batch effects and other dataset features between simulated and real data.

Structure

The pipeline consists of three major steps:

1. Characterize batch effects in real data
2. Simulate single-cell data with a corresponding batch effect
3. Validate the simulation using countsimQC and batch characterization

Batch effects

Definition

We consider any kind of unwanted variation a batch effect. A batch effect is thus a signal that stems from something other than the biological signal of interest and interferes with it. We need to understand and/or adjust for the batch effect in order to use the full potential of the biological signal. Under this definition, patient differences or media differences can be a batch effect in one study and the signal of interest in another. What counts as a batch effect therefore always depends on the question asked.

Analysis

We analysed single-cell RNA-seq datasets with batch effects from different sources:

- Technical batch effects (e.g. different sequencing protocols)
- Biological batch effects (e.g. different patients)
- Conditional batch effects (e.g. different media)

Results

View results here.

Setup

Preparations

To set up this pipeline, follow these instructions (steps 1-2 describe one possible way to set up and run Snakemake):

1. Set up and activate an Anaconda environment with Snakemake >= v5.6.0 (or something equivalent).
2. Make sure the path to R is exported within Snakemake,
   e.g. by adding *export PATH="/your/preferred/R/bin:$PATH"* to your *~/.bashrc*.
3. Install all required R packages using packrat.
4. Clone this repository.
   Caution: The repository ships with the results of our analyses. If you do not want to keep them, remove all files from the docs directory except _site.yaml.
5. Create **log** and **out** directories.
6. Run *snakemake dir_setup* to set up the directory structure that the rules require.
7. If you want to view or share your analysis as a website, activate GitHub Pages for your repository and specify */docs* as the source directory.
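
The preparation steps above could be scripted roughly as follows. This is a sketch under assumptions, not the authors' exact procedure: the environment name `batch_snakemake`, the conda channels, and all function names are illustrative. The `--cores 1` flag is added because recent Snakemake releases require a core count. The packrat and clone steps are left out because they depend on your local setup.

```shell
# Sketch of the preparation steps; names marked "assumed" are illustrative.

# Create a conda environment with Snakemake >= 5.6.0
# (assumed environment name and channels).
create_env() {
  conda create -y -n batch_snakemake -c conda-forge -c bioconda "snakemake>=5.6.0"
}

# Put the preferred R installation first on PATH so that Snakemake rules use it.
# $1: directory that contains the R and Rscript binaries.
prepend_r_to_path() {
  export PATH="$1:$PATH"
}

# Create the log and out directories inside the cloned repository.
# $1: repository root.
make_pipeline_dirs() {
  mkdir -p "$1/log" "$1/out"
}

# Let the pipeline create the remaining directory structure.
# Run this from the repository root, where the snakefile lives.
run_dir_setup() {
  snakemake --cores 1 dir_setup
}
```

A typical session would call `create_env`, activate the environment with `conda activate batch_snakemake`, and then, from inside the cloned repository, run `prepend_r_to_path /your/preferred/R/bin`, `make_pipeline_dirs .` and `run_dir_setup`.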

Run

To run the entire pipeline:

1. Copy your preprocessed *SingleCellExperiment* dataset into */src/data/*.
2. Generate a corresponding metadata file and save it in */src/meta_files/*.
3. Run snakemake.
4. Push the results to GitHub and refresh its web deployment.
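
As a rough illustration, the run steps could be wrapped in a few shell helpers. Everything here is an assumption rather than part of the pipeline: the function names are hypothetical, the dataset is assumed to be a single file (for example a serialized *SingleCellExperiment* object), and the rendered results are assumed to be written to *docs*. The format of the metadata file is defined by the pipeline and is not shown.

```shell
# Copy a preprocessed dataset and its metadata file to where the pipeline reads them.
# $1: repository root, $2: dataset file, $3: metadata file.
stage_inputs() {
  repo=$1; dataset=$2; meta=$3
  [ -d "$repo/src" ] || { echo "not a pipeline checkout: $repo" >&2; return 1; }
  [ -f "$dataset" ] || { echo "missing dataset: $dataset" >&2; return 1; }
  [ -f "$meta" ] || { echo "missing metadata file: $meta" >&2; return 1; }
  mkdir -p "$repo/src/data" "$repo/src/meta_files"
  cp "$dataset" "$repo/src/data/"
  cp "$meta" "$repo/src/meta_files/"
}

# Show which jobs Snakemake would run without executing them (dry run).
preview_pipeline() {
  snakemake -n
}

# Run all rules from the repository root; $1 is the number of cores (default 1).
run_pipeline() {
  snakemake --cores "${1:-1}"
}

# Commit and push the results (assumed to live in docs/) so GitHub Pages can redeploy.
publish_results() {
  git add docs
  git commit -m "Update analysis results"
  git push
}
```

Checking the inputs before copying means a mistyped path fails immediately instead of surfacing later as a missing-input error from Snakemake.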
