The intent is to automate the pre-processing steps required for dot-calling and to provide an automated way to call dots for multiple samples.
The pipeline is "self-sufficient", i.e. one can start from unbalanced coolers, or try to optimize balancing for better results (`cis-only`, `ignore-diags 1`, etc.). Computation of the expected is included in the pipeline as well.
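As an illustration, re-balancing one resolution of a multi-resolution cooler with such options could look like the following (paths and the resolution are placeholders; `wsnake` is the weight column the pipeline writes, as described below):

```sh
# a minimal sketch: cis-only balancing at 5kb, ignoring the first diagonal,
# with weights stored in a separate "wsnake" column
cooler balance --cis-only --ignore-diags 1 --name wsnake \
    /path/to/sample1/sample1.mcool::resolutions/5000
```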
The pipeline runs using the latest stable versions of `cooler` and `cooltools`, along with https://github.com/sergpolly/peaktools, which is required for the merging step (and will eventually be included in `cooltools`).
`call-dots` currently has two limitations:

1. `call-dots` uses a fixed size of the "donut" (and other convolution kernels) to calculate local enrichment around each pixel.
2. `call-dots` surveys only a limited range of genomic separations, e.g. between 0 and 10 Mb.
In our experience, limitation (2) does not affect typical dot-calls for human cell lines (GM, HFF, ESC, etc.), whereas limitation (1) prevents us from calling some "small" dots (near the diagonal on a Hi-C heatmap), e.g. at 5kb resolution dot-calling "starts" at ~75kb. The "shrinking-donuts" approach of the GPU version of HiCCUPS allows dots as small as ~50kb to be called at 5kb resolution.
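For concreteness, both limits surface as parameters of `call-dots`; a sketch of an invocation is below (flag names follow older `cooltools` releases and should be checked against `cooltools call-dots --help`; file names and values are placeholders):

```sh
# --max-loci-separation caps the surveyed separations (limitation 2);
# --kernel-width/--kernel-peak fix the donut geometry in bins (limitation 1)
cooltools call-dots \
    --expected-name balanced.avg \
    --weight-name wsnake \
    --max-loci-separation 10000000 \
    --kernel-width 7 --kernel-peak 4 \
    --fdr 0.02 \
    --nproc 4 \
    /path/to/sample1/sample1.mcool::resolutions/5000 \
    sample1.5kb.expected.tsv
```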
The pipeline performs the following steps:

- re-balance input coolers at 5kb and 10kb, saving weights into the `wsnake` column
- compute expected at 5kb and 10kb using the re-balanced coolers, removing `chrY` and `chrM` from the output (see the sketch after this list)
- call dots at 5kb and 10kb using the re-balanced coolers and the computed expected
- merge dots called at 5kb and 10kb into a combined list of called dots
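For the expected step specifically, an equivalent manual run might look like this (the `compute-expected` flags follow older `cooltools` releases and may differ in your version; dropping `chrY`/`chrM` via `grep` is just one way to do the filtering; file names are placeholders):

```sh
# compute cis expected at 5kb using the re-balanced "wsnake" weights ...
cooltools compute-expected --nproc 4 --weight-name wsnake \
    -o sample1.5kb.expected.tsv \
    /path/to/sample1/sample1.mcool::resolutions/5000
# ... and drop chrY/chrM rows before dot-calling
grep -v -e '^chrY' -e '^chrM' sample1.5kb.expected.tsv \
    > sample1.5kb.expected.noYM.tsv
```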
- install:
  - `cooler`
  - `cooltools`
  - `peaktools`
  - `snakemake`
There are plenty of instructions on how to do this, using `conda`, `pip`, etc. For `peaktools` one can do the following:

```sh
pip install git+https://github.com/sergpolly/peaktools.git
```
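For the other dependencies, one possible route is a fresh `conda` environment with `pip` installs on top (`cooler`, `cooltools`, and `snakemake` are all on PyPI; the environment name and Python version are just examples):

```sh
# a minimal sketch: isolated environment + pip-installable dependencies
conda create -n dot-calling python=3.7
conda activate dot-calling
pip install cooler cooltools snakemake
```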
- prepare a `project.yml` that contains your input cooler names and their corresponding locations, e.g.:

```yaml
samples:
    - sample1.mcool
    - sample2.mcool
location:
    - /path/to/sample1
    - /path/to/sample2
```
- clone this repo and tweak the `Snakefile` to adjust your balancing options, expected calculations, and dot-calling parameters. Unfortunately there is no clean interface for providing such parameters outside of the `Snakefile` for now.
- run the pipeline using `snakemake`:
- one can run the entire pipeline, from coolers to dot-calls, locally on a computer with ~6+ cores and 16GB+ RAM:

```sh
snakemake -j NUMBER_OF_CORES --configfile /path/to/your/project.yml
```
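Before a full run, a `snakemake` dry-run can be used to preview the planned jobs without executing anything:

```sh
# -n (--dry-run): print the jobs that would be executed
snakemake -n -j NUMBER_OF_CORES --configfile /path/to/your/project.yml
```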
- or run it on the cluster! We provide an example for the LSF batch submission system:

```sh
snakemake -j MAX_NUMBER_OF_JOBS --configfile /path/to/your/project.yml \
    --printshellcmds --cluster-config cluster.json \
    --cluster "bsub -q {cluster.queue} -W {cluster.time} -n {cluster.nCPUs} -R {cluster.memory} -R {cluster.resources} -oo {cluster.output} -eo {cluster.error} -J {cluster.name}"
```

where the `cluster.PARAMETER` values are provided in the `cluster.json` file.
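The keys referenced in the submission string map onto entries of `cluster.json`; a minimal sketch with illustrative LSF values (queue names, walltimes, and resource strings are site-specific assumptions) could look like:

```json
{
    "__default__": {
        "queue": "short",
        "time": "4:00",
        "nCPUs": "8",
        "memory": "rusage[mem=16000]",
        "resources": "span[hosts=1]",
        "output": "logs/{rule}.out",
        "error": "logs/{rule}.err",
        "name": "{rule}"
    }
}
```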
- alternatively, one can run each individual command provided in the `Snakefile`, simply using it for guidance and typical parameters.
- there are several `project` files that highlight what has been processed for the Micro-C publication. The `downsampled` project corresponds to the downsampled Micro-C samples that match the number of cis-interactions of the corresponding Hi-C maps.
- `launch.sh` is a bash "script" to run the pipeline on an LSF cluster, e.g.:

```sh
bash launch.sh my_new.project.yml
```