Sparse Signaling Pathway Sampling

Code related to the manuscript "Inferring signaling pathways with probabilistic programming" (Merrell & Gitter, 2020), Bioinformatics, 36:Supplement_2, i822–i830.

This repository contains the following:

  • SSPS: A method that infers relationships between variables using time series data.
    • Modeling assumption: the time series data is generated by a Dynamic Bayesian Network (DBN).
    • Inference strategy: MCMC sampling over possible DBN structures.
    • Implementation: written in Julia, using the Gen probabilistic programming language.
  • Analysis code:
    • simulation studies;
    • convergence analyses;
    • evaluation on experimental data;
    • a Snakefile for managing all of the analyses.

Installation and basic setup

(If you plan to reproduce all of the analyses, then make sure you're on a host with access to plenty of CPUs. Ideally, you would have access to a cluster of some sort.)

  1. Clone this repository
    $ git clone git@github.com:gitter-lab/ssps.git
  2. Install Julia 1.6 (and all Julia dependencies)
    $ wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.7-linux-x86_64.tar.gz 
    $ tar -xvzf julia-1.6.7-linux-x86_64.tar.gz
    
    $ cd ssps/SSPS
    $ julia --project=. 
                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.6.7 (2022-07-19)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |
    
    julia> using Pkg
    julia> Pkg.instantiate()
    julia> exit()
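
For scripted setups, the REPL session above can be collapsed into one non-interactive command. This is a minimal sketch. It assumes that julia is on your PATH (the extracted tarball places the binary under julia-1.6.7/bin) and that you run it from the directory containing the cloned ssps folder. The JULIA_BIN, PROJECT_DIR, and SETUP_STATUS variables are illustrative names, not part of SSPS.

```shell
# Illustrative variable names; override them to match your setup.
JULIA_BIN="${JULIA_BIN:-julia}"
PROJECT_DIR="${PROJECT_DIR:-ssps/SSPS}"

if command -v "$JULIA_BIN" >/dev/null 2>&1 && [ -d "$PROJECT_DIR" ]; then
    # Same effect as typing `using Pkg; Pkg.instantiate()` at the julia> prompt.
    if "$JULIA_BIN" --project="$PROJECT_DIR" -e 'using Pkg; Pkg.instantiate()'; then
        SETUP_STATUS="instantiated"
    else
        SETUP_STATUS="failed"
    fi
else
    echo "Skipping: julia or $PROJECT_DIR not found" >&2
    SETUP_STATUS="skipped"
fi
echo "Julia dependency setup: $SETUP_STATUS"
```

If the status is "skipped", add the Julia bin directory to PATH or set JULIA_BIN to the full path of the binary and try again.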
    

Reproducing the analyses

In order to reproduce the analyses, you will need some extra bits of software.

  • We use Snakemake, a Python package, to manage the analysis workflow.
  • We use some other Python packages to postprocess the results, produce plots, etc.
  • Some of the baseline methods are implemented in R or MATLAB.

Hence, the analyses entail some extra setup:

  1. Install Python dependencies (using conda)

    • For the purposes of these instructions, we assume you have Anaconda3 or Miniconda3 installed, and have access to the conda environment manager.
      (We recommend using Miniconda; find full installation instructions here.)
    • We recommend setting up a dedicated virtual environment for this project. The following will create a new environment named ssps and install the required Python packages:
    $ conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal
    $ conda activate ssps
    (ssps) $
    
    • If you plan to reproduce the analyses on a cluster, then install cookiecutter and the complete version of snakemake
    (ssps) $ conda install -c conda-forge cookiecutter bioconda::snakemake
    

    Then find the appropriate Snakemake profile in this list: https://github.com/Snakemake-Profiles/doc and install it using cookiecutter:

    (ssps) $ cookiecutter https://github.com/Snakemake-Profiles/htcondor.git
    

    replacing the example with the desired profile.

  2. Install R packages

  3. Check whether MATLAB is installed.
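
A quick way to confirm that the external tools are visible from your shell is to look them up on PATH. This is a minimal sketch: check_tool and MISSING are hypothetical helper names, and Rscript and matlab are assumed to be the executable names on your system (adjust them if your installation differs).

```shell
# Report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the expected tools could not be found.
MISSING=0
for tool in conda snakemake Rscript matlab; do
    check_tool "$tool" || MISSING=$((MISSING + 1))
done
echo "$MISSING tool(s) missing"
```

A missing matlab entry only matters for the baseline methods implemented in MATLAB.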

After completing this additional setup, we are ready to run the analyses.

  1. Make any necessary modifications to the configuration file: analysis_config.yaml. This file controls the space of hyperparameters and datasets explored in the analyses.
  2. Run the analyses using snakemake:
    • If you're running the analyses on your local host, simply move to the directory containing Snakefile and call snakemake.
    (ssps) $ cd ssps
    (ssps) $ snakemake
    
    • Since Julia is a dynamically compiled language, some time will be devoted to compilation when you run SSPS for the first time. You may see some warnings in stdout -- this is normal.
    • If you're running the analyses on a cluster, call snakemake with the Snakemake profile you installed during setup:
    (ssps) $ cd ssps
    (ssps) $ snakemake --profile YOUR_PROFILE_NAME
    
    (You will probably need to edit the job submission parameters in the profile's config.yaml file.)
  3. Relax. Running all of the analyses will take tens of thousands of CPU-hours.

Running SSPS on your data

Follow these steps to run SSPS on your dataset. You will need

  • a CSV file (tab separated) containing your time series data;
  • a CSV file (comma separated) containing your prior edge confidences;
  • optionally, a JSON file containing a list of variable names (i.e., node names).

With these files in hand:

  1. Install the Python dependencies if you haven't already. Find detailed instructions above.
  2. cd to the run_ssps directory
  3. Configure the parameters in ssps_config.yaml as appropriate
  4. Run Snakemake: $ snakemake --cores 1. The value passed to --cores is the maximum number of CPU cores Snakemake may use; raise it to allow more.
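
Steps 2 to 4 can be sketched as a short script that previews the workflow with a Snakemake dry run (-n) before launching it. It assumes that ssps_config.yaml sits inside run_ssps and that the ssps conda environment is active. RUN_DIR, CORES, and RUN_STATUS are illustrative names.

```shell
# Illustrative variable names; override them to match your setup.
RUN_DIR="${RUN_DIR:-run_ssps}"
CORES="${CORES:-1}"

if command -v snakemake >/dev/null 2>&1 && [ -f "$RUN_DIR/ssps_config.yaml" ]; then
    # Run in a subshell so the caller's working directory is unchanged.
    if (cd "$RUN_DIR" \
            && snakemake -n --cores "$CORES" \
            && snakemake --cores "$CORES"); then
        RUN_STATUS="finished"
    else
        RUN_STATUS="failed"
    fi
else
    echo "Skipping: snakemake or $RUN_DIR/ssps_config.yaml not found" >&2
    RUN_STATUS="skipped"
fi
echo "SSPS run: $RUN_STATUS"
```

The dry run only lists the jobs Snakemake would execute, which makes it a cheap check of the configuration before any chains start.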

A note about parallelism

SSPS allows two levels of parallelism: (1) at the Markov chain level and (2) at the iteration level.

  • Chain-level parallelism is provided via Snakemake. For example, Snakemake can run 4 chains simultaneously if you specify --cores 4 at the command line: $ snakemake --cores 4. In essence, this just creates 4 instances of SSPS that run simultaneously.
  • Iteration-level parallelism is provided by Julia's multi-threading features. The number of threads available to an SSPS instance is specified by an environment variable: JULIA_NUM_THREADS.
  • The total number of CPUs used by your SSPS jobs is the product of Snakemake's --cores parameter and Julia's JULIA_NUM_THREADS environment variable. Concretely: if we run snakemake --cores 2 and have JULIA_NUM_THREADS=4, then up to 8 CPUs may be used at one time by the SSPS jobs.
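
The arithmetic in the last bullet can be checked in the shell before launching a run. This is a minimal sketch for a local run, where the jobs inherit the shell's environment. SNAKEMAKE_CORES and MAX_CPUS are illustrative variable names, and the snakemake call is left commented out.

```shell
# Chain-level parallelism: number of SSPS instances Snakemake may run at once.
SNAKEMAKE_CORES=2

# Iteration-level parallelism: threads available to each SSPS instance.
export JULIA_NUM_THREADS=4

# Upper bound on CPUs in use: the product of the two settings.
MAX_CPUS=$((SNAKEMAKE_CORES * JULIA_NUM_THREADS))
echo "Up to $MAX_CPUS CPUs may be in use at once"   # prints: Up to 8 CPUs may be in use at once

# snakemake --cores "$SNAKEMAKE_CORES"
```

Choose the two values so that their product does not exceed the number of CPUs available on the host.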

Licenses

SSPS is available under the MIT License, Copyright © 2020 David Merrell.

The MATLAB code dynamic_network_inference.m has been modified from the original version, Copyright © 2012 Steven Hill and Sach Mukherjee.

The dream-challenge data is described in Hill et al., 2016 and is originally from Synapse.