STAMP: Toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond

Introduction

This repository hosts the official implementation of STAMP, a method that can predict perturbation outcomes using single-cell RNA-sequencing data from perturbational experimental screens using subtask decomposition learning. STAMP can be applied to three challenges in this area, i.e. (1) predict single genetic perturbation outcomes, (2) predict multiple genetic perturbation outcomes and (3) predict genetic perturbation outcomes across cell lines.

Installation

Our experiments were conducted on python=3.9.7 and our CUDA version is 11.4.

We recommend using Anaconda / Miniconda to create a conda environment for using STAMP. You can create a python environment using the following command:

conda  create -n stamp python==3.9.7

Then, you can activate the environment using:

conda activate stamp

Installing Pytorch with following command:

conda install pytorch==1.10.2 torchvision==0.11.3 torchaudio==0.10.2 -c pytorch

Then

pip install .

or you can install it from PyPI:

pip install cell-stamp

Example data

We have made available the code necessary to generate example data, serving as a practical illustration for training and testing the STAMP model. Additionally, for guidance on configuring the training process of STAMP, we offer an example config file located at ./Data/example_config.yaml.

python ./Data/GeneratingExampleData.py

The example *.h5ad data file has three distinct layers, namely 'level1', 'level2', and 'level3'. The 'level1' layer is a binary matrix, where '0' represents non-differentially expressed genes (non-DEGs) and '1' indicates differentially expressed genes (DEGs). Similarly, 'level2' is another binary matrix, denoting down-regulated genes with '0' and up-regulated genes with '1'. Lastly, the 'level3' layer is a matrix that quantifies the magnitude of gene expression changes.

Real demo data

We have uploaded all benchmark datasets to Zenodo, which can be obtained from here. Please download all these files into the ./Data directory and refer tutorial_for_training.py.ipynb in ./Tutorial directory. This tutorial uses the one fold of RPE1_essential dataset as an example to perform the model training, testing and check the loss curves during training.

Note: Users are encouraged to change the path of each data in 'Config.yaml' based on their own machines.

Core API interface for model training

Using this API, you can train and test STAMP on your own perturbation datasets using a few lines of code.

from stamp import STAMP, load_config
import scanpy as sc

# load config file
config = load_config("./Data/example_config.yaml")

# set up and train a STAMP
model = STAMP(config)
model.train()

# load trained model
model.load_pretrained(f"{config['Train']['output_dir']}/trained_models")

# use trained model to predict unseen perturbations
model.prediction(config['dataset']['Testing_dataset'], combo_test = True)

# use trained model to predict unseen perturbations; considering Top 40 DEGs
# Top 40 DEGs consisting of Top 20 up-regulation genes and Top 20 down-regulation genes

# load Top 40 test data
top_40_data = sc.read_h5ad("./Data/example_test_top40.h5ad")

# prediction
model.prediction(top_40_data, combo_test = True)

Core API interface for model fine-tuning

Using this API, you can fine-tune and test STAMP on your own perturbation datasets using a few lines of code.

from stamp import STAMP, load_config
import scanpy as sc

# load config file (we use the example config used for model training to illustrate this)
config = load_config("./Data/example_config.yaml")

# set up STAMP
model = STAMP(config)

# load pre-trained model
model.load_pretrained(f"{config['Train']['output_dir']}/trained_models")

# fine-tuning model
model.finetuning()

# use fine-tuned model to predict unseen perturbations
model.prediction(config['dataset']['Testing_dataset'], combo_test = False)

# use fine-tuned model to predict unseen perturbations; considering Top 40 DEGs
# Top 40 DEGs consisting of Top 20 up-regulation genes and Top 20 down-regulation genes

# load Top 40 test data
top_40_data = sc.read_h5ad("./Data/example_test_top40.h5ad")

# prediction
model.prediction(top_40_data, combo_test = False)

Citation

Yicheng Gao, Zhiting Wei, Qi Liu et al. Toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond, Nature Computational Science, 2024.

Contacts

bm2-lab@tongji.edu.cn

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
Data		Data
Output		Output
Supp_code		Supp_code
Tutorial		Tutorial
img		img
stamp		stamp
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
environment.yml		environment.yml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

STAMP: Toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond

Introduction

Installation

Example data

Real demo data

Note: Users are encouraged to change the path of each data in 'Config.yaml' based on their own machines.

Core API interface for model training

Core API interface for model fine-tuning

Citation

Contacts

About

Releases

Packages

Contributors 2

Languages

License

bm2-lab/STAMP

Folders and files

Latest commit

History

Repository files navigation

STAMP: Toward subtask decomposition-based learning and benchmarking for genetic perturbation outcome prediction and beyond

Introduction

Installation

Example data

Real demo data

Note: Users are encouraged to change the path of each data in 'Config.yaml' based on their own machines.

Core API interface for model training

Core API interface for model fine-tuning

Citation

Contacts

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages