Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.
This repository contains the model code, the data loading code, and a training script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.
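To make the idea of a cell-conditioned decoder concrete, here is a minimal toy sketch in plain Python. It is not scooby's implementation: the function names, dimensions, single linear projection, and random weights are all assumptions chosen for illustration. It only shows the general pattern in which a cell's position in an embedding is mapped to decoder weights, which are then applied to per-bin sequence embeddings to give a cell-specific profile.

```python
import math
import random


def softplus(x):
    # Numerically stable softplus; keeps predicted values strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def make_matrix(n_rows, n_cols, rng):
    # Hypothetical helper: random values standing in for learned parameters.
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def cell_decoder_weights(cell_embedding, projection):
    # Map the cell's embedding coordinates to one weight per sequence feature.
    n_features = len(projection[0])
    return [
        sum(c * projection[i][j] for i, c in enumerate(cell_embedding))
        for j in range(n_features)
    ]


def predict_profile(sequence_embeddings, cell_embedding, projection):
    # Apply the cell-specific weights to every genomic bin's sequence embedding.
    weights = cell_decoder_weights(cell_embedding, projection)
    return [
        softplus(sum(f * w for f, w in zip(bin_features, weights)))
        for bin_features in sequence_embeddings
    ]


rng = random.Random(0)
n_bins, n_features, cell_dim = 6, 4, 3  # toy sizes, chosen arbitrarily

sequence_embeddings = make_matrix(n_bins, n_features, rng)  # stand-in for sequence model output
projection = make_matrix(cell_dim, n_features, rng)         # stand-in for learned decoder parameters

cell_a = [0.5, -1.0, 0.2]  # hypothetical positions in a precomputed embedding
cell_b = [-0.3, 0.8, 1.1]

profile_a = predict_profile(sequence_embeddings, cell_a, projection)
profile_b = predict_profile(sequence_embeddings, cell_b, projection)
print(profile_a)
print(profile_b)
```

The same sequence embeddings produce different profiles for the two cells because only the decoder weights depend on the cell.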
- NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)
scooby uses a custom version of SnapATAC2, which can be installed with pip. It is best installed in a separate environment because of numpy version conflicts with scooby.
pip install snapatac2-scooby
pip install git+https://github.com/gagneurlab/scooby.git
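Because the two packages are meant to live in separate environments, it can help to check what the active environment actually contains. The snippet below is a small sketch using only the standard library. The distribution name `scooby` is an assumption based on the repository name, and `installed_version` is a hypothetical helper, not part of either package.

```python
from importlib import metadata


def installed_version(distribution):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "scooby" is an assumed distribution name; adjust it if pip reports another.
packages = ["numpy", "snapatac2-scooby", "scooby"]
status = {name: installed_version(name) for name in packages}

for name, version in status.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this once in each environment shows which numpy version each one resolved to.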
- Download file contents from the Zenodo repo
- Use examples from the scooby reproducibility repository
We provide one training script for scRNA-seq-only modeling and one for multiome modeling. Both require SnapATAC2-preprocessed AnnData objects and embeddings. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128GB RAM and 32 cores.
Currently, the model has only been tested with a batch size of 1.