Recent advances in high-throughput molecular imaging have pushed spatial transcriptomics technologies to subcellular resolution, overcoming the limitations of both single-cell RNA-seq and array-based spatial profiling. The latest single-cell spatial transcriptomics data from the NanoString CosMx and MERSCOPE platforms contain multi-channel immunohistochemistry images with rich information on cell types, functions, and the morphologies of cellular compartments. In this work, we developed a novel method, Single-cell spatial elucidation through image-augmented Graph transformer (SiGra), to reveal spatial domains and enhance the substantially sparse and noisy transcriptomics data. SiGra applies hybrid graph transformers over a spatial graph that comprises high-content images and gene expressions of individual cells. SiGra outperformed state-of-the-art methods on both single-cell spatial profiles and spot-level spatial transcriptomics data from complex tissues. The inclusion of immunohistochemistry images improved model performance by 37% (95% CI: 27%–50%). SiGra improves the characterization of intratumor heterogeneity and intercellular communications in human lung cancer samples, and recovers the known microscopic anatomy in both human brain and mouse liver tissues. Overall, SiGra effectively integrates data from different spatial modalities to gain deep insights into spatial cellular ecosystems.
SiGra is built on PyTorch.
Tested on: Ubuntu 18.04, 2080 Ti GPU, Intel i9-9820 (3.30 GHz, 20 cores), 64 GB RAM, CUDA 11.2.
Required modules can be installed via requirements.txt under the project root:
pip install -r requirements.txt
torchvision==0.11.1
matplotlib==2.1.1
torch==1.6.0
seaborn==0.10.0
tqdm==4.47.0
numpy==1.13.3
anndata==0.8.0
pandas==1.4.3
rpy2==3.5.2
scanpy==1.9.1
scipy==1.8.1
scikit_learn==1.1.1
torch_geometric==2.0.4
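To confirm that an environment matches the pins above, a small check along the following lines may help. This is only a sketch and not part of SiGra: `check_versions` and `PINNED` are hypothetical names, the dictionary holds a subset of the pins, and the distribution names `scikit-learn` and `torch-geometric` are assumed to be the PyPI spellings of `scikit_learn` and `torch_geometric`.

```python
from importlib import metadata

# Subset of the pinned versions listed above (extend as needed).
# Distribution names use the assumed PyPI spelling (hyphens, not underscores).
PINNED = {
    "torch": "1.6.0",
    "torchvision": "0.11.1",
    "torch-geometric": "2.0.4",
    "scanpy": "1.9.1",
    "anndata": "0.8.0",
    "scikit-learn": "1.1.1",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.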
Download SiGra:
git clone https://github.com/QSong-github/SiGra
The datasets can be downloaded here.
You can also download our processed dataset here.
├── requirements.txt
├── dataset
│   ├── DLPFC
│   │   └── 151507
│   │       ├── filtered_feature_bc_matrix.h5
│   │       ├── metadata.tsv
│   │       ├── sampledata.h5ad
│   │       └── spatial
│   │           ├── tissue_positions_list.csv
│   │           ├── full_image.tif
│   │           ├── tissue_hires_image.png
│   │           └── tissue_lowres_image.png
│   ├── nanostring
│   │   ├── Lung9_Rep1_exprMat_file.csv
│   │   ├── matched_annotation_all.csv
│   │   ├── fov1
│   │   │   ├── CellComposite_F001.jpg
│   │   │   └── sampledata.h5ad
│   │   └── fov2
│   │       ├── CellComposite_F002.jpg
│   │       └── sampledata.h5ad
│   └── merscope
│       ├── Cell_boundaries
│       ├── Cut Images
│       ├── sample_data
│       └── processed_data
└── checkpoint
    ├── nanostring_final
    │   └── final.pth
    ├── merscope_all
    │   └── final.pth
    └── 10x_final
        └── 151507
            └── final.pth
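Before running the processing scripts, it can be useful to verify that the downloaded files ended up where the layout above expects them. The following is a minimal sketch, not part of SiGra: `missing_paths` and `EXPECTED` are hypothetical names, and the nesting under `dataset/` and `checkpoint/` is an assumption inferred from the file names in the listing.

```python
from pathlib import Path

# A few representative files from the layout above.
# The exact nesting is an assumption; adjust it to your local copy.
EXPECTED = [
    "dataset/DLPFC/151507/filtered_feature_bc_matrix.h5",
    "dataset/DLPFC/151507/metadata.tsv",
    "dataset/DLPFC/151507/spatial/tissue_positions_list.csv",
    "dataset/nanostring/Lung9_Rep1_exprMat_file.csv",
    "dataset/nanostring/matched_annotation_all.csv",
    "checkpoint/nanostring_final/final.pth",
    "checkpoint/10x_final/151507/final.pth",
]


def missing_paths(root, expected=EXPECTED):
    """Return the relative paths from `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # "." assumes the script is run from the project root.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Run it from the project root; no output means all listed files were found.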
# go to /path/to/SiGra
# for NanoString CosMx dataset
python3 processing.py --dataset nanostring
# for Vizgen MERSCOPE dataset
python3 processing.py --dataset merscope
# for 10x Visium dataset
python3 processing.py --dataset 10x
Go to /path/to/SiGra/SiGra_model.
Download the datasets and checkpoints and place them in the folders shown above.
The results will be stored in "/path/to/SiGra/results/nanostring/".
python3 train.py --test_only 1 --save_path ../checkpoint/nanostring_final/ --pretrain final.pth --dataset nanostring
The results will be stored in "/path/to/SiGra/results/merscope/".
python3 train.py --test_only 1 --save_path ../checkpoint/merscope_final/ --pretrain final.pth --dataset merscope --root ../dataset/mouseLiver
The results will be stored in "/path/to/SiGra/results/10x_final/".
python3 train.py --test_only 1 --save_path ../checkpoint/10x_final/ --id 151507 --ncluster 7 --dataset 10x --root ../dataset/DLPFC
You can also use the bash script to test all slices:
sh test_visium.sh
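The contents of test_visium.sh are not reproduced here. As a rough illustration of what testing several slices involves, the sketch below builds the single-slice test command shown above for each entry in a mapping of slice ID to cluster count. `build_test_command` and `SLICES` are hypothetical names; only slice 151507 with 7 clusters comes from this README, so other slice IDs and their cluster counts would need to be filled in by the user.

```python
import shlex


def build_test_command(slice_id, ncluster,
                       checkpoint_dir="../checkpoint/10x_final/",
                       root="../dataset/DLPFC"):
    """Build the 10x Visium test command for one slice as a shell-safe string."""
    args = [
        "python3", "train.py",
        "--test_only", "1",
        "--save_path", checkpoint_dir,
        "--id", str(slice_id),
        "--ncluster", str(ncluster),
        "--dataset", "10x",
        "--root", root,
    ]
    return shlex.join(args)


# Only 151507 (7 clusters) is taken from the README; add other slices yourself.
SLICES = {151507: 7}

if __name__ == "__main__":
    for slice_id, ncluster in SLICES.items():
        print(build_test_command(slice_id, ncluster))
```

The printed lines can be pasted into a shell from the SiGra_model directory, or passed to `subprocess.run` after splitting with `shlex.split`.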
The hyperparameters were selected manually for each dataset:
python3 train.py --dataset nanostring --test_only 0 --save_path ../checkpoint/nanostring_train/ --seed 1234 --epochs 900 --lr 1e-3
python3 train.py --dataset merscope --test_only 0 --save_path ../checkpoint/merscope_train/ --seed 1234 --epochs 1000 --lr 1e-3 --root ../dataset/mouseLiver
python3 train.py --dataset 10x --test_only 0 --save_path ../checkpoint/10x_train/ --seed 1234 --epochs 600 --lr 1e-3 --id 151507 --ncluster 7 --repeat 1 --root ../dataset/DLPFC
You can also use the bash script to train all slices:
sh train_visium.sh
Please cite our paper if you use this code in your own work:
Tang Z, Zhang T, Yang B, Su J, Song Q. SiGra: Single-cell spatial elucidation through image-augmented graph transformer. bioRxiv. 2022 Aug 19:2022-08.