
Splotch

Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) [1] data.

Features

  • Supports complex hierarchical experimental designs and model-based analysis of replicates
  • Full Bayesian inference with Hamiltonian Monte Carlo (HMC) using the No-U-Turn Sampler (NUTS), the adaptive HMC sampler implemented in NumPyro [2]
    • CPU, GPU, and TPU support
  • Analysis of expression differences between anatomical regions and conditions using posterior samples
  • Anatomical annotation regions (AARs) are modeled using a linear model
  • Zero-inflated Poisson or Poisson likelihood
  • Gaussian process prior for the spatial random effect
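For reference, the zero-inflated Poisson (ZIP) likelihood mentioned above can be written out in a few lines. This is a minimal, generic sketch using only the standard library, not Splotch's own implementation. The function name `zip_log_pmf` and the parameter names `rate` (the Poisson mean) and `pi` (the probability of an extra, structural zero) are illustrative. Setting `pi = 0` recovers the plain Poisson likelihood.

```python
import math


def zip_log_pmf(k: int, rate: float, pi: float) -> float:
    """Log-probability of count k under a zero-inflated Poisson (illustrative sketch).

    rate: Poisson mean (must be > 0).
    pi:   probability of a structural zero (0 <= pi < 1); pi = 0 gives a plain Poisson.
    """
    if k < 0:
        return -math.inf
    if k == 0:
        # A zero arises either from the zero-inflation component or from the Poisson itself.
        return math.log(pi + (1.0 - pi) * math.exp(-rate))
    # Non-zero counts can only come from the Poisson component:
    # log(1 - pi) + k*log(rate) - rate - log(k!)
    return math.log1p(-pi) + k * math.log(rate) - rate - math.lgamma(k + 1)


# Zero inflation makes zeros more likely than under the plain Poisson.
print(zip_log_pmf(0, rate=2.0, pi=0.3), zip_log_pmf(0, rate=2.0, pi=0.0))
```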

The Splotch code in this repository supports single-, two-, and three-level experimental designs.

Installation

PyPI

$ pip install splotch-st

GitHub

$ pip install git+https://git@github.com/tare/Splotch.git

CUDA

To install JAX with NVIDIA GPU (CUDA) support, please follow the instructions in the JAX installation guide.

Usage

The main steps of Splotch analysis are the following:

  1. Preparation of count files
  2. Annotation of ST spots
  3. Preparation of metadata table
  4. Splotch analysis

Preparation of count files

The count files use the following tab-separated values (TSV) format:

32.06_2.04 31.16_2.04 14.07_2.1 28.16_33.01
A130010J15Rik 0 0 0 0
A230046K03Rik 0 0 0 0
A230050P20Rik 0 0 0 0
A2m 0 1 0 0
Zzz3 0 1 0 0

Rows correspond to gene identifiers and columns to ST spot coordinates (the X and Y coordinates are separated by an underscore).
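A count file in this format can be read with the Python standard library alone. The sketch below assumes the header row lists only the spot identifiers (an optional empty leading cell is tolerated) and splits each spot identifier into numeric X and Y coordinates. The function names `read_count_table` and `parse_spot` are illustrative and are not part of the Splotch API.

```python
import csv
import io


def parse_spot(spot: str) -> tuple[float, float]:
    """Split an ST spot identifier such as '32.06_2.04' into (x, y)."""
    x, y = spot.split("_")
    return float(x), float(y)


def read_count_table(text: str) -> tuple[list[str], list[str], list[list[int]]]:
    """Return (genes, spots, counts), where counts[i][j] is the count of gene i at spot j."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = rows[0]
    # Tolerate an empty leading header cell above the gene-identifier column.
    spots = header[1:] if header and header[0] == "" else header
    genes, counts = [], []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        values = [int(v) for v in row[1:]]
        if len(values) != len(spots):
            raise ValueError(f"gene {row[0]!r} has {len(values)} values for {len(spots)} spots")
        genes.append(row[0])
        counts.append(values)
    return genes, spots, counts


example = (
    "32.06_2.04\t31.16_2.04\t14.07_2.1\t28.16_33.01\n"
    "A2m\t0\t1\t0\t0\n"
    "Zzz3\t0\t1\t0\t0\n"
)
genes, spots, counts = read_count_table(example)
print(genes)                 # ['A2m', 'Zzz3']
print(parse_spot(spots[0]))  # (32.06, 2.04)
```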

Annotation of ST spots

To get the most out of Splotch's statistical model, annotate the ST spots based on their tissue context. These annotations allow the model to share information across tissue sections, resulting in more robust conclusions.

To make the annotation step less tedious, we have implemented a lightweight JavaScript tool called Span.

The annotation files use the following TSV format:

32.06_2.04 31.16_2.04 14.07_2.1 28.16_33.01
Vent_Med_White 0 0 0 0
Vent_Horn 1 1 0 0
Vent_Lat_White 0 0 0 0
Med_Grey 0 0 0 0
Dors_Horn 0 0 0 0
Dors_Edge 0 0 0 1
Med_Lat_White 0 0 0 0
Vent_Edge 0 0 1 0
Dors_Med_White 0 0 0 0
Cent_Can 0 0 0 0
Lat_Edge 0 0 0 0

Rows correspond to the user-defined anatomical annotation regions (AARs) and columns to ST spot coordinates (the X and Y coordinates are separated by an underscore). For instance, the spot 32.06_2.04 has the Vent_Horn annotation (i.e. it is located in the ventral horn). The annotation category of each ST spot is one-hot encoded; more than one annotation category per ST spot is not currently supported.
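The one-hot encoding can be decoded into a single AAR label per spot. The sketch below uses only the standard library; the function name `decode_annotations` is hypothetical. It returns `None` for spots without an annotation and raises an error for spots with more than one, mirroring the restriction described above.

```python
import csv
import io
from typing import Dict, Optional


def decode_annotations(text: str) -> Dict[str, Optional[str]]:
    """Map each spot to its AAR label (None if the spot has no annotation)."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header = rows[0]
    # Tolerate an empty leading header cell above the AAR-name column.
    spots = header[1:] if header[0] == "" else header
    labels: Dict[str, Optional[str]] = {spot: None for spot in spots}
    for row in rows[1:]:
        aar, flags = row[0], row[1:]
        for spot, flag in zip(spots, flags):
            if int(flag) == 1:
                if labels[spot] is not None:
                    raise ValueError(f"spot {spot} has multiple annotations: {labels[spot]}, {aar}")
                labels[spot] = aar
    return labels


example = (
    "32.06_2.04\t31.16_2.04\t14.07_2.1\t28.16_33.01\n"
    "Vent_Horn\t1\t1\t0\t0\n"
    "Dors_Edge\t0\t0\t0\t1\n"
    "Vent_Edge\t0\t0\t1\t0\n"
)
print(decode_annotations(example)["32.06_2.04"])  # Vent_Horn
```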

ST spots without an annotation category are discarded from the analysis. This behavior is useful for excluding ST spots based on tissue histology.
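Discarding unannotated spots amounts to keeping only the count columns whose spot has a label. A minimal sketch, assuming the counts are held as a gene-by-spot list of lists and the labels as a spot-to-AAR dictionary with `None` for unannotated spots. The function name `drop_unannotated` is illustrative.

```python
from typing import Dict, List, Optional, Tuple


def drop_unannotated(
    spots: List[str],
    counts: List[List[int]],
    labels: Dict[str, Optional[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Keep only spots with an AAR label; spots missing from `labels` are dropped too."""
    keep = [j for j, spot in enumerate(spots) if labels.get(spot) is not None]
    kept_spots = [spots[j] for j in keep]
    kept_counts = [[row[j] for j in keep] for row in counts]
    return kept_spots, kept_counts


spots = ["32.06_2.04", "31.16_2.04", "14.07_2.1"]
counts = [[0, 1, 0], [2, 3, 4]]
labels = {"32.06_2.04": "Vent_Horn", "31.16_2.04": None, "14.07_2.1": "Vent_Edge"}
print(drop_unannotated(spots, counts, labels))
# (['32.06_2.04', '14.07_2.1'], [[0, 0], [2, 4]])
```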

Preparation of metadata table

The metadata table describes the samples (i.e. count files) and is also used to match each count file with its annotation file.

The metadata table uses the following TSV format:

name level_1 level_2 level_3 count_file annotation_file image_file
L7CN36_C1 G93A p120 F 1394 count_tables/L7CN36_C1_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN36_C1.tsv images/L7CN36_C1_HE.jpg
L7CN36_C2 G93A p120 F 1394 count_tables/L7CN36_C2_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN36_C2.tsv images/L7CN36_C2_HE.jpg
L7CN30_C1 WT p120 M 2967 count_tables/L7CN30_C1_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN30_C1.tsv images/L7CN30_C1_HE.jpg
L7CN30_C2 WT p120 M 2967 count_tables/L7CN30_C2_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN30_C2.tsv images/L7CN30_C2_HE.jpg
L7CN69_D1 WT p120 M 1310 count_tables/L7CN69_D1_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN69_D1.tsv images/L7CN69_D1_HE.jpg
L7CN69_D2 WT p120 M 1310 count_tables/L7CN69_D2_stdata_aligned_counts_IDs.txt.unified.tsv annotations/L7CN69_D2.tsv images/L7CN69_D2_HE.jpg
CN96_E1 WT p120 F 1040 count_tables/CN96_E1_stdata_aligned_counts_IDs.txt.unified.tsv annotations/CN96_E1.tsv images/CN96_E1_HE.jpg
CN96_E2 WT p120 F 1040 count_tables/CN96_E2_stdata_aligned_counts_IDs.txt.unified.tsv annotations/CN96_E2.tsv images/CN96_E2_HE.jpg
CN93_E1 G93A p120 M 975 count_tables/CN93_E1_stdata_aligned_counts_IDs.txt.unified.tsv annotations/CN93_E1.tsv images/CN93_E1_HE.jpg
CN93_E2 G93A p120 M 975 count_tables/CN93_E2_stdata_aligned_counts_IDs.txt.unified.tsv annotations/CN93_E2.tsv images/CN93_E2_HE.jpg

Each sample (i.e. slide) has its own row in the metadata table. The columns level_1, level_2, and level_3 define how the samples are analyzed using the linear hierarchical AAR model. The columns level_1, count_file, and annotation_file are mandatory. The column level_2 is mandatory when using the two-level model. Similarly, the columns level_2 and level_3 are mandatory when using the three-level model. At the moment we only support categorical variables.
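These column requirements can be checked before running an analysis. The sketch below reads a metadata table with the standard library's `csv.DictReader`, verifies the mandatory columns for a given number of levels, and groups the samples by their level values. The function name `load_metadata` and its `n_levels` parameter are hypothetical and not part of Splotch. The example table is reduced to two levels and uses shortened, illustrative file paths.

```python
import csv
import io
from collections import defaultdict


def load_metadata(text: str, n_levels: int) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Validate a metadata table and group its rows by (level_1, ..., level_n)."""
    if n_levels not in (1, 2, 3):
        raise ValueError("only single-, two-, and three-level designs are supported")
    required = ["count_file", "annotation_file"] + [f"level_{i}" for i in range(1, n_levels + 1)]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    groups: dict[tuple[str, ...], list[dict[str, str]]] = defaultdict(list)
    for row in reader:
        key = tuple(row[f"level_{i}"] for i in range(1, n_levels + 1))
        groups[key].append(row)
    return dict(groups)


example = (
    "name\tlevel_1\tlevel_2\tcount_file\tannotation_file\n"
    "L7CN36_C1\tG93A\tp120\tcount_tables/L7CN36_C1.tsv\tannotations/L7CN36_C1.tsv\n"
    "L7CN30_C1\tWT\tp120\tcount_tables/L7CN30_C1.tsv\tannotations/L7CN30_C1.tsv\n"
    "L7CN30_C2\tWT\tp120\tcount_tables/L7CN30_C2.tsv\tannotations/L7CN30_C2.tsv\n"
)
groups = load_metadata(example, n_levels=2)
print(sorted(groups))                               # [('G93A', 'p120'), ('WT', 'p120')]
print([r["name"] for r in groups[("WT", "p120")]])  # ['L7CN30_C1', 'L7CN30_C2']
```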

If a slide contains tissue sections from different biological conditions (i.e. different values of the explanatory variables), we recommend splitting the tissue sections into separate count files so that the design matrix can be defined accordingly.
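Splitting a slide can be done by partitioning its spot columns. A minimal sketch, assuming you have already decided which tissue section each spot belongs to: the `section_of` mapping and the Y-coordinate threshold below are hypothetical (in practice the assignment might come from the histology image). Each resulting table keeps the count-file layout described earlier and could be written to its own file and listed as a separate row in the metadata table.

```python
def split_count_table(
    genes: list[str],
    spots: list[str],
    counts: list[list[int]],
    section_of: dict[str, str],
) -> dict[str, str]:
    """Return one TSV string per section, in the count-file format (spots as columns).

    Every spot must be assigned a section in `section_of`.
    """
    columns: dict[str, list[int]] = {}
    for j, spot in enumerate(spots):
        columns.setdefault(section_of[spot], []).append(j)
    tables = {}
    for section, idx in columns.items():
        lines = ["\t".join(spots[j] for j in idx)]  # header: spot identifiers only
        for gene, row in zip(genes, counts):
            lines.append("\t".join([gene] + [str(row[j]) for j in idx]))
        tables[section] = "\n".join(lines) + "\n"
    return tables


genes = ["A2m", "Zzz3"]
spots = ["32.06_2.04", "31.16_2.04", "28.16_33.01"]
counts = [[0, 1, 0], [0, 1, 2]]
# Hypothetical assignment: spots with Y < 20 go to section_1, the rest to section_2.
section_of = {s: "section_1" if float(s.split("_")[1]) < 20 else "section_2" for s in spots}
tables = split_count_table(genes, spots, counts, section_of)
print(tables["section_2"])
```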

Additional columns can be included at the user's discretion; for instance, the tutorials use the image_file column.

Example data

The tutorials directory contains two example ST data sets:

  1. ALS [3]
  2. Olfactory Bulb [1]

Splotch analysis

Please see the ALS and Olfactory Bulb tutorials.

In the simplest setting, the following lines are enough to run Splotch on a single gene:

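from jax import random

# get_input_data and run_nuts are provided by the splotch package;
# see the ALS and Olfactory Bulb tutorials for the exact import paths

# read input data
splotch_input_data = get_input_data("metadata.tsv")

# run Splotch on the Gfap gene
key = random.PRNGKey(0)
# split the key so that `key` can be reused to derive keys for later runs
key, key_ = random.split(key)
splotch_result_nuts = run_nuts(key_, ["Gfap"], splotch_input_data)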
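Posterior samples can then be summarized to compare AARs or conditions. The sketch below is a generic illustration, not Splotch's own analysis routine; the tutorials describe how to extract samples from the `run_nuts` result. Given two hypothetical lists of posterior draws for the same gene (for instance, the expression coefficients of two AARs, paired by MCMC draw), it reports the posterior mean of the difference and the posterior probability that the first exceeds the second. The function name `compare_posteriors` is illustrative.

```python
from statistics import fmean


def compare_posteriors(samples_a: list[float], samples_b: list[float]) -> tuple[float, float]:
    """Return (posterior mean of a - b, P(a > b)) estimated from paired posterior draws.

    Draws are paired by index because both quantities come from the same MCMC
    iterations, so each pair is one sample from their joint posterior.
    """
    if len(samples_a) != len(samples_b) or not samples_a:
        raise ValueError("expected two non-empty lists of paired posterior draws")
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    return fmean(diffs), sum(d > 0 for d in diffs) / len(diffs)


# Toy draws, for illustration only.
print(compare_posteriors([1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 1.0, 1.0]))  # (1.25, 0.75)
```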

References

[1] Ståhl, Patrik L., et al. "Visualization and analysis of gene expression in tissue sections by spatial transcriptomics." Science 353.6294 (2016): 78-82.

[2] Phan, Du, et al. "Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro." arXiv preprint arXiv:1912.11554 (2019).

[3] Maniatis, Silas, et al. "Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis." Science 364.6435 (2019): 89-93.