Pando leverages multi-modal single-cell measurements to infer gene regulatory networks (GRNs) using a flexible modeling framework. By modeling the relationship between TF-binding site pairs and the expression of target genes, Pando simultaneously infers gene modules and sets of regulatory regions for each transcription factor.
devtools::install_github('quadbiolab/Pando')
The fate and state of a cell are regulated through complex circuits of transcription factors (TFs) converging at regulatory elements to enable precise control of gene expression. Modern single-cell genomic approaches allow the simultaneous profiling of gene expression and chromatin accessibility in individual cells, which opens up new opportunities for the inference of gene regulatory networks (GRNs).
The unifying idea behind many modern GRN inference methods is to model the expression of each gene as a function of TF abundances. The weights or coefficients of this model can then be interpreted as a measure of the regulatory interaction between TF and target gene. Additional (epi-) genomic information (such as predicted TF binding sites) is often used to constrain or refine the model.
Pando tries to generalize this concept to make use of the multi-modal nature of modern single-cell technologies by incorporating TF binding information directly into the model. By utilizing jointly measured or integrated scRNA-seq and scATAC-seq data, Pando models the expression of each gene based on the interaction of TF expression with the accessibility of the TF's putative binding sites. By offering a number of different pre-processing and modeling choices, Pando strives to be a modular and flexible framework for single-cell GRN inference.
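To make the idea of a TF-binding site interaction concrete, here is a minimal sketch in base R on simulated data. It is not Pando's implementation: the variable names (tf_expr, peak_acc, target_expr), the simulated effect size, and the Gaussian family are all assumptions chosen for illustration.

```r
# Simulated illustration only; this is not how Pando fits its models internally.
set.seed(1)
n_cells <- 500

# Hypothetical per-cell measurements
tf_expr  <- rexp(n_cells, rate = 1)                 # expression of one TF
peak_acc <- rbinom(n_cells, size = 1, prob = 0.5)   # accessibility (0/1) of one putative binding site

# The target gene responds to the TF only in cells where the site is accessible
target_expr <- 2 * tf_expr * peak_acc + rnorm(n_cells, sd = 0.5)

cell_data <- data.frame(target_expr, tf_expr, peak_acc)

# One interaction term per TF-binding site pair
fit <- glm(target_expr ~ tf_expr:peak_acc, data = cell_data, family = gaussian())

# The fitted coefficient of the interaction term is read as regulatory strength
interaction_coef <- coef(fit)[["tf_expr:peak_acc"]]
print(summary(fit)$coefficients)
```

In a real model, a target gene would have one such term for every TF-binding site pair near it, so the coefficients describe both which TF and which region contribute.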
Pando interacts directly with Seurat objects and integrates well with Seurat and Signac workflows. To use Pando, you'll need a Seurat object with two assays, one with scRNA-seq transcript counts and one with scATAC-seq peak accessibility. With this object (let's call it seurat_object) ready, you can start off by initializing the GRN using the function initiate_grn():
seurat_object <- initiate_grn(seurat_object)
This will create a RegulatoryNetwork object inside the Seurat object and select candidate regulatory regions. By default, Pando will consider all peaks as putative regulatory regions, but the set of candidate regions can be constrained by providing a GenomicRanges object in the regions argument. Pando ships with a set of conserved regions (phastConsElements20Mammals.UCSC.hg38) as well as predicted regulatory elements from ENCODE (SCREEN.ccRE.UCSC.hg38) for the human genome (hg38), which could be used here. However, one could also select candidate regions in other ways, for instance by using Cicero.
Once the RegulatoryNetwork object is initiated with candidate regions, we can scan for TF binding motifs in these regions by using the function find_motifs():
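library(Pando)
library(BSgenome.Hsapiens.UCSC.hg38)
# Load the motif collection that ships with Pando
data(motifs)
# Scan the candidate regions for TF binding motifs
seurat_object <- find_motifs(
    seurat_object,
    pfm = motifs,
    genome = BSgenome.Hsapiens.UCSC.hg38
)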
This uses motifmatchr to pair up TFs with their putative binding sites. Pando provides a custom motif database (motifs) compiled from JASPAR and CIS-BP, but in principle any PFMatrixList object can be provided here. A data frame with motif-to-TF assignments can be provided in the motif_tfs argument.
Now everything should be ready to infer the GRN by fitting regression models for the expression of each gene. In Pando, this can be done by using the function infer_grn():
seurat_object <- infer_grn(
    seurat_object,
    peak_to_gene_method = 'Signac',
    method = 'glm'
)
Here, we first select regions near genes, either by simply considering a distance upstream and/or downstream of the gene (peak_to_gene_method='Signac') or by also considering overlapping regulatory regions as is done by GREAT (peak_to_gene_method='GREAT').
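The distance-based selection can be pictured with a small base R sketch. It only illustrates the principle of keeping peaks that fall within a window around a gene; it is not the code Pando or Signac run internally. The peak and gene coordinates, the window sizes, and the assumption that the gene lies on the plus strand are all made up for the example.

```r
# Toy illustration of distance-based peak-to-gene assignment (hypothetical coordinates).
peaks <- data.frame(
  peak  = c("peak1", "peak2", "peak3", "peak4"),
  chrom = c("chr1", "chr1", "chr1", "chr2"),
  start = c(9000, 60000, 101500, 99000),
  end   = c(9500, 60500, 102000, 99500),
  stringsAsFactors = FALSE
)

# Hypothetical gene body, assumed to lie on the plus strand
gene <- list(name = "GENE_A", chrom = "chr1", start = 100000, end = 120000)

# Window sizes are arbitrary choices for this example
upstream   <- 50000
downstream <- 0

window_start <- gene$start - upstream
window_end   <- gene$end + downstream

# Keep peaks on the same chromosome that overlap the extended window
in_window <- peaks$chrom == gene$chrom &
  peaks$end >= window_start &
  peaks$start <= window_end

candidate_peaks <- peaks$peak[in_window]
print(candidate_peaks)
```

Here peak1 lies too far upstream and peak4 is on another chromosome, so only peak2 and peak3 would be considered as candidate regions for the gene.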
You can also choose between a number of different models using the method argument, such as GLMs ('glm'), regularized GLMs ('glmnet', 'cv.glmnet') or Bayesian regression models ('brms'). We have also integrated gradient boosting regression with XGBoost, as used by GRNBoost/SCENIC, as well as bagging and Bayesian ridge models with scikit-learn, as used by CellOracle.
Once the models are fit, model coefficients can be inspected with
coef(seurat_object)
Based on the model coefficients, we can construct a network between TFs and target genes. This can be further summarized to construct gene and regulatory modules with the set of target genes and regulatory regions for each TF. In Pando we do this with
seurat_object <- find_modules(seurat_object)
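As a rough picture of what summarizing coefficients into modules means, consider the following base R sketch. The coefficient table, its column names, and the p-value cutoff of 0.05 are assumptions made for illustration; find_modules() may apply different filters and return a different structure.

```r
# Hypothetical coefficient table: one row per TF-region-target term
coefs <- data.frame(
  tf       = c("TF1", "TF1", "TF1", "TF2", "TF2"),
  target   = c("GENE_A", "GENE_B", "GENE_C", "GENE_A", "GENE_D"),
  region   = c("chr1-100-200", "chr1-900-1000", "chr2-50-150",
               "chr1-100-200", "chr3-10-110"),
  estimate = c(0.8, -0.5, 0.01, 0.3, 0.6),
  pval     = c(0.001, 0.01, 0.7, 0.2, 0.004),
  stringsAsFactors = FALSE
)

# Keep terms that pass an (arbitrary) significance threshold
significant <- coefs[coefs$pval < 0.05, ]

# Gene modules: target genes per TF; regulatory modules: regions per TF
gene_modules   <- split(significant$target, significant$tf)
region_modules <- split(significant$region, significant$tf)

print(gene_modules)
print(region_modules)
```

Each TF ends up with a set of target genes and a set of regulatory regions, which is the kind of summary the modules are meant to provide.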
To access the extracted modules, you can use the function NetworkModules():
modules <- NetworkModules(seurat_object)
modules@meta
The meta slot holds a data frame with module information.
If you are curious to find out more, check out our vignettes!