Motif subclustering to reveal within-motif heterogeneity

Pre-release
@AvantiShri AvantiShri released this 18 Feb 20:23
· 115 commits to master since this release
09e8fec

Corresponds to PR #84

  • Performs density-adapted clustering (Leiden) and computes a t-SNE embedding. By default, a perplexity of 30 is used for both. If TF-MoDISco is run with this version from the beginning, the subclustering is computed automatically for each motif. Otherwise, motifs computed with a previous version can be loaded and the subclustering computed post-hoc.
  • Uses all pairwise continuous Jaccard similarities to compute the similarity matrix, but remains memory efficient: only the similarities for the nearest neighbors needed for the density adaptation are kept in memory, and that number (perplexity*3 + 1) is far smaller than the total number of nodes in the graph.
  • Added support for saving and loading the subclustering from file
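
The memory-saving idea in the second bullet can be illustrated with a minimal sketch. This is not the TF-MoDISco implementation: the function names `continuous_jaccard` and `top_k_neighbors` are hypothetical, and the similarity formula (signed elementwise minimum over elementwise maximum of absolute values) is an assumed definition of continuous Jaccard that may differ from the library's. The sketch computes similarities one row at a time and keeps only the perplexity*3 + 1 largest per node, so the full n x n matrix is never held in memory.

```python
import numpy as np

def continuous_jaccard(a, B):
    """Similarity of one vector a (d,) against every row of B (n, d).

    ASSUMED definition, for illustration only: sum of sign-aware elementwise
    minima divided by the sum of elementwise maxima of absolute values.
    """
    abs_a, abs_B = np.abs(a), np.abs(B)
    intersection = np.sum(
        np.minimum(abs_a, abs_B) * np.sign(a) * np.sign(B), axis=1)
    union = np.sum(np.maximum(abs_a, abs_B), axis=1)
    return intersection / np.maximum(union, 1e-12)  # guard against all-zero rows

def top_k_neighbors(X, perplexity=30):
    """Keep only the k = perplexity*3 + 1 most similar nodes for each row of X."""
    n = X.shape[0]
    k = min(n, int(perplexity * 3 + 1))
    nbr_idx = np.empty((n, k), dtype=np.int64)
    nbr_sim = np.empty((n, k), dtype=np.float64)
    for i in range(n):
        sims = continuous_jaccard(X[i], X)       # one full row, then discarded
        top = np.argpartition(-sims, k - 1)[:k]  # unordered top-k indices
        top = top[np.argsort(-sims[top])]        # order by decreasing similarity
        nbr_idx[i] = top
        nbr_sim[i] = sims[top]
    return nbr_idx, nbr_sim

# Toy data standing in for flattened seqlet scores (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
nbr_idx, nbr_sim = top_k_neighbors(X, perplexity=30)
print(nbr_idx.shape)  # 200 rows, 91 neighbors each, instead of 200 x 200
```

With the default perplexity of 30, each node retains 91 similarities regardless of how many seqlets the motif contains, so memory grows linearly in the number of nodes rather than quadratically.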

Notebook demonstrating the change on the TAL-GATA example: https://github.com/kundajelab/tfmodisco/blob/c2c6001b8a2608ee5224ac7faeb69d5fd72f78f5/examples/simulated_TAL_GATA_deeplearning/TF_MoDISco_TAL_GATA.ipynb

Notebook demonstrating how to compute the subclustering post-hoc:
http://mitra.stanford.edu/kundaje/avanti/tfmodisco_bio_experiments/bpnet/trial1/TryBpNet_v0.5.12.0_add_in_subclustering.html
(GitHub permalink: https://github.com/kundajelab/tfmodisco_bio_experiments/blob/54df6faa20773d91107e7b645649e2145e3fb0de/bpnet/trial1/TryBpNet_v0.5.12.0_add_in_subclustering.ipynb)