GitHub

A Kernel dIfferentiability correlation-based NOn-negative Matrix factorization algorithm using Kullback-Leibler divergence loss Optimization (KINOMO)

We propose Kernel dIfferentiability correlation-based NOn-negative Matrix factorization algorithm using Kullback-Leibler divergence loss Optimization (KINOMO), a semi-supervised NMF model that is robust to noises and also uses ‘prior’ biological knowledge for better refinement.

Instructions for running KINOMO Please note that KINOMO needs to be implemented sample-wise and not on the integrated object (if at all it exists).

Accessing the KINOMO repository

The KINOMO repsository can be accessed via https://github.com/IzarLab/KINOMO.git
Download the KINOMO repository into your local machine.

Running KINOMO

Detailed step-by-step instructions/help (in R) for running KINOMO is specified in the './KINOMO_scripts/kinomo_run.R'. Please use this script.
If there is any error with respect to accessing specific sub-scripts/modules, please provide necessary path-to-file information.

Packages required dplyr Seurat ggplot2 gplots purrr cowplot stringr NMF pkgmaker cluster

Input file Input file is the gene expression data (raw counts) converted as a Seurat object.

Estimating the factorization rank

A critical parameter in KINOMO is the factorization rank r. It defines the number of metagenes used to approximate the target matrix.
Given a NMF method and the target matrix, a common way of deciding on r is to try different values, compute some quality measure of the results, and choose the best value according to this quality criteria.
Several approaches have then been proposed to choose the optimal value of r. For example, (Brunet2004) proposed to take the first value of r for which the cophenetic coefficient #starts decreasing, (Hutchins2008) suggested to choose the first value where the RSS curve presents an inflection point, and (Frigyesi2008) considered the smallest value at which #the decrease in the RSS is lower than the decrease of the RSS obtained from random data.

Performing factorization using estimated rank

Based on the consensus rank estimation, factorization is done.

Metagene identification

top50/100/200/300 meta-genes are estimated.
Meta-gene signatures are plotted
Meta-gene heatmaps are plotted

After the script successfully runs, the following files would be created in the current directory:

Rank plot
Meta-gene signature UMAPS
Meta-gene heatmaps
Associated .csv files
Associated .rds files

** Downstream analysis

Though we recommend using custom-based methods to integrate the factors and illustrating the correlation matrix, we provide a demo script that can be used for the purpose of illustration. "KINOMO/downstream_analysis/integrate.factors.R"
Co-correlation analysis can be run on the above integrated factor matrix using metrics such as "Spearman" or "Pearson".
For identifying the barcodes corresponding to each meta-program, the following script can be used. "KINOMO/downstream_analysis/sort.barcodes.R"
Normalized gene expression for the meta-genes per meta-programs can then be done using an EM-GMM based approach. "KINOMO/downstream_analysis/EM_GMM.R"

Detailed steps for downstream analysis

integrating the best factors (top 100 genes) using the W files across all samples.
run a co-correlation analysis on the above matrix using Spearman/Pearson to get the metaprograms.
then for each metaprogram, get the normalized gene expression for the metagenes.

Normalized Gene Contribution

Do a stouffer/weighted stouffer to identify metagene ranks per metaprogram
Sort the metagenes high to low based on stouffer integrated values
From the H matrices generated sample wise (using kinomo), identify the cell barcodes associated with each factor
Subset the barcodes from integrated single cell data based on factor barcodes (per metaprogram)
run a signature built using the ranked metagenes per metaprogram on the single cell data i. on the individual factor ii. on all the factors associated with a metaprogram
Run EM-GMM model on the single cell (normalized) and identify the modality of the distribution per gene
Based on the modality (could be unimodal, bimodal or multimodal), identify the peak center, which we define as "normalized gene contribution"
Generate a matrix using the normalized gene contribution for each metagene per metaprogram
Generate the checkerboard

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
KINOMO_scripts		KINOMO_scripts
NMF-base		NMF-base
downstream_analysis		downstream_analysis
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Detailed steps for downstream analysis

Normalized Gene Contribution

About

Releases

Packages

Languages

IzarLab/KINOMO

Folders and files

Latest commit

History

Repository files navigation

Detailed steps for downstream analysis

Normalized Gene Contribution

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages