Gene set scores correlate w/ cell content #86

Open
dpcook opened this issue Aug 17, 2020 · 2 comments

dpcook commented Aug 17, 2020

Hi there. Thanks for the work on this package! I love the incorporation of autocorrelation.

I've been looking into the scores produced by Vision (Seurat object as input) and have found that the scores correlate fairly well with total UMI counts.

vision <- Vision(seurat[[i]],
                 signatures = signatures,
                 sig_gene_threshold = 0.05,
                 projection_methods = NULL)

I noticed that my signatures were all correlating with each other, so I checked their correlation with total UMI per cell and found that many correlate well:

image

And this seems to be related to the total size of the signature:

image

I only mention this because I believe the default sig_norm_method is supposed to deal with this.
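
For anyone wanting to reproduce the check described above, here is a minimal sketch on simulated data. The objects `scores` and `total_umi` are stand-ins (assumptions for illustration) for the cells x signatures matrix returned by `getSignatureScores()` and for `seurat$nCount_RNA`; the values are random and say nothing about the real dataset.

```r
set.seed(1)
n_cells <- 200
n_sigs <- 5

# Simulated stand-ins (assumptions, not real data):
# total_umi plays the role of seurat$nCount_RNA,
# scores plays the role of a cells x signatures score matrix.
total_umi <- rpois(n_cells, lambda = 5000)
scores <- matrix(rnorm(n_cells * n_sigs), nrow = n_cells,
                 dimnames = list(NULL, paste0("sig", seq_len(n_sigs))))

# Make the first signature depend on library size to mimic the reported effect.
scores[, "sig1"] <- scores[, "sig1"] + 3 * as.numeric(scale(total_umi))

# One Pearson correlation per signature (column) against total UMI.
umi_cor <- cor(scores, total_umi)[, 1]
print(round(umi_cor, 2))
```

In this toy setup only `sig1` should show a strong correlation; a histogram of `umi_cor` is the same kind of summary as the plots in this thread.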

Member

deto commented Aug 17, 2020

You're correct - the sig_norm_method was designed to deal with this and so I'm a bit surprised, and would like to understand what's going on here.

I see you're using a Seurat object as input - what preprocessing steps had you run so far on that object? Are all the genes present, or has it been filtered yet?

I think this might be related to the sig_gene_threshold input that was added. If you have a chance, can you try re-running with sig_gene_threshold = .001 (the default), and see how that changes the correlation?

Author

dpcook commented Aug 26, 2020

Hey David, sorry about the delay. Just had a chance to revisit this. Here are some details to help explore this further.

So yes, I started with a Seurat object comprising a "pure" population of cancer cells with the goal of using Vision for scoring and calculating autocorrelation. Given the purity of the population, I reasoned that I would expect meaningful genes to be detected in at least 5% of cells, so I increased the sig_gene_threshold thinking it may make for cleaner results. Prior to Vision, the data was processed with a straightforward pipeline (QC filtering > SCTransform > PCA > UMAP > Cluster > Subset cancer cells > re-normalize with SCTransform > PCA > UMAP). It still contains all genes.
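
To make the "detected in at least 5% of cells" reasoning concrete, here is a rough sketch of a detection-rate filter on a simulated counts matrix. It is an illustration of the idea only: how VISION applies `sig_gene_threshold` internally (for example, whether the comparison is inclusive) is an assumption here, and `counts`, `detect_rate`, and `threshold` are hypothetical names.

```r
set.seed(1)
n_genes <- 100
n_cells <- 400

# Simulated genes x cells matrix; each gene gets its own detection rate
# between 0% and 20% of cells (assumed values, not real data).
detect_rate <- runif(n_genes, min = 0, max = 0.2)
counts <- matrix(rbinom(n_genes * n_cells, size = 1,
                        prob = rep(detect_rate, times = n_cells)),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Fraction of cells in which each gene has at least one count.
frac_detected <- rowMeans(counts > 0)

# Keep genes detected in at least `threshold` of cells
# (0.05 mirrors the value used in this thread).
threshold <- 0.05
keep <- frac_detected >= threshold
filtered <- counts[keep, , drop = FALSE]

cat(sprintf("Keeping %d/%d genes detected in %.2f%% of cells\n",
            sum(keep), n_genes, 100 * threshold))
```

Raising the threshold shrinks the gene pool, and with it the effective size of each signature, which is one reason the threshold was suspected here.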

Previous run with sig_gene_threshold=0.05 on MSigDB Hallmark gene sets:

> vision <- Vision(seurat,
+                  signatures = "~/Data/GeneLists/hallmark.genesets.v6.1.symbols.gmt",
+                  sig_gene_threshold = 0.05,
+                  projection_methods=NULL)
Importing counts from obj[["RNA"]]@counts ...
Normalizing to counts per 10,000...
Importing Meta Data from obj@meta.data ...
Importing latent space from Embeddings(obj, "pca") using first 50 components
Loading data from ~/Data/GeneLists/hallmark.genesets.v6.1.symbols.gmt ...

Using 9419/21862 genes detected in 5.00% of cells for signature analysis.
See the `sig_gene_threshold` input to change this behavior.

Adding Visualization: Seurat_pca
Adding Visualization: Seurat_umap
> vision <- analyze(vision)
Beginning Analysis

Clustering cells...completed

Projecting data into 2 dimensions...

Evaluating signature scores on cells...

  |======================================================================================| 100%, Elapsed 00:00
Evaluating signature-gene importance...

  |======================================================================================| 100%, Elapsed 00:02
Creating 5 background signature groups with the following parameters:
  sigSize sigBalance
1      20  1.0000000
2      60  1.0000000
3     116  1.0000000
4     163  1.0000000
5     192  0.6053674
  signatures per group: 3000
Computing KNN Cell Graph in the Latent Space...

Evaluating local consistency of signatures in latent space...

  |======================================================================================| 100%, Elapsed 00:00
  |======================================================================================| 100%, Elapsed 01:07
  |======================================================================================| 100%, Elapsed 01:45
  |======================================================================================| 100%, Elapsed 00:01
Clustering signatures...

fitting ...
  |=====================================================================================================| 100%
Computing differential signature tests...

  |======================================================================================| 100%, Elapsed 00:00
  |======================================================================================| 100%, Elapsed 00:03
Computing correlations between signatures and latent space components...

  |======================================================================================| 100%, Elapsed 00:01
Analysis Complete!

> scores <- getSignatureScores(vision)
> hist(cor(scores, seurat$nCount_RNA), breaks=50)

image

Now re-running with default sig_gene_threshold:

> vision <- Vision(seurat,
+                  signatures = "~/Data/GeneLists/hallmark.genesets.v6.1.symbols.gmt",
+                  projection_methods=NULL)
Importing counts from obj[["RNA"]]@counts ...
Normalizing to counts per 10,000...
Importing Meta Data from obj@meta.data ...
Importing latent space from Embeddings(obj, "pca") using first 50 components
Loading data from ~/Data/GeneLists/hallmark.genesets.v6.1.symbols.gmt ...

Using 18828/21862 genes detected in 0.10% of cells for signature analysis.
See the `sig_gene_threshold` input to change this behavior.

Adding Visualization: Seurat_pca
Adding Visualization: Seurat_umap
> vision <- analyze(vision)
Beginning Analysis

Clustering cells...completed

Projecting data into 2 dimensions...

Evaluating signature scores on cells...

  |============================================================| 100%, Elapsed 00:00
Evaluating signature-gene importance...

  |============================================================| 100%, Elapsed 00:02
Creating 5 background signature groups with the following parameters:
  sigSize sigBalance
1      33  1.0000000
2      57  1.0000000
3      97  1.0000000
4     183  1.0000000
5     301  0.5428075
  signatures per group: 3000
Computing KNN Cell Graph in the Latent Space...

Evaluating local consistency of signatures in latent space...

  |============================================================| 100%, Elapsed 00:00
  |============================================================| 100%, Elapsed 00:36
  |============================================================| 100%, Elapsed 00:37
  |============================================================| 100%, Elapsed 00:00
Clustering signatures...

fitting ...
  |===========================================================================| 100%
Computing differential signature tests...

  |============================================================| 100%, Elapsed 00:00
  |============================================================| 100%, Elapsed 00:02
Computing correlations between signatures and latent space components...

  |============================================================| 100%, Elapsed 00:01
Analysis Complete!

> scores <- getSignatureScores(vision)
> hist(cor(scores, seurat$nCount_RNA), breaks=50)

image

Doesn't seem to improve the issue.

Looked at the distribution of the scores:
image

And then the relationship between mean score and how much the signature correlated with UMI (thinking that maybe it was only when scores were low or something):
image
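
A comparison like the one above can be sketched as follows, again on simulated data. `scores` and `total_umi` are hypothetical stand-ins for the output of `getSignatureScores()` and for `seurat$nCount_RNA`, so the numbers carry no meaning beyond showing the shape of the summary.

```r
set.seed(2)
n_cells <- 300
n_sigs <- 20

# Simulated stand-ins (assumptions, not real data).
total_umi <- rpois(n_cells, lambda = 4000)
scores <- matrix(rnorm(n_cells * n_sigs), nrow = n_cells,
                 dimnames = list(NULL, paste0("sig", seq_len(n_sigs))))

# Per-signature summary: mean score across cells and
# Pearson correlation of the score with total UMI.
summary_df <- data.frame(
  signature  = colnames(scores),
  mean_score = colMeans(scores),
  umi_cor    = as.numeric(cor(scores, total_umi)),
  row.names  = NULL
)

print(head(summary_df))
```

Calling `plot(summary_df$mean_score, summary_df$umi_cor)` then gives one point per signature, which makes it easy to see whether the UMI correlation is confined to low-scoring signatures.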

In case you want to look at this specific example, I've uploaded this Seurat object and the hallmark gene set to a Google Drive you can access here.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggridges_0.5.2  forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2     purrr_0.3.4     readr_1.3.1     tidyr_1.1.1     tibble_3.0.3    ggplot2_3.3.2  
[10] tidyverse_1.3.0 VISION_2.1.0    Seurat_3.2.0   

loaded via a namespace (and not attached):
  [1] Rtsne_0.15            colorspace_1.4-1      deldir_0.1-28         ellipsis_0.3.1        mclust_5.4.6          fs_1.5.0             
  [7] rstudioapi_0.11       spatstat.data_1.4-3   farver_2.0.3          leiden_0.3.3          listenv_0.8.0         ggrepel_0.8.2        
 [13] fansi_0.4.1           lubridate_1.7.9       xml2_1.3.2            codetools_0.2-16      splines_4.0.2         logging_0.10-108     
 [19] knitr_1.29            polyclip_1.10-0       jsonlite_1.7.0        broom_0.7.0           ica_1.0-2             cluster_2.1.0        
 [25] dbplyr_1.4.4          png_0.1-7             uwot_0.1.8            shiny_1.5.0           wordspace_0.2-6       sctransform_0.2.1    
 [31] plumber_0.4.6         compiler_4.0.2        httr_1.4.2            backports_1.1.8       assertthat_0.2.1      Matrix_1.2-18        
 [37] fastmap_1.0.1         lazyeval_0.2.2        cli_2.0.2             later_1.1.0.1         htmltools_0.5.0       tools_4.0.2          
 [43] rsvd_1.0.3            igraph_1.2.5          gtable_0.3.0          glue_1.4.1            RANN_2.6.1            reshape2_1.4.4       
 [49] Rcpp_1.0.5            spatstat_1.64-1       cellranger_1.1.0      vctrs_0.3.2           ape_5.4-1             nlme_3.1-148         
 [55] lmtest_0.9-37         xfun_0.16             globals_0.12.5        rvest_0.3.6           mime_0.9              miniUI_0.1.1.1       
 [61] lifecycle_0.2.0       irlba_2.3.3           goftest_1.2-2         future_1.18.0         MASS_7.3-52           zoo_1.8-8            
 [67] scales_1.1.1          loe_1.1               hms_0.5.3             promises_1.1.1        spatstat.utils_1.17-0 parallel_4.0.2       
 [73] RColorBrewer_1.1-2    reticulate_1.16       pbapply_1.4-3         gridExtra_2.3         rpart_4.1-15          fastICA_1.2-2        
 [79] stringi_1.4.6         permute_0.9-5         rlang_0.4.7           pkgconfig_2.0.3       matrixStats_0.56.0    lattice_0.20-41      
 [85] ROCR_1.0-11           tensor_1.5            labeling_0.3          patchwork_1.0.1       htmlwidgets_1.5.1     cowplot_1.0.0        
 [91] tidyselect_1.1.0      RcppAnnoy_0.0.16      plyr_1.8.6            magrittr_1.5          R6_2.4.1              generics_0.0.2       
 [97] DBI_1.1.0             withr_2.2.0           pillar_1.4.6          haven_2.3.1           mgcv_1.8-31           fitdistrplus_1.1-1   
[103] survival_3.2-3        abind_1.4-5           future.apply_1.6.0    modelr_0.1.8          crayon_1.3.4          utf8_1.1.4           
[109] KernSmooth_2.23-17    plotly_4.9.2.1        readxl_1.3.1          grid_4.0.2            data.table_1.13.0     blob_1.2.1           
[115] vegan_2.5-6           reprex_0.3.0          sparsesvd_0.2         digest_0.6.25         pbmcapply_1.5.0       xtable_1.8-4         
[121] httpuv_1.5.4          munsell_0.5.0         viridisLite_0.3.0     iotools_0.3-1        
