diff --git a/.Rproj.user/F5A33326/rmd-outputs b/.Rproj.user/F5A33326/rmd-outputs
index 1bb70fc..346cd43 100644
--- a/.Rproj.user/F5A33326/rmd-outputs
+++ b/.Rproj.user/F5A33326/rmd-outputs
@@ -1,4 +1,5 @@
 D:/GitHub/HemaScope_Tutorial/_book/index.html
+D:/GitHub/HemaScope_Tutorial/_book/index.html
 D:/GitHub/HemaScope_Tutorial/_book/index.html
diff --git a/_book/installation.html b/_book/installation.html
index 28eb725..0764ced 100644
--- a/_book/installation.html
+++ b/_book/installation.html
@@ -335,7 +335,9 @@

2.4 Install required R-packages
+BiocManager::install("clusterProfiler")
+install.packages("doMC")
+install.packages("doRNG")
@@ -359,7 +361,8 @@

2.4 Install required R-packages
Usage limitations: Installing from GitHub sometimes fails with an API rate limit error; supplying a GitHub token raises the rate limit. To create one, register for an account or log in to an existing account on the GitHub website. Click your profile picture in the top right corner and select “Settings” from the dropdown menu. Open “Developer settings,” then “Personal access tokens (classic),” and click “Create new token (classic).” Give the token any name you like and choose its expiration time. Finally, check the “repo” box, since the token will be used to download code repositories from GitHub, and click “Generate token.” Copy the generated token.

Next, store the token as an environment variable for R. Because R was installed with conda, start it by typing R in the terminal, then run usethis::edit_r_environ(). This opens the .Renviron file in a terminal editor (the key presses here assume vi). Press the i key to enter insert mode and add the token you copied as a line of the form GITHUB_TOKEN="your_token".

Then press Esc and type :wq! to save and quit. The variable is read only when R starts, so quit R, close and reopen the terminal, activate the conda environment, and start R again.
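
As a possible alternative to editing the file interactively, the same line can be written from the shell. This is a minimal sketch, assuming R reads the user-level ~/.Renviron (the default location unless R_ENVIRON_USER is set); set_github_token is a hypothetical helper name introduced here, not part of usethis or devtools.

```shell
# Hypothetical helper: store GITHUB_TOKEN in an .Renviron-style file.
# Usage: set_github_token TOKEN [FILE]   (FILE defaults to ~/.Renviron)
set_github_token() {
  gh_token="$1"
  gh_renviron="${2:-$HOME/.Renviron}"
  touch "$gh_renviron"
  # Drop any earlier GITHUB_TOKEN line so the file keeps exactly one entry.
  grep -v '^GITHUB_TOKEN=' "$gh_renviron" > "$gh_renviron.tmp" || true
  printf 'GITHUB_TOKEN=%s\n' "$gh_token" >> "$gh_renviron.tmp"
  mv "$gh_renviron.tmp" "$gh_renviron"
  # The file now holds a secret, so restrict it to the current user.
  chmod 600 "$gh_renviron"
}

# Example (replace the placeholder with the token copied from GitHub):
#   set_github_token "your_token"
# Then start a fresh R session and check that the variable is visible:
#   Rscript -e 'nzchar(Sys.getenv("GITHUB_TOKEN"))'
```

Rewriting the file through a temporary copy keeps repeated runs from piling up stale token lines, which matters when a token expires and has to be replaced.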

-devtools::install_github("sqjin/CellChat@9e1e605")
+devtools::install_github("sqjin/CellChat")
+devtools::install_github("immunogenomics/presto")
 devtools::install_github("aertslab/SCENIC@fde9774")
 devtools::install_github("pzhulab/abcCellmap@f44c14b")
 devtools::install_github("navinlabcode/copykat@d7d6569")
@@ -385,7 +388,7 @@ 

2.5 Install required Python-packa
  • Install required packages
-pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2
+pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 distributed==2024.2.1 dask-expr==0.5.3
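
After running the updated pip command, it can help to confirm that the two newly pinned packages resolved to the requested versions. The sketch below assumes `python -m pip freeze` prints `name==version` lines; pin_mismatches is a hypothetical helper, and the spelling of a package name in the freeze output may differ from the pin (for example underscores versus hyphens), in which case a pin would be reported even though it is installed.

```shell
# Hypothetical helper: read `pip freeze`-style lines (name==version) on stdin
# and print every requested pin that has no exact matching line.
pin_mismatches() {
  frozen="$(cat)"
  for pin in "$@"; do
    # -F fixed string, -x whole-line match, -i ignore case (e.g. POT vs pot).
    if ! printf '%s\n' "$frozen" | grep -qixF -- "$pin"; then
      printf 'MISMATCH %s\n' "$pin"
    fi
  done
}

# Example, run inside the activated conda environment:
#   python -m pip freeze | pin_mismatches distributed==2024.2.1 dask-expr==0.5.3
# No output means both pins are satisfied.
```

Reading the freeze output from stdin keeps the check independent of which Python or pip is on the PATH, so it can also be applied to a saved package listing.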

2.6 The installed packages with versions

@@ -436,7 +439,7 @@

2.6 The installed packages with v
 carData 3.0-5
 caret 6.0-94
 caTools 1.18.2
-CellChat 1.5.0
+CellChat 2.0.1
 cellranger 1.1.0
 circlize 0.4.16
 class 7.3-22
@@ -477,6 +480,8 @@

2.6 The installed packages with v
 diffobj 0.3.5
 digest 0.6.36
 dlm 1.1-6
+doMC 1.3.8
+doRNG 1.8.6
 doBy 4.6.22
 docopt 0.7.1
 doParallel 1.0.17
@@ -664,6 +669,7 @@

2.6 The installed packages with v
 polyclip 1.10-6
 polynom 1.4-1
 praise 1.0.0
+presto 1.0.0
 prettyunits 1.2.0
 princurve 2.1.6
 pROC 1.18.5
@@ -875,13 +881,13 @@

2.6 The installed packages with v
 croniter 1.4.1
 cycler 0.12.1
 dask 2024.7.0
-dask-expr 1.1.8
+dask-expr 0.5.3
 dateutils 0.6.12
 decorator 4.4.2
 deepdiff 7.0.1
 Deprecated 1.2.14
 deprecation 2.1.0
-distributed 2024.7.0
+distributed 2024.2.1
 dm-tree 0.1.8
 dnspython 2.6.1
 docrep 0.3.2
diff --git a/_book/search_index.json b/_book/search_index.json
index 2cd688c..d020b54 100644
--- a/_book/search_index.json
+++ b/_book/search_index.json
@@ -1 +1 @@
-[["index.html", "HemaScope Tutorial 1 Introduction", " HemaScope Tutorial HemaScope team 2024-09-27 1 Introduction HemaScope is a specialized bioinformatics toolkit designed for analyzing both single-cell and spatial transcriptome sequencing data from hematopoietic cells, including myeloid and lymphoid lineages. We have developed an R package named HemaScopeR, a Shiny interface named HemaScopeShiny, and a cloud platform named HemaScopeCloud. This tutorial introduces how to install and use the R package and Shiny interface, as well as how to access and operate the cloud platform. 
"],["installation.html", "2 Installation 2.1 Create a new conda environment and activate it 2.2 Set the channels in conda 2.3 Install R and python 2.4 Install required R-packages 2.5 Install required Python-packages 2.6 The installed packages with versions", " 2 Installation 2.1 Create a new conda environment and activate it conda create --name HemaScope_env conda activate HemaScope_env 2.2 Set the channels in conda # Add the default channel conda config --add channels defaults # Add default channel URLs conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2 # Add custom channels conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2 conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch-lts conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/simpleitk conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/deepmodeling # Set to show channel URLs conda config --set show_channel_urls true 2.3 Install R and python R 4.3.3 and python 3.8.19 conda install R-base=4.3.3 conda install python=3.8.19 2.4 Install required R-packages From conda conda install -c conda-forge r-devtools=2.4.5 conda install -c conda-forge r-Seurat=4.3.0.1 conda install -c conda-forge r-Rfast2=0.1.5.1 conda install -c conda-forge r-hdf5r=1.3.10 conda install -c conda-forge r-ggpubr=0.6.0 conda install 
pwwang::r-seuratwrappers conda install -c bioconda bioconductor-monocle=2.28.0 conda install -c bioconda bioconductor-slingshot=2.8.0 conda install -c bioconda bioconductor-GSVA=1.48.2 conda install -c bioconda bioconductor-org.Mm.eg.db=3.17.0 conda install -c bioconda bioconductor-org.Hs.eg.db=3.17.0 conda install -c bioconda bioconductor-scran=1.28.1 conda install -c bioconda bioconductor-AUCell=1.22.0 conda install -c bioconda bioconductor-RcisTarget=1.20.0 conda install -c bioconda bioconductor-GENIE3=1.24.0 conda install -c bioconda bioconductor-biomaRt=2.56.1 conda install -c bioconda r-velocyto.r=0.6 #conda install -c bioconda bioconductor-limma=3.56.2 Enter the R language environment We suggest users do not manually update any already installed R packages during the installation of the following R packages. R From BiocManager # BiocManager(version = "1.30.23") should already be installed as a dependency of r-seuratwrappers. # If it is not installed, please run the following code to install it. 
# install.packages("BiocManager",version="1.30.23") BiocManager::install("ComplexHeatmap") BiocManager::install("scmap") BiocManager::install("clusterProfiler") From CRAN remotes::install_version("shinyjs", version = "2.1.0") remotes::install_version("shiny", version = "1.8.0") remotes::install_version("shinyWidgets", version = "0.8.6") remotes::install_version("shinydashboard", version = "0.7.2") remotes::install_version("slickR", version = "0.6.0") remotes::install_version("phateR", version = "1.0.7") remotes::install_version("gelnet", version = "1.2.1") remotes::install_version("parallelDist", version = "0.2.6") remotes::install_version("kableExtra", version = "1.3.4") remotes::install_version("transport", version = "0.14-6") remotes::install_version("feather", version = "0.3.5") remotes::install_version("markdown", version = "1.13") From GitHub tips: Sometimes network connection issues may occur, resulting in an error message indicating that GitHub cannot be connected. Please try installing again when the network conditions improve. Usage limitations: Sometimes an API rate limit error occurs, and a GitHub token is needed to provide the GitHub API rate limit. The steps to resolve this are as follows: Register for an account or log in to an existing account on the GitHub website. Then click on your profile picture in the top right corner, go to the dropdown menu and select “Settings.” Next, find “Developer settings” and click on it, then find “Personal access tokens (classic).” Click on it, then click “Create new token (classic).” Create a new token by first naming it anything you like. Then choose the expiration time for the token. Finally, check the “repo” box; the token will be used to download code repositories from GitHub. Click “Generate token.” Copy the generated token password. After that, set the token in the environment variable in R. Since we are using conda, enter R by typing R in the terminal. Then, enter the command: usethis::edit_r_environ(). 
This will open a file. Press the i key to edit. Paste the token you copied into the code area as follows: GITHUB_TOKEN="your_token". Then press Esc and type :wq! to save and quit. After that, quit R, then close and reopen the terminal so that the environment variable takes effect. Activate the conda environment and enter R again. devtools::install_github("sqjin/CellChat@9e1e605") devtools::install_github("aertslab/SCENIC@fde9774") devtools::install_github("pzhulab/abcCellmap@f44c14b") devtools::install_github("navinlabcode/copykat@d7d6569") devtools::install_github('chris-mcginnis-ucsf/DoubletFinder@8c7f76e') devtools::install_github("mojaveazure/seurat-disk@877d4e1") Install HemaScopeR from GitHub devtools::install_github(repo="ZhenyiWangTHU/HemaScopeR", dep = FALSE) Exit the R language environment quit() 2.5 Install required Python-packages Upgrade pip and set mirrors python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip config set global.extra-index-url http://mirrors.aliyun.com/pypi/simple/ Install required packages pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 2.6 The installed packages with versions R packages with versions Package Version ------- ------- abcCellmap 0.1.0 abind 1.4-5 annotate 1.78.0 AnnotationDbi 1.64.1 ape 5.8 aplot 0.2.3 arrow 17.0.0 askpass 1.2.0 assertthat 0.2.1 AUCell 1.22.0 backports 1.5.0 base 4.3.3 base64enc 0.1-3 beachmat 2.16.0 BH 1.84.0-0 Biobase 2.60.0 BiocFileCache 2.8.0 BiocGenerics 0.46.0 BiocManager 1.30.23 BiocNeighbors 1.18.0 BiocParallel 1.34.2 BiocSingular 1.16.0 BiocVersion 3.18.1 biocViews 1.68.1 biomaRt 2.56.1 Biostrings 2.68.1 bit 4.0.5 bit64 4.0.5 bitops 1.0-7 blob 1.2.4 
bluster 1.10.0 boot 1.3-30 brew 1.0-10 brio 1.1.5 broom 1.0.6 bslib 0.7.0 cachem 1.1.0 callr 3.7.6 car 3.1-2 carData 3.0-5 caret 6.0-94 caTools 1.18.2 CellChat 1.5.0 cellranger 1.1.0 circlize 0.4.16 class 7.3-22 cli 3.6.3 clipr 0.8.0 clock 0.7.0 clue 0.3-65 cluster 2.1.6 clusterProfiler 4.10.1 coda 0.19-4.1 codetools 0.2-20 colorspace 2.1-0 combinat 0.0-8 commonmark 1.9.1 compiler 4.3.3 ComplexHeatmap 2.18.0 conquer 1.3.3 copykat 1.1.0 corrplot 0.92 cowplot 1.1.3 cpp11 0.4.7 crayon 1.5.3 credentials 2.0.1 crosstalk 1.2.1 curl 5.2.1 data.table 1.15.4 datasets 4.3.3 DBI 1.2.3 dbplyr 2.5.0 DDRTree 0.1.5 DelayedArray 0.26.6 DelayedMatrixStats 1.22.1 deldir 2.0-4 Deriv 4.1.3 desc 1.4.3 devtools 2.4.5 diagram 1.6.5 diffobj 0.3.5 digest 0.6.36 dlm 1.1-6 doBy 4.6.22 docopt 0.7.1 doParallel 1.0.17 DOSE 3.28.2 dotCall64 1.1-1 DoubletFinder 2.0.3 downlit 0.4.4 downloader 0.4 dplyr 1.1.4 dqrng 0.3.2 dynamicTreeCut 1.63-1 e1071 1.7-14 edgeR 3.42.4 ellipsis 0.3.2 enrichplot 1.22.0 evaluate 0.24.0 expm 0.999-9 fansi 1.0.6 farver 2.1.2 fastDummies 1.7.3 fastICA 1.2-4 fastmap 1.2.0 fastmatch 1.1-4 feather 0.3.5 fgsea 1.28.0 fields 16.2 filelock 1.0.3 fitdistrplus 1.1-11 FNN 1.1.4 fontawesome 0.5.2 forcats 1.0.0 foreach 1.5.2 foreign 0.8-87 formatR 1.14 fs 1.6.4 futile.logger 1.4.3 futile.options 1.0.1 future 1.33.2 future.apply 1.11.2 gelnet 1.2.1 generics 0.1.3 GENIE3 1.24.0 GenomeInfoDb 1.36.1 GenomeInfoDbData 1.2.11 GenomicRanges 1.52.0 gert 2.0.1 GetoptLong 1.0.5 ggalluvial 0.12.5 ggforce 0.4.2 ggfun 0.1.5 ggnetwork 0.5.13 ggnewscale 0.4.10 ggplot2 3.5.1 ggplotify 0.1.2 ggpubr 0.6.0 ggraph 2.2.1 ggrepel 0.9.5 ggridges 0.5.6 ggsci 3.2.0 ggsignif 0.6.4 ggtree 3.10.1 gh 1.4.1 gitcreds 0.1.2 GlobalOptions 0.1.2 globals 0.16.3 glue 1.7.0 GO.db 3.18.0 goftest 1.2-3 googleVis 0.7.3 GOSemSim 2.28.1 gower 1.0.1 gplots 3.1.3.1 graph 1.78.0 graphics 4.3.3 graphlayouts 1.1.1 grDevices 4.3.3 grid 4.3.3 gridBase 0.4-7 gridExtra 2.3 gridGraphics 0.5-1 GSEABase 1.62.0 gson 0.1.0 GSVA 1.48.2 
gtable 0.3.5 gtools 3.9.5 hardhat 1.4.0 haven 2.5.4 HDF5Array 1.28.1 hdf5r 1.3.10 HDO.db 0.99.1 HemaScopeR 1.0.0 here 1.0.1 hexbin 1.28.3 highr 0.11 hms 1.1.3 HSMMSingleCell 1.20.0 htmltools 0.5.8.1 htmlwidgets 1.6.4 httpuv 1.6.15 httr 1.4.7 httr2 1.0.2 ica 1.0-3 igraph 2.0.3 ini 0.3.1 ipred 0.9-14 IRanges 2.34.1 irlba 2.3.5.1 isoband 0.2.7 iterators 1.0.14 jquerylib 0.1.4 jsonlite 1.8.8 kableExtra 1.3.4 KEGGREST 1.40.0 kernlab 0.9-32 KernSmooth 2.23-24 knitr 1.48 labeling 0.4.3 lambda.r 1.2.4 later 1.3.2 lattice 0.22-6 lava 1.7.3 lazyeval 0.2.2 leiden 0.4.3.1 leidenbase 0.1.27 lifecycle 1.0.4 limma 3.56.2 listenv 0.9.1 lme4 1.1-35.5 lmtest 0.9-40 locfit 1.5-9.9 lsei 1.3-0 lubridate 1.9.3 magrittr 2.0.3 maps 3.4.2 maptools 1.1-8 markdown 1.13 MASS 7.3-60.0.1 Matrix 1.6-5 MatrixGenerics 1.12.2 MatrixModels 0.5-3 matrixStats 1.3.0 mcmc 0.9-8 MCMCpack 1.7-0 memoise 2.0.1 metapod 1.8.0 methods 4.3.3 mgcv 1.9-1 microbenchmark 1.4.10 mime 0.12 miniUI 0.1.1.1 minqa 1.2.7 mixtools 2.0.0 ModelMetrics 1.2.2.2 modelr 0.1.11 monocle 2.28.0 munsell 0.5.1 network 1.18.2 nlme 3.1-165 nloptr 2.0.3 NMF 0.27 nnet 7.3-19 npsurv 0.5-0 numDeriv 2016.8-1.1 openssl 2.2.0 org.Hs.eg.db 3.17.0 org.Mm.eg.db 3.17.0 parallel 4.3.3 parallelDist 0.2.6 parallelly 1.37.1 patchwork 1.2.0 pbapply 1.7-2 pbkrtest 0.5.2 pcaMethods 1.92.0 phateR 1.0.7 pheatmap 1.0.12 pillar 1.9.0 pkgbuild 1.4.4 pkgconfig 2.0.3 pkgdown 2.1.0 pkgload 1.3.4 plogr 0.2.0 plotly 4.10.4 plyr 1.8.9 png 0.1-8 polyclip 1.10-6 polynom 1.4-1 praise 1.0.0 prettyunits 1.2.0 princurve 2.1.6 pROC 1.18.5 processx 3.8.4 prodlim 2024.06.25 profvis 0.3.8 progress 1.2.3 progressr 0.14.0 promises 1.3.0 proxy 0.4-27 ps 1.7.7 purrr 1.0.2 qlcMatrix 0.9.8 quantreg 5.98 qvalue 2.34.0 R.methodsS3 1.8.2 R.oo 1.26.0 R.utils 2.12.3 R6 2.5.1 ragg 1.3.2 randomForest 4.7-1.1 RANN 2.6.1 rappdirs 0.3.3 RBGL 1.76.0 RcisTarget 1.20.0 rcmdcheck 1.4.0 RColorBrewer 1.1-3 Rcpp 1.0.13 RcppAnnoy 0.0.22 RcppArmadillo 14.0.0-1 RcppEigen 0.3.4.0.0 RcppGSL 0.3.13 
RcppHNSW 0.6.0 RcppParallel 5.1.6 RcppProgress 0.4.2 RcppTOML 0.2.2 RcppZiggurat 0.1.6 RCurl 1.98-1.16 readr 2.1.5 readxl 1.4.3 recipes 1.1.0 registry 0.5-1 rematch 2.0.0 rematch2 2.1.2 remotes 2.5.0 reshape2 1.4.4 reticulate 1.38.0 Rfast 2.1.0 Rfast2 0.1.5.1 rhdf5 2.44.0 rhdf5filters 1.12.1 Rhdf5lib 1.22.0 rio 1.1.1 rjson 0.2.21 rlang 1.1.4 rmarkdown 2.27 rngtools 1.5.2 ROCR 1.0-11 roxygen2 7.3.2 rpart 4.1.23 rprojroot 2.0.4 RSpectra 0.16-2 RSQLite 2.3.7 rstatix 0.7.2 rstudioapi 0.16.0 rsvd 1.0.5 Rtsne 0.17 RUnit 0.4.33 rversions 2.1.2 rvest 1.0.4 S4Arrays 1.0.4 S4Vectors 0.38.1 sass 0.4.9 ScaledMatrix 1.8.1 scales 1.3.0 scattermore 1.2 scatterpie 0.2.3 SCENIC 1.3.0 scmap 1.24.0 scran 1.28.1 sctransform 0.4.1 scuttle 1.10.1 segmented 2.1-0 selectr 0.4-2 sessioninfo 1.2.2 Seurat 4.3.0.1 SeuratDisk 0.0.0.9021 SeuratObject 5.0.2 SeuratWrappers 0.3.1 shadowtext 0.1.4 shape 1.4.6.1 shinyjs 2.1.0 shiny 1.8.0 shinyWidgets 0.8.6 shinydashboard 0.7.2 slickR 0.6.0 SingleCellExperiment 1.22.0 sitmo 2.0.2 slam 0.1-51 slingshot 2.8.0 sna 2.7-2 snow 0.4-4 sourcetools 0.1.7-1 sp 2.1-4 spam 2.10-0 SparseM 1.84 sparseMatrixStats 1.12.2 sparsesvd 0.2-2 spatstat.data 3.1-2 spatstat.explore 3.2-6 spatstat.geom 3.2-9 spatstat.random 3.2-3 spatstat.sparse 3.1-0 spatstat.univar 3.0-0 spatstat.utils 3.0-5 splines 4.3.3 SQUAREM 2021.1 statmod 1.5.0 statnet.common 4.9.0 stats 4.3.3 stats4 4.3.3 stringi 1.8.4 stringr 1.5.1 SummarizedExperiment 1.30.2 survival 3.7-0 svglite 2.1.3 sys 3.4.2 systemfonts 1.1.0 tcltk 4.3.3 tensor 1.5 testthat 3.2.1.1 textshaping 0.3.7 tibble 3.2.1 tidygraph 1.3.1 tidyr 1.3.1 tidyselect 1.2.1 tidytree 0.4.6 timechange 0.3.0 timeDate 4032.109 tinytex 0.51 tools 4.3.3 TrajectoryUtils 1.8.0 transport 0.14-6 treeio 1.26.0 tweenr 2.0.3 tzdb 0.4.0 urlchecker 1.0.1 usethis 2.2.3 utf8 1.2.4 utils 4.3.3 uwot 0.1.16 vctrs 0.6.5 velocyto.R 0.6 VGAM 1.1-11 viridis 0.6.5 viridisLite 0.4.2 vroom 1.6.5 waldo 0.5.2 webshot 0.5.5 whisker 0.4.1 withr 3.0.0 writexl 1.5.0 xfun 0.46 
XML 3.99-0.17 xml2 1.3.6 xopen 1.0.1 xtable 1.8-4 XVector 0.40.0 yaml 2.3.9 yulab.utils 0.1.4 zip 2.3.1 zlibbioc 1.46.0 zoo 1.8-12 Python packages with versions Package Version ------------------------ -------------- absl-py 2.1.0 access 1.1.9 affine 2.4.0 aiohttp 3.9.5 aiosignal 1.3.1 anndata 0.10.8 annotated-types 0.7.0 anyio 4.4.0 arboreto 0.1.6 argcomplete 3.4.0 array_api_compat 1.7.1 arrow 1.3.0 attrs 23.2.0 backoff 2.2.1 beautifulsoup4 4.12.3 blessed 1.20.0 bokeh 3.5.0 boto3 1.34.145 botocore 1.34.145 cell2location 0.1.3 certifi 2024.7.4 charset-normalizer 3.3.2 chex 0.1.7 click 8.1.7 click-plugins 1.1.1 cligj 0.7.2 cloudpickle 3.0.0 commot 0.0.3 contextlib2 21.6.0 contourpy 1.2.1 croniter 1.4.1 cycler 0.12.1 dask 2024.7.0 dask-expr 1.1.8 dateutils 0.6.12 decorator 4.4.2 deepdiff 7.0.1 Deprecated 1.2.14 deprecation 2.1.0 distributed 2024.7.0 dm-tree 0.1.8 dnspython 2.6.1 docrep 0.3.2 editor 1.6.6 email_validator 2.2.0 esda 2.4.3 etils 1.9.2 fastapi 0.111.1 fastapi-cli 0.0.4 filelock 3.15.4 fiona 1.9.6 flax 0.8.5 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.6.1 future 1.0.0 gensim 4.3.3 geopandas 0.13.2 giddy 2.3.5 graphtools 1.5.3 h11 0.14.0 h5py 3.11.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 idna 3.7 igraph 0.11.6 importlib_metadata 8.0.0 importlib_resources 6.4.0 inequality 1.0.0 inquirer 3.3.0 itsdangerous 2.2.0 jax 0.4.30 jaxlib 0.4.30 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 karateclub 1.2.2 kiwisolver 1.4.5 legacy-api-wrap 1.4 leidenalg 0.10.2 Levenshtein 0.25.1 libpysal 4.7.0 lightning 2.0.9.post0 lightning-cloud 0.5.70 lightning-utilities 0.11.5 llvmlite 0.43.0 locket 1.0.0 loompy 3.0.7 lz4 4.3.3 mapclassify 2.6.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 mdurl 0.1.2 mgwr 2.2.1 ml_collections 0.1.1 ml-dtypes 0.4.0 momepy 0.6.0 mpmath 1.3.0 msgpack 1.0.8 mudata 0.2.4 multidict 6.0.5 multipledispatch 1.0.0 natsort 8.4.0 nest-asyncio 1.6.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 numpy-groupies 0.11.1 numpyro 0.15.1 nvidia-cublas-cu12 
12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 opencv-python 4.10.0.84 opt-einsum 3.3.0 optax 0.2.1 orbax-checkpoint 0.5.21 ordered-set 4.1.0 packaging 24.1 pandas 2.0.3 partd 1.4.2 patsy 0.5.6 phate 1.0.11 pillow 10.4.0 pip 24.1.2 platformdirs 4.2.2 plotly 5.22.0 pointpats 2.4.0 POT 0.9.4 protobuf 5.27.2 psutil 6.0.0 PuLP 2.9.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pydantic 2.1.1 pydantic_core 2.4.0 Pygments 2.18.0 PyGSP 0.5.1 PyJWT 2.8.0 pynndescent 0.5.13 pyparsing 3.0.9 pyproj 3.6.1 pyro-api 0.1.2 pyro-ppl 1.9.1 pysal 24.1 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-igraph 0.11.6 python-Levenshtein 0.25.1 python-louvain 0.16 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytz 2024.1 PyYAML 6.0.1 quantecon 0.7.2 rapidfuzz 3.9.4 rasterio 1.3.10 rasterstats 0.19.0 readchar 4.1.0 requests 2.32.3 rich 13.7.1 Rtree 1.3.0 runs 1.2.2 s_gd2 1.8.1 s3transfer 0.10.2 scanpy 1.10.2 scikit-learn 1.5.1 scipy 1.13.1 scprep 1.2.3 scvelo 0.3.2 scvi-tools 1.1.5 seaborn 0.13.2 segregation 2.5 session_info 1.0.0 setuptools 71.0.1 shapely 2.0.5 shellingham 1.5.4 simplejson 3.19.2 six 1.16.0 smart-open 7.0.4 sniffio 1.3.1 snuggs 1.4.7 sortedcontainers 2.4.0 soupsieve 2.5 spaghetti 1.7.4 sparse 0.15.4 spglm 1.0.8 spint 1.0.7 splot 1.1.5.post1 spopt 0.5.0 spreg 1.4 spvcm 0.3.0 starlette 0.37.2 starsessions 1.3.0 statsmodels 0.14.1 stdlib-list 0.10.0 sympy 1.13.1 tasklogger 1.2.0 tblib 3.0.0 tenacity 8.5.0 tensorstore 0.1.63 texttable 1.7.0 threadpoolctl 3.5.0 tobler 0.11.2 toml 0.10.2 tomlkit 0.13.0 toolz 0.12.1 torch 2.3.1 torchmetrics 1.4.0.post0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 triton 2.3.1 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 
umap-learn 0.5.6 urllib3 2.2.2 uvicorn 0.30.1 uvloop 0.19.0 watchfiles 0.22.0 wcwidth 0.2.13 websocket-client 1.8.0 websockets 12.0 wheel 0.43.0 wrapt 1.16.0 xarray 2024.6.0 xmltodict 0.13.0 xmod 1.8.1 xyzservices 2024.6.0 yarl 1.9.4 yq 3.4.3 zict 3.0.0 zipp 3.19.2 "],["integrated-scrna-seq-pipeline.html", "3 Integrated scRNA-seq pipeline", " 3 Integrated scRNA-seq pipeline Load the R packages. # sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getpot library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Run the integrated scRNA-seq pipeline. 
scRNASeq_10x_pipeline( # input and output input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', './SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix'), project.names = c( 'SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423'), output.dir = './output/', pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python', # quality control and preprocessing gene.column = 2, min.cells = 10, min.feature = 200, mt.pattern = '^MT-', nFeature_RNA.limit = 200, percent.mt.limit = 20, scale.factor = 10000, nfeatures = 3000, ndims = 50, vars.to.regress = NULL, PCs = 1:35, resolution = 0.4, n.neighbors = 50, 
# remove doublets doublet.percentage = 0.04, doublerFinderwraper.PCs = 1:20, doublerFinderwraper.pN = 0.25, doublerFinderwraper.pK = 0.1, # phateR phate.knn = 50, phate.npca = 20, phate.t = 10, phate.ndim = 2, min.pct = 0.25, logfc.threshold = 0.25, # visualization ViolinPlot.cellTypeOrders = as.character(1:22), ViolinPlot.cellTypeColors = NULL, Org = 'hsa', loom.files.path = c( './SRR7881399/velocyto/SRR7881399.loom', './SRR7881400/velocyto/SRR7881400.loom', './SRR7881401/velocyto/SRR7881401.loom', './SRR7881402/velocyto/SRR7881402.loom', './SRR7881403/velocyto/SRR7881403.loom', './SRR7881404/velocyto/SRR7881404.loom', './SRR7881405/velocyto/SRR7881405.loom', './SRR7881406/velocyto/SRR7881406.loom', './SRR7881407/velocyto/SRR7881407.loom', './SRR7881408/velocyto/SRR7881408.loom', './SRR7881409/velocyto/SRR7881409.loom', './SRR7881410/velocyto/SRR7881410.loom', './SRR7881411/velocyto/SRR7881411.loom', './SRR7881412/velocyto/SRR7881412.loom', './SRR7881413/velocyto/SRR7881413.loom', './SRR7881414/velocyto/SRR7881414.loom', './SRR7881415/velocyto/SRR7881415.loom', './SRR7881416/velocyto/SRR7881416.loom', './SRR7881417/velocyto/SRR7881417.loom', './SRR7881418/velocyto/SRR7881418.loom', './SRR7881419/velocyto/SRR7881419.loom', './SRR7881420/velocyto/SRR7881420.loom', './SRR7881421/velocyto/SRR7881421.loom', './SRR7881422/velocyto/SRR7881422.loom', './SRR7881423/velocyto/SRR7881423.loom'), # cell cycle cellcycleCutoff = NULL, # cell chat sorting = FALSE, ncores = 10, # Verbose = FALSE, # activeEachStep Whether_load_previous_results = FALSE, Step1_Input_Data = TRUE, Step1_Input_Data.type = 'cellranger-count', Step2_Quality_Control = TRUE, Step2_Quality_Control.RemoveBatches = TRUE, Step2_Quality_Control.RemoveDoublets = TRUE, Step3_Clustering = TRUE, Step4_Identify_Cell_Types = TRUE, Step4_Use_Which_Labels = 'clustering', Step4_Cluster_Labels = NULL, Step4_Changed_Labels = NULL, Step4_run_sc_CNV = TRUE, Step5_Visualization = TRUE, Step6_Find_DEGs = TRUE, 
Step7_Assign_Cell_Cycle = TRUE, Step8_Calculate_Heterogeneity = TRUE, Step9_Violin_Plot_for_Marker_Genes = TRUE, Step10_Calculate_Lineage_Scores = TRUE, Step11_GSVA = TRUE, Step11_GSVA.identify.cellType.features=TRUE, Step11_GSVA.identify.diff.features=FALSE, Step11_GSVA.comparison.design=NULL, Step12_Construct_Trajectories = TRUE, Step12_Construct_Trajectories.clusters = c('3','6','9','10','11','14','15','19'), Step12_Construct_Trajectories.monocle = TRUE, Step12_Construct_Trajectories.slingshot = TRUE, Step12_Construct_Trajectories.scVelo = TRUE, Step13_TF_Analysis = TRUE, Step14_Cell_Cell_Interaction = TRUE, Step15_Generate_the_Report = TRUE ) "],["step-by-step-scrna-seq-pipeline.html", "4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data 4.2 Step 2. Quality Control 4.3 Step 3. Clustering 4.4 Step 4. Identify Cell Types 4.5 Step 5. Visualization 4.6 Step 6. Find DEGs 4.7 Step 7. Assign Cell Cycles 4.8 Step 8. Calculate Heterogeneity 4.9 Step 9. Violin Plot for Marker Genes 4.10 Step 10. Calculate Lineage Scores 4.11 Step 11. GSVA 4.12 Step 12. Construct Trajectories 4.13 Step 13. TF Analysis 4.14 Step 14. Cell-Cell Interaction", " 4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data Load the R packages. 
# sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getpot library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Set the paths for the input data, the output results, and the Python installation. input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', 
'./SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix') output.dir = './output/' pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python' Set the parameters for loading the data sets. project.names = c('SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423') gene.column = 2 min.cells = 10 min.feature = 200 mt.pattern = '^MT-' Step1_Input_Data.type = 'cellranger-count' Create folders for saving the results of HemaScopeR analysis. wdir <- getwd() if(is.null(pythonPath)==FALSE){ reticulate::use_python(pythonPath) }else{print('Please set the path of Python.')} if (!file.exists(paste0(output.dir, '/HemaScopeR_results/'))) { dir.create(paste0(output.dir, '/HemaScopeR_results/')) } output.dir <- paste0(output.dir,'/HemaScopeR_results/') if (!file.exists(paste0(output.dir, '/RDSfiles/'))) { dir.create(paste0(output.dir, '/RDSfiles/')) } previous_results_path <- paste0(output.dir, '/RDSfiles/') # if (Whether_load_previous_results) { # print('Loading the previous results...') # Load_previous_results(previous_results_path = previous_results_path) # } # Step1. Input data----------------------------------------------------------------------------- print('Step1. Input data.') if (!file.exists(paste0(output.dir, '/Step1.Input_data/'))) { dir.create(paste0(output.dir, '/Step1.Input_data/')) } Load the data sets. 
file.copy(from = input.data.dirs, to = paste0(output.dir,'/Step1.Input_data/'), recursive = TRUE) if(Step1_Input_Data.type == 'cellranger-count'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- Read10X(data.dir = input.data.dirs[i], gene.column = gene.column) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- Read10X(data.dir = input.data.dirs, gene.column = gene.column) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Seurat'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_object.temp <- readRDS(input.data.dirs[i]) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp) } }else{ sc_object <- readRDS(input.data.dirs) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Matrix'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- readRDS(input.data.dirs[i]) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- readRDS(input.data.dirs) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = 
min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else{ stop('Please input data generated by the cellranger-count software, or a Seurat object, or a gene expression matrix. HemaScopeR does not support other formats of input data.') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.2 Step 2. Quality Control Set the parameters for quality control. # quality control and preprocessing nFeature_RNA.limit = 200 percent.mt.limit = 20 scale.factor = 10000 nfeatures = 3000 ndims = 50 vars.to.regress = NULL PCs = 1:35 resolution = 0.4 n.neighbors = 50 # remove doublets doublet.percentage = 0.04 doublerFinderwraper.PCs = 1:20 doublerFinderwraper.pN = 0.25 doublerFinderwraper.pK = 0.1 Step2_Quality_Control.RemoveBatches = TRUE Step2_Quality_Control.RemoveDoublets = TRUE Create a folder for saving the results of quality control. print('Step2. Quality control.') if (!file.exists(paste0(output.dir, '/Step2.Quality_control/'))) { dir.create(paste0(output.dir, '/Step2.Quality_control/')) } Run the quality control process. 
if(length(input.data.dirs) > 1){ # preprocess and quality control for multiple scRNA-Seq data sets sc_object <- QC_multiple_scRNASeq(seuratObjects = input.data.list, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveBatches = Step2_Quality_Control.RemoveBatches, Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, ndims = ndims, vars.to.regress = vars.to.regress, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK ) }else{ # preprocess and quality control for single scRNA-Seq data set sc_object <- QC_single_scRNASeq(sc_object = sc_object, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, vars.to.regress = vars.to.regress, ndims = ndims, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.3 Step 3. Clustering Set the parameters for clustering. 
PCs = 1:35 resolution = 0.4 n.neighbors = 50 Create a folder for saving the results of Louvain clustering. print('Step3. Clustering.') if (!file.exists(paste0(output.dir, '/Step3.Clustering/'))) { dir.create(paste0(output.dir, '/Step3.Clustering/')) } Run Louvain clustering. if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){graph.name <- 'integrated_snn'}else{graph.name <- 'RNA_snn'} sc_object <- FindNeighbors(sc_object, dims = PCs, k.param = n.neighbors, force.recalc = TRUE) sc_object <- FindClusters(sc_object, resolution = resolution, graph.name = graph.name) sc_object@meta.data$seurat_clusters <- as.character(as.numeric(sc_object@meta.data$seurat_clusters)) # plot clustering pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.4 Step 4. Identify Cell Types Set the path for the database. databasePath = "~/HemaScopeR/database/" Set the parameters for cell type identification. Step4_Use_Which_Labels = 'clustering' Step4_Cluster_Labels = NULL Step4_Changed_Labels = NULL Org = 'hsa' ncores = 10 Create a folder for saving the results of cell type identification. print('Step4. Identify cell types automatically.') if (!file.exists(paste0(output.dir, '/Step4.Identify_Cell_Types/'))) { dir.create(paste0(output.dir, '/Step4.Identify_Cell_Types/')) } Run the cell type identification process and the copy number variation (CNV) analysis. sc_object <- run_cell_annotation(object = sc_object, assay = 'RNA', species = Org, output.dir = paste0(output.dir,'/Step4.Identify_Cell_Types/')) if(Org == 'hsa'){ load(paste0(databasePath,"/HematoMap.reference.rdata")) if(length(intersect(rownames(HematoMap.reference), rownames(sc_object))) < 1000){ HematoMap.reference <- RenameGenesSeurat(obj = HematoMap.reference, newnames = toupper(rownames(HematoMap.reference)), gene.use = rownames(HematoMap.reference), de.assay = "RNA", lassays = "RNA") } if(sc_object@active.assay == 'integrated'){ DefaultAssay(sc_object) <- 'RNA' sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, '/Step4.Identify_Cell_Types/')) DefaultAssay(sc_object) <- 'integrated' }else{ sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, 
'/Step4.Identify_Cell_Types/')) } } Set the cell labels. # set the cell labels if(Step4_Use_Which_Labels == 'clustering'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$seurat_clusters Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.1'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.2'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.3'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.4'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'HematoMap'){ if(Org == 'hsa'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$predicted.id Idents(sc_object) <- sc_object@meta.data$selectLabels }else{print("'HematoMap' is only applicable to human data ('Org' = 'hsa').")} }else if(Step4_Use_Which_Labels == 'changeLabels'){ if (!is.null(Step4_Cluster_Labels) && !is.null(Step4_Changed_Labels) && length(Step4_Cluster_Labels) == length(Step4_Changed_Labels)){ sc_object@meta.data$selectLabels <- plyr::mapvalues(sc_object@meta.data$seurat_clusters, from = as.character(Step4_Cluster_Labels), to = as.character(Step4_Changed_Labels), warn_missing = FALSE) Idents(sc_object) <- sc_object@meta.data$selectLabels }else{ print("Please input the 'Step4_Cluster_Labels' parameter as Seurat clustering labels, and the 'Step4_Changed_Labels' parameter as new labels. 
Please note that these two parameters should be of equal length.") } }else{ print('Please set the "Step4_Use_Which_Labels" parameter as "clustering", "abcCellmap.1", "abcCellmap.2", "abcCellmap.3", "abcCellmap.4", "HematoMap" or "changeLabels".') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } Run the CNV analysis. sc_CNV(sc_object=sc_object, save_path=paste0(output.dir,'/Step4.Identify_Cell_Types/'), assay = 'RNA', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = NULL, n.cores = ncores, species = Org) 4.5 Step 5. Visualization Create a folder for saving the visualization results. print('Step5. Visualization.') if (!file.exists(paste0(output.dir, '/Step5.Visualization/'))) { dir.create(paste0(output.dir, '/Step5.Visualization/')) } Compute the number and proportion of cells in each cell group. 
# statistical results cells_labels <- as.data.frame(cbind(rownames(sc_object@meta.data), as.character(sc_object@meta.data$selectLabels))) colnames(cells_labels) <- c('cell_id', 'cluster_id') cluster_counts <- cells_labels %>% group_by(cluster_id) %>% summarise(count = n()) total_cells <- nrow(cells_labels) cluster_counts <- cluster_counts %>% mutate(proportion = count / total_cells) cluster_counts <- as.data.frame(cluster_counts) cluster_counts$percentages <- scales::percent(cluster_counts$proportion, accuracy = 0.1) cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='proportion')] cluster_counts$cluster_id_count_percentages <- paste(cluster_counts$cluster_id, " (", cluster_counts$count, ' cells; ', cluster_counts$percentages, ")", sep='') cluster_counts <- cluster_counts[order(cluster_counts$count, decreasing = TRUE),] cluster_counts <- rbind(cluster_counts, c('Total', sum(cluster_counts$count), '100%', 'all cells')) sc_object@meta.data$cluster_id_count_percentages <- mapvalues(sc_object@meta.data$selectLabels, from=cluster_counts$cluster_id, to=cluster_counts$cluster_id_count_percentages, warn_missing=FALSE) colnames(sc_object@meta.data)[which(colnames(sc_object@meta.data) == 'cluster_id_count_percentages')] <- paste('Total ', nrow(sc_object@meta.data), ' cells', sep='') cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='cluster_id_count_percentages')] colnames(cluster_counts) <- c('Cell types', 'Cell counts', 'Percentages') # names(colorvector) <- mapvalues(names(colorvector), # from=cluster_counts$cluster_id, # to=cluster_counts$cluster_id_count_percentages, # warn_missing=FALSE) write.csv(cluster_counts, file=paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages.csv', sep=''), quote=FALSE, row.names=FALSE) The UMAP visualization. 
pdf(paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages_umap.pdf', sep=''), width = 14, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = paste('Total ', nrow(sc_object@meta.data), ' cells', sep=''), label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Set the parameters for phateR. phate.knn = 50 phate.npca = 20 phate.t = 10 phate.ndim = 2 Run phateR for dimensional reduction and visualization. # run phateR if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} if(!is.null(pythonPath)){ run_phateR(sc_object = sc_object, output.dir = paste0(output.dir,'/Step5.Visualization/'), pythonPath = pythonPath, phate.knn = phate.knn, phate.npca = phate.npca, phate.t = phate.t, phate.ndim = phate.ndim) } Perform visualization using UMAP and TSNE. # plot cell types pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.6 Step 6. Find DEGs Set the parameters for identifying differentially expressed genes. min.pct = 0.25 logfc.threshold = 0.25 Create a folder for the DEGs analysis. print('Step6. Find DEGs.') if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/')) } Identify DEGs using Wilcoxon Rank-Sum Test. sc_object.markers <- FindAllMarkers(sc_object, only.pos = TRUE, min.pct = min.pct, logfc.threshold = logfc.threshold) write.csv(sc_object.markers, file = paste0(paste0(output.dir, '/Step6.Find_DEGs/'),'sc_object.markerGenes.csv'), quote=FALSE) Set the parameters for GPTCelltype. your_openai_API_key = '' tissuename = 'human bone marrow' gptmodel = 'gpt-3.5' Use GPTCelltype to assist cell type annotation. GPT_annotation( marker.genes = sc_object.markers, your_openai_API_key = your_openai_API_key, tissuename = tissuename, gptmodel = gptmodel, output.dir = paste0(output.dir, '/Step6.Find_DEGs/')) Perform GO and KEGG enrichment. 
# GO enrichment if(Org=='mmu'){ OrgDb <- 'org.Mm.eg.db' }else if(Org=='hsa'){ OrgDb <- 'org.Hs.eg.db' }else{ stop("Org should be 'mmu' or 'hsa'.") } HemaScopeREnrichment(DEGs=sc_object.markers, OrgDb=OrgDb, output.dir=paste0(output.dir, '/Step6.Find_DEGs/')) sc_object.markers.top5 <- sc_object.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_log2FC) pdf(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.pdf'), width = 0.5*length(unique(sc_object.markers.top5$gene)), height = 0.5*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() png(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.png'), width = 20*length(unique(sc_object.markers.top5$gene)), height = 30*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() Create a folder for saving the results of gene network analysis. if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/')) } Perform gene network analysis. OpenXGR_SAG(sc_object.markers = sc_object.markers, output.dir = paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'), subnet.size = 10) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.7 Step 7. Assign Cell Cycles Create a folder for saving the results of cell cycle analysis. print('Step7. 
Assign cell cycles.') if (!file.exists(paste0(output.dir, '/Step7.Assign_cell_cycles/'))) { dir.create(paste0(output.dir, '/Step7.Assign_cell_cycles/')) } Set the parameters for the cell cycle analysis. cellcycleCutoff = NULL Run the cell cycle analysis. datasets.before.batch.removal <- readRDS(paste0(paste0(output.dir, '/RDSfiles/'),'datasets.before.batch.removal.rds')) sc_object <- cellCycle(sc_object=sc_object, counts_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "counts")%>%as.matrix(), data_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix(), cellcycleCutoff = cellcycleCutoff, cellTypeOrders = unique(sc_object@meta.data$selectLabels), output.dir=paste0(output.dir, '/Step7.Assign_cell_cycles/'), databasePath = databasePath, Org = Org) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.8 Step 8. Calculate Heterogeneity Create a folder for saving the results of heterogeneity calculation. print('Step8. Calculate heterogeneity.') if (!file.exists(paste0(output.dir, '/Step8.Calculate_heterogeneity/'))) { dir.create(paste0(output.dir, '/Step8.Calculate_heterogeneity/')) } Run heterogeneity calculation process. 
expression_matrix <- GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix() expression_matrix <- expression_matrix[,rownames(sc_object@meta.data)] cell_types_groups <- as.data.frame(cbind(sc_object@meta.data$selectLabels, sc_object@meta.data$datasetID)) colnames(cell_types_groups) <- c('clusters', 'datasetID') if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } heterogeneity(expression_matrix = expression_matrix, cell_types_groups = cell_types_groups, cellTypeOrders = cellTypes_orders, output.dir = paste0(output.dir, '/Step8.Calculate_heterogeneity/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.9 Step 9. Violin Plot for Marker Genes Create a folder for saving the violin plots of marker genes. print('Step9. Violin plot for marker genes.') if (!file.exists(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'))) { dir.create(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/')) } Run violin plot visualization. 
if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} dataMatrix <- GetAssayData(object = sc_object, slot = "scale.data") if(is.null(marker.genes)&(Org == 'mmu')){ # mpp genes are from 'The bone marrow microenvironment at single cell resolution' # the other genes are from 'single cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis' # the aliases of these genes were changed in GENCODE M16: Gpr64 -> Adgrg2, Sdpr -> Cavin2, Hbb-b1 -> Hbb-bs, Sfpi1 -> Spi1 HSC_lineage_signatures <- c('Slamf1', 'Itga2b', 'Kit', 'Ly6a', 'Bmi1', 'Gata2', 'Hlf', 'Meis1', 'Mpl', 'Mcl1', 'Gfi1', 'Gfi1b', 'Hoxb5') Mpp_genes <- c('Mki67', 'Mpo', 'Elane', 'Ctsg', 'Calr') Erythroid_lineage_signatures <- c('Klf1', 'Gata1', 'Mpl', 'Epor', 'Vwf', 'Zfpm1', 'Fhl1', 'Adgrg2', 'Cavin2','Gypa', 'Tfrc', 'Hbb-bs', 'Hbb-y') Lymphoid_lineage_signatures <- c('Tcf3', 'Ikzf1', 'Notch1', 'Flt3', 'Dntt', 'Btg2', 'Tcf7', 'Rag1', 'Ptprc', 'Ly6a', 'Blnk') Myeloid_lineage_signatures <- c('Gfi1', 'Spi1', 'Mpo', 'Csf2rb', 'Csf1r', 'Gfi1b', 'Hk3', 'Csf2ra', 'Csf3r', 'Sp1', 'Fcgr3') marker.genes <- c(HSC_lineage_signatures, Mpp_genes, Erythroid_lineage_signatures, Lymphoid_lineage_signatures, Myeloid_lineage_signatures) }else if(is.null(marker.genes)&(Org == 'hsa')){ HSPCs_lineage_signatures <- c('CD34','KIT','AVP','FLT3','MME','CD7','CD38','CSF1R','FCGR1A','MPO','ELANE','IL3RA') Myeloids_lineage_signatures <- c('LYZ','CD36','MPO','FCGR1A','CD4','CD14','CD300E','ITGAX','FCGR3A','FLT3','AXL', 'SIGLEC6','CLEC4C','IRF4','LILRA4','IL3RA','IRF8','IRF7','XCR1','CD1C','THBD', 'MRC1','CD34','KIT','ITGA2B','PF4','CD9','ENG','KLF','TFRC') B_cells_lineage_signatures <- c('CD79A','IGLL1','RAG1','RAG2','VPREB1','MME','IL7R','DNTT','MKI67','PCNA','TCL1A','MS4A1','IGHD','CD27','IGHG3') T_NK_cells_lineage_signatures <- 
c('CD3D','CD3E','CD8A','CCR7','IL7R','SELL','KLRG1','CD27','GNLY', 'NKG7','PDCD1','TNFRSF9','LAG3','CD160','CD4','CD40LG','IL2RA', 'FOXP3','DUSP4','IL2RB','KLRF1','FCGR3A','NCAM1','XCL1','MKI67','PCNA','KLRF') marker.genes <- c(HSPCs_lineage_signatures, Myeloids_lineage_signatures, B_cells_lineage_signatures, T_NK_cells_lineage_signatures) } if(is.null(ViolinPlot.cellTypeOrders)){ ViolinPlot.cellTypeOrders <- unique(sc_object@meta.data$selectLabels) } if(is.null(ViolinPlot.cellTypeColors)){ ViolinPlot.cellTypeColors <- viridis::viridis(length(unique(sc_object@meta.data$selectLabels))) } combinedViolinPlot(dataMatrix = dataMatrix, features = marker.genes, CellTypes = sc_object@meta.data$selectLabels, cellTypeOrders = ViolinPlot.cellTypeOrders, cellTypeColors = ViolinPlot.cellTypeColors, Org = Org, output.dir = paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.10 Step 10. Calculate Lineage Scores Create a folder for saving the results of lineage score calculation. print('Step10. Calculate lineage scores.') # we use normalized data here if (!file.exists(paste0(output.dir, '/Step10.Calculate_lineage_scores/'))) { dir.create(paste0(output.dir, '/Step10.Calculate_lineage_scores/')) } Run lineage score calculation. 
if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'mmu')){ lineage.genelist <- c(list(HSC_lineage_signatures), list(Mpp_genes), list(Erythroid_lineage_signatures), list(Lymphoid_lineage_signatures), list(Myeloid_lineage_signatures)) lineage.names <- c('HSC_lineage_signatures', 'Mpp_genes', 'Erythroid_lineage_signatures', 'Lymphoid_lineage_signatures', 'Myeloid_lineage_signatures') }else if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'hsa')){ lineage.genelist <- c(list(HSPCs_lineage_signatures), list(Myeloids_lineage_signatures), list(B_cells_lineage_signatures), list(T_NK_cells_lineage_signatures)) lineage.names <- c('HSPCs_lineage_signatures', 'Myeloids_lineage_signatures', 'B_cells_lineage_signatures', 'T_NK_cells_lineage_signatures') } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } lineageScores(expression_matrix = expression_matrix, cellTypes = sc_object@meta.data$selectLabels, cellTypes_orders = cellTypes_orders, cellTypes_colors = ViolinPlot.cellTypeColors, groups = sc_object@meta.data$datasetID, groups_orders = unique(sc_object@meta.data$datasetID), groups_colors = groups_colors, lineage.genelist = lineage.genelist, lineage.names = lineage.names, Org = Org, output.dir = paste0(output.dir, '/Step10.Calculate_lineage_scores/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.11 Step 11. GSVA Create a folder for saving the results of GSVA. print('Step11. GSVA.') if (!file.exists(paste0(output.dir, '/Step11.GSVA/'))) { dir.create(paste0(output.dir, '/Step11.GSVA/')) } Run GSVA. 
setwd(wdir) if(Org=='mmu'){ load(paste0(databasePath,"/mouse_c2_v5p2.rdata")) GSVA.genelist <- Mm.c2 assign('OrgDB', org.Mm.eg.db) }else if(Org=='hsa'){ load(paste0(databasePath,"/human_c2_v5p2.rdata")) GSVA.genelist <- Hs.c2 assign('OrgDB', org.Hs.eg.db) }else{ stop("Org should be 'mmu' or 'hsa'.") } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } run_GSVA(sc_object = sc_object, GSVA.genelist = GSVA.genelist, GSVA.cellTypes = sc_object@meta.data$selectLabels, GSVA.cellTypes.orders = cellTypes_orders, GSVA.cellGroups = sc_object@meta.data$datasetID, GSVA.identify.cellType.features = Step11_GSVA.identify.cellType.features, GSVA.identify.diff.features = Step11_GSVA.identify.diff.features, GSVA.comparison.design = Step11_GSVA.comparison.design, OrgDB = OrgDB, output.dir = paste0(output.dir, '/Step11.GSVA/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.12 Step 12. Construct Trajectories Load gene symbols and Ensembl IDs. 
DefaultAssay(sc_object) <- 'RNA' countsSlot <- GetAssayData(object = sc_object, slot = "counts") gene_metadata <- as.data.frame(rownames(countsSlot)) rownames(gene_metadata) <- gene_metadata[,1] if(Org == 'mmu'){ load(paste0(databasePath,"/mouseGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = mouseGeneSymbolandEnsembleID$geneName, to = mouseGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) }else if(Org == 'hsa'){ load(paste0(databasePath,"/humanGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = humanGeneSymbolandEnsembleID$geneName, to = humanGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) } colnames(gene_metadata) <- c('gene_short_name','ensembleID') Create folders for saving the results of trajectory construction. print('Step12. Construct trajectories.') if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) } Prepare the input data. if(is.null(Step12_Construct_Trajectories.clusters)){ sc_object.subset <- sc_object countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") }else{ sc_object.subset <- subset(sc_object, subset = selectLabels %in% Step12_Construct_Trajectories.clusters) countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") } Run monocle2. 
# monocle2 phenoData <- sc_object.subset@meta.data featureData <- gene_metadata run_monocle(cellData = countsSlot.subset, phenoData = phenoData, featureData = featureData, lowerDetectionLimit = 0.5, expressionFamily = VGAM::negbinomial.size(), cellTypes='selectLabels', monocle.orders=Step12_Construct_Trajectories.clusters, monocle.colors = ViolinPlot.cellTypeColors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) Run slingshot. # slingshot if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object.subset) <- 'integrated' }else{ DefaultAssay(sc_object.subset) <- 'RNA'} run_slingshot(slingshot.PCAembeddings = Embeddings(sc_object.subset, reduction = "pca")[, PCs], slingshot.cellTypes = sc_object.subset@meta.data$selectLabels, slingshot.start.clus = slingshot.start.clus, slingshot.end.clus = slingshot.end.clus, slingshot.colors = slingshot.colors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) Run scVelo. # scVelo if((!is.null(loom.files.path))&(!is.null(pythonPath))){ prepareDataForScvelo(sc_object = sc_object.subset, loom.files.path = loom.files.path, scvelo.reduction = 'pca', scvelo.column = 'selectLabels', output.dir = paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) reticulate::py_run_string(paste0("import os\\noutputDir = '", output.dir, "'")) reticulate::py_run_file(file.path(system.file(package = "HemaScopeR"), "python/sc_run_scvelo.py"), convert = FALSE) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.13 Step 13. TF Analysis Create folders for saving the results of TF analysis. print('Step13. 
TF analysis.') if (!file.exists(paste0(output.dir, '/Step13.TF_analysis/'))) { dir.create(paste0(output.dir, '/Step13.TF_analysis/')) } Run SCENIC to perform TF analysis. run_SCENIC(countMatrix = countsSlot, cellTypes = sc_object@meta.data$selectLabels, datasetID = sc_object@meta.data$datasetID, cellTypes_colors = Step13_TF_Analysis.cellTypes_colors, cellTypes_orders = unique(sc_object@meta.data$selectLabels), groups_colors = Step13_TF_Analysis.groups_colors, groups_orders = unique(sc_object@meta.data$datasetID), Org = Org, output.dir = paste0(output.dir, '/Step13.TF_analysis/'), pythonPath = pythonPath, databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.14 Step 14. Cell-Cell Interaction Create folders for saving the results of cell-cell interaction analysis. print('Step14. Cell-cell interaction.') if (!file.exists(paste0(output.dir, '/Step14.Cell_cell_interection/'))) { dir.create(paste0(output.dir, '/Step14.Cell_cell_interection/')) } Run CellChat to perform cell-cell interaction analysis. tempwd <- getwd() run_CellChat(data.input=countsSlot, labels = sc_object@meta.data$selectLabels, cell.orders = ViolinPlot.cellTypeOrders, cell.colors = ViolinPlot.cellTypeColors, sample.names = rownames(sc_object@meta.data), Org = Org, sorting = sorting, output.dir = paste0(output.dir, '/Step14.Cell_cell_interection/')) setwd(tempwd) Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } "],["stey-by-step-st-seq-pipeline.html", "5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading 5.2 Step 2. QC 5.3 Step 3. Clustering 5.4 Step 4. DEGs 5.5 Step 5. Spatially variable features 5.6 Step 6. Spatial interaction 5.7 Step 7. CNV analysis 5.8 Step 8. Deconvolution 5.9 Step 9. Cell cycle 5.10 Step 10. Niche analysis", " 5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading The st_Loading_Data function loads 10X Visium spatial transcriptomics data produced by Space Ranger. It reads the data from input.data.dir and returns it as a SeuratObject. 5.1.1 Function arguments: input.data.dir: The directory where the input data is stored. output.dir: The directory where the processed output will be saved. If not specified, the output is saved in the current working directory. Default is ‘.’. sampleName: A string naming the sample. Default is ‘Hema_ST’. rds.file: A boolean indicating whether the input data is an RDS file rather than typical Space Ranger output. Default is FALSE. filename: The name of the file to be loaded if the data is not in RDS format. Default is “filtered_feature_bc_matrix.h5”. assay: The specific assay to apply to the data. Default is ‘Spatial’. slice: The image slice identifier for the spatial data. Default is ‘slice1’. filter.matrix: A boolean indicating whether to load the filtered matrix. Default is TRUE. to.upper: A boolean indicating whether to convert feature names to uppercase. Default is FALSE. 5.1.2 Function behavior: Directory Creation: The function first checks if the output.dir exists; if not, it creates it. 
RDS File Handling: If rds.file is TRUE, it reads the RDS file, ensuring the specified assay and slice are present in the Seurat object. Non-RDS File Handling: If rds.file is FALSE, it loads the data using Load10X_Spatial from Seurat. Saving the Object: Uses SaveH5Seurat and Convert to save the Seurat object in rds and h5ad formats. File Copying: Copies any necessary files (filter matrix, spatial image) to the output.dir. Return Value: Returns the processed Seurat object. 5.1.3 An example: st_obj <- st_Loading_Data( input.data.dir = 'path/to/data', output.dir = '.', sampleName = 'Hema_ST', rds.file = FALSE, filename = 'filtered_feature_bc_matrix.h5', assay = 'Spatial', slice = 'slice1', filter.matrix = TRUE, to.upper = FALSE ) 5.1.4 Outputs: Spatial transcriptome data in rds and h5ad formats 5.2 Step 2. QC The QC_Spatial function performs basic quality control on a SeuratObject containing 10X Visium data and returns the filtered SeuratObject. It provides options to set thresholds for the number of genes, nUMI (unique molecular identifiers), and spots expressing each gene. It also allows for the removal of mitochondrial genes based on species. 5.2.1 Function arguments: st_obj: A SeuratObject of 10X Visium data. output.dir: A character string specifying the path to store the results and figures. Default is the current working directory. min.gene: An integer representing the minimum number of genes detected in a spot. Default is 200. max.gene: An integer representing the maximum number of genes detected in a spot. Default is Inf (no upper limit). min.nUMI: An integer representing the minimum number of nUMI detected in a spot. Default is 500. max.nUMI: An integer representing the maximum number of nUMI detected in a spot. Default is Inf (no upper limit). min.spot: An integer representing the minimum number of spots expressing each gene. Default is 3. species: A character string representing the species of sample, either ‘human’ or ‘mouse’. 
bool.remove.mito: A boolean value indicating whether to remove mitochondrial genes. Default is TRUE. SpatialColors: A function that interpolates a set of given colors to create new color palettes and color ramps. Default is a color palette with reversed Spectral colors from RColorBrewer. 5.2.2 Function behavior: Plots and saves the spatial distribution of nUMI and nGene. Plots and saves violin plots for nUMI and nGene. Identifies and marks low-quality spots based on nUMI and nGene thresholds. Plots the spatial distribution of quality. Plots and saves a histogram for the number of spots expressing each gene. Plots the spatial distribution of mitochondrial genes. Saves the raw SeuratObject before filtering. Removes low-quality spots and genes expressed in fewer than min.spot spots. Optionally removes mitochondrial genes. Saves the filtered SeuratObject. Returns the filtered st_obj. 5.2.3 An example: st_obj <- QC_Spatial( st_obj = st_obj, output.dir = '.', min.gene = 200, min.nUMI = 500, max.gene = Inf, max.nUMI = Inf, min.spot = 3, species = 'human', bool.remove.mito = TRUE, SpatialColors = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = "Spectral"))) ) 5.2.4 Outputs: Figures showing the spatial distribution of nUMI and nGene. Violin plots of nUMI and nGene. Figures showing the quality. Histograms for the number of spots expressing each gene. Figures showing the spatial distribution of mitochondrial genes. Raw and filtered SeuratObject. 5.3 Step 3. Clustering The st_Clustering function is designed to perform clustering analysis on spatial transcriptomics data. It integrates several key steps including data normalization, dimensionality reduction, clustering, and visualization. The function saves the results and visualizations to output.dir. 5.3.1 Function arguments: st_obj: The input spatial transcriptomics Seurat object that contains the data to be clustered. output.dir: The directory where the output files will be saved. Default is the current directory (‘.’). 
normalization.method: The method used for data normalization. Default is ‘SCTransform’. npcs: The number of principal components to use in PCA. Default is 50. pcs.used: The principal components to use for clustering. Default is the first 10 PCs (1:10). resolution: The resolution parameter for the clustering algorithm. Default is 0.8. verbose: A logical flag to print progress messages. Default is FALSE. 5.3.2 Function behavior: Data Normalization and PCA: Depending on the normalization.method, the function either uses SCTransform or a standard normalization method followed by scaling and variable feature detection. Performs PCA on the normalized data. Clustering and Dimensionality Reduction: Finds nearest neighbors using the specified principal components (pcs.used). Identifies clusters using the specified resolution. Performs UMAP and t-SNE for visualization of the clusters. Visualization: Generates spatial, UMAP, and t-SNE plots of the clusters with customized color schemes. Saves these plots as images in the specified directory. Saving Results: Saves the updated st_obj as an RDS file. Exports the metadata of st_obj to a CSV file. Return Value: Returns the updated st_obj containing the clustering results. 5.3.3 An example: st_obj <- st_Clustering( st_obj = st_obj, output.dir = '.', normalization.method = 'SCTransform', npcs = 50, pcs.used = 1:10, resolution = 0.8, verbose = FALSE ) 5.3.4 Outputs: Figures showing the results of clustering. SeuratObject in rds format. 5.4 Step 4. DEGs The st_Find_DEGs function is designed to identify differentially expressed genes (DEGs) in spatial transcriptomics data. It performs differential expression analysis based on clustering results, visualizes the top markers, and saves the results to output.dir. 5.4.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for DEG analysis. output.dir: The directory where output files will be saved. Default is the current directory (‘.’). 
ident.label: The metadata label used for identifying clusters. Default is 'seurat_clusters'. only.pos: A logical flag to include only positive markers. Default is TRUE. min.pct: The minimum fraction of cells expressing the gene in either cluster. Default is 0.25. logfc.threshold: The log fold change threshold for considering a gene differentially expressed. Default is 0.25. test.use: The statistical test to use for differential expression analysis. Default is 'wilcox'. verbose: A logical flag to print progress messages. Default is FALSE. 5.4.2 Function behavior: Set Identifiers: Sets the cluster identifiers in the spatial transcriptomics object (st_obj) based on the specified ident.label. Find Differentially Expressed Genes (DEGs): Performs differential expression analysis using the specified parameters (only.pos, min.pct, logfc.threshold, test.use). Top Marker Genes: Selects the top 5 marker genes for each cluster based on the highest average log fold change. Visualization: Generates a dot plot for the top DEGs and saves the plot as an image in the specified directory. Saving Results: Saves the DEG results as a CSV file. Return Value: Returns the data frame containing the identified DEGs. 5.4.3 An example: st.markers <- st_Find_DEGs( st_obj = st_obj, output.dir = '.', ident.label = 'seurat_clusters', only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = 'wilcox', verbose = FALSE ) 5.4.4 Outputs: Dot plots showing markers. CSV file containing the information of markers. 5.5 Step 5. Spatially variable features The st_SpatiallyVariableFeatures function identifies and visualizes spatially variable features (SVFs) in spatial transcriptomics data. It integrates the identification of spatially variable features using a specified method, saves the results to a directory, and creates visualizations of the top spatially variable features. 5.5.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. 
output.dir: The directory where output files will be saved. Default is the current directory. assay: The assay to be used for finding spatially variable features. Default is 'SCT'. selection.method: The method used for selecting spatially variable features. Default is 'moransi'. n.top.show: The number of top spatially variable features to visualize. Default is 10. n.col: The number of columns for the visualization grid. Default is 5. verbose: A logical flag to print progress messages. Default is FALSE. 5.5.2 Function behavior: Identify Spatially Variable Features: Identifies spatially variable features using the specified method and assay. Suppresses warnings during the process. Save Metadata: Extracts metadata features and saves them as a CSV file in output.dir. Visualization: Selects the top n.top.show spatially variable features. Generates and saves a spatial feature plot of these features in the specified directory. Return Value: Returns the updated st_obj containing the identified spatially variable features. 5.5.3 An example: st_obj <- st_SpatiallyVariableFeatures( st_obj = st_obj, output.dir = '.', assay = st_obj@active.assay, selection.method = 'moransi', n.top.show = 10, n.col = 5, verbose = FALSE ) 5.5.4 Outputs: Figures showing SVFs. CSV file containing the information of SVFs. 5.6 Step 6. Spatial interaction The st_Interaction function is used to identify and visualize interactions between clusters based on spatial transcriptomics data. It utilizes Commot to analyze spatial interactions, identify pathway activities, and assess the strength and significance of interactions. 5.6.1 Function arguments: st_data_path: Path to the spatial transcriptomics data. metadata_path: Path to the metadata associated with the spatial transcriptomics data. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. label_key: Key in the metadata to identify cell clusters. Default is 'seurat_clusters'. 
save_path: The directory where output files will be saved. Default is the current directory. species: The species of the spatial transcriptomics data. Default is 'human'. signaling_type: Type of signaling interactions to consider. Default is 'Secreted Signaling'. database: Database to be used for the analysis. Default is 'CellChat'. min_cell_pct: Minimum percentage of cells to consider for interaction analysis. Default is 0.05. dis_thr: Distance threshold for defining interactions. Default is 500. n_permutations: Number of permutations for assessing significance. Default is 100. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.6.2 Function behavior: Commot Analysis: Uses Commot to perform interaction analysis, identifying interactions within and between clusters. Visualization: Generates visualizations of pathway interactions and interactions between ligand-receptors (LRs) within and between clusters, and saves them in save_path. 5.6.3 An example: st_Interaction( st_data_path = 'path/to/data', metadata_path = 'path/to/metadata', library_id = 'Hema_ST', label_key = 'seurat_clusters', save_path = '.', species = 'human', signaling_type = 'Secreted Signaling', database = 'CellChat', min_cell_pct = 0.05, dis_thr = 500, n_permutations = 100, pythonPath = 'path/to/python' ) 5.6.4 Outputs: Dot plot showing pathway interaction between and within clusters. Dot plot showing LRs interaction between and within clusters. The information of each LR and pathway. 5.7 Step 7. CNV analysis The st_CNV function identifies and visualizes copy number variations (CNVs) in spatial transcriptomics data. It uses CopyKAT to perform the CNV analysis, saves the results, and generates visual representations of CNV states. 5.7.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. save_path: The directory where output files will be saved. assay: The assay to be used for CNV analysis. 
Default is 'Spatial'. LOW.DR: The lower threshold for the dropout rate in CopyKAT. Default is 0.05. UP.DR: The upper threshold for the dropout rate in CopyKAT. Default is 0.1. win.size: The window size for the CNV analysis. Default is 25. distance: The distance metric to be used for the analysis. Default is \"euclidean\". genome: The genome version to be used, ‘hg20’ or ‘mm10’. Default is \"hg20\". n.cores: The number of cores to be used for parallel processing. Default is 1. species: The species of the spatial transcriptomics data. Default is 'human'. 5.7.2 Function behavior: CopyKAT Analysis: Runs CopyKAT pipeline to perform CNV analysis using the provided parameters. Saving Results: Saves the CopyKAT results as an RDS file. Plotting: Generates plots of the CNV states and saves them in save_path. Updating Metadata: Updates the spatial transcriptomics object with CNV state metadata. Return Value: Returns the updated st_obj containing the CNV state information. 5.7.3 An example: st_obj <- st_CNV( st_obj = st_obj, save_path = '.', assay = 'Spatial', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = 'hg20', n.cores = 1, species = 'human' ) 5.7.4 Outputs: Figures showing the predicted CNV states. Figures showing the CNV heatmap. rds files of results of copykat. 5.8 Step 8. Deconvolution The st_Deconvolution function aims to perform spatial deconvolution analysis on spatial transcriptomics data to estimate the cell-type composition and abundance in different regions. The function utilizes cell2location to infer cell-type abundance and spatial distributions, allowing for the visualization and interpretation of spatially resolved cell populations within the tissue. 5.8.1 Function arguments: st.data.dir: Path to the spatial transcriptomics data. sc.h5ad.dir: Path to the single-cell RNA-seq data in h5ad format. Default is NULL. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. 
st_obj: Spatial transcriptomics object containing the data for analysis. Default is NULL. save_path: The directory where output files will be saved. Default is NULL. sc.labels.key: Key in the single-cell metadata to identify cell clusters. Default is 'seurat_clusters'. species: The species of the spatial transcriptomics data. Default is 'mouse'. sc.max.epoch: Maximum number of epochs used for single-cell deconvolution. Default is 1000. st.max.epoch: Maximum number of epochs used for spatial deconvolution. Default is 10000. use.gpu: Logical value indicating whether to use GPU for computation. Default is FALSE. use.Dataset: The dataset to be used for analysis, such as 'HematoMap' or 'LymphNode'. pythonPath: The path to the Python environment containing cell2location to use for the analysis. Default is ‘.’. 5.8.2 Function behavior: Deconvolution Analysis: Performs the spatial deconvolution analysis using the provided spatial transcriptomics and single-cell RNA-seq data. Post-Analysis Processing: Processes the deconvolution results and visualizes the spatial distribution of inferred cell types within the tissue. Returning Results: If a Seurat object is provided, the updated Seurat object with cell type information is returned. 5.8.3 An example: st_obj <- st_Deconvolution( st.data.dir = 'path/to/data', library_id = 'Hema_ST', sc.h5ad.dir = NULL, st_obj = st_obj, save_path = '.', sc.labels.key = 'seurat_clusters', species = 'human', sc.max.epoch = 1000, st.max.epoch = 10000, use.gpu = FALSE, use.Dataset = 'LymphNode', pythonPath = 'path/to/python' ) 5.8.4 Outputs: Figures showing the predicted abundance of each cell-type. The parameters of trained cell2location model. 5.9 Step 9. Cell cycle The st_Cell_cycle function is used to assess the cell cycle phase scores in spatial transcriptomics data. It calculates S phase and G2M phase scores based on the expression of designated cell cycle-related genes and visualizes these scores in spatial and dimensionality-reduced plots. 
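The scoring idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by st_Cell_cycle or by Seurat: the toy matrix expr, the four marker genes, and the simplified rule (mean z-score per gene set, with G1 assigned when both scores are negative) are all hypothetical stand-ins chosen to keep the example self-contained.

```r
# Toy expression matrix (genes x spots); hypothetical data, not HemaScopeR output
set.seed(42)
genes <- c('MCM2', 'PCNA', 'TOP2A', 'MKI67', 'ACTB', 'GAPDH')
expr <- matrix(rpois(length(genes) * 20, lambda = 5), nrow = length(genes),
               dimnames = list(genes, paste0('spot', 1:20)))

# Assumed (illustrative) gene sets for the two phases
s.features   <- c('MCM2', 'PCNA')
g2m.features <- c('TOP2A', 'MKI67')

# z-score each gene across spots after a log transform
z <- t(scale(t(log1p(expr))))

# Simplified phase score: mean z-score of the genes in each set
s.score   <- colMeans(z[s.features, , drop = FALSE])
g2m.score <- colMeans(z[g2m.features, , drop = FALSE])

# Assign a phase: G1 if both scores are negative, otherwise the larger score wins
phase <- ifelse(s.score < 0 & g2m.score < 0, 'G1',
                ifelse(s.score >= g2m.score, 'S', 'G2M'))
table(phase)
```

In the pipeline itself the scores are computed by the function on the Seurat object; the sketch only shows why each spot ends up with one S score, one G2M score, and a phase label.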
5.9.1 Function arguments: st_obj: The input Seurat object containing the data for analysis. save_path: The directory where the output images will be saved. Default is the current directory. s.features: A list of genes associated with the S phase. Default is NULL (using genes from Seurat). g2m.features: A list of genes associated with the G2M phase. Default is NULL (using genes from Seurat). species: The species of the spatial transcriptomics data. Default is 'human'. FeatureColors.bi: A color palette for visualization. Default is a two-color ramp palette. 5.9.2 Function behavior: Gene Feature Assignment: Assigns S phase and G2M phase gene lists based on the specified species or provided input. Cell Cycle Scoring: Calculates the S phase and G2M phase scores in the data. Spatial Visualization: Generates spatial feature plots to visualize the S phase and G2M phase scores using the specified color palette and saves the plots as images. Dimensionality-Reduced Plot Visualization: If UMAP or tSNE dimensionality reduction is available in the st_obj, feature plots of the S phase and G2M phase scores are generated in the reduced space and saved as images. Return Value: Returns the updated st_obj containing the cell cycle phase scores. 5.9.3 An example: st_obj <- st_Cell_cycle( st_obj = st_obj, save_path = '.', s.features = NULL, g2m.features = NULL, species = 'human', FeatureColors.bi = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = 'RdYlBu'))) ) 5.9.4 Outputs: Figures showing S scores. Figures showing G2M scores. 5.10 Step 10. Niche analysis The st_NicheAnalysis function is designed to perform niche analysis on spatial transcriptomics data, enabling the exploration of spatial niches or microenvironments within the tissue. The function encompasses co-occurrence analysis, niche clustering, and niche interaction analysis to uncover the spatial relationships and characteristics of different cell populations or features. 
5.10.1 Function arguments: st_obj: The input SeuratObject containing the spatial transcriptomics data for analysis. features: A vector of feature names (for example, cell types from deconvolution) to use for niche analysis. save_path: The directory where the analysis results and visualizations will be saved. Default is the current directory. coexistence.method: The method for co-occurrence analysis, accepting 'correlation' or 'Wasserstein'. Default is 'correlation'. kmeans.n: The number of clusters for niche clustering. Default is 4. st_data_path: A path to the directory containing the ‘spatial’ folder and the ‘filtered_feature_bc_matrix.h5’ file, required for niche interaction visualization. slice: The slice to be used for analysis. Default is 'slice1'. species: The species of the sample data. Default is 'mouse'. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.10.2 Function behavior: Co-occurrence Score Calculation: Calculates the co-occurrence scores between the specified features using the chosen coexistence method (‘correlation’ or ‘Wasserstein’). Niche Clustering: Utilizes k-means clustering to identify distinct spatial niches based on the per-spot profiles of the selected features and visualizes the clustering results. Niche Interaction Visualization: If st_data_path is provided, runs Commot on the provided spatial transcriptomics data and generates visualizations of niche interactions within the tissue. Return Value: Returns the updated st_obj with niche analysis results and visualizations. 
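The first two behaviors listed above (co-occurrence scoring with the 'correlation' method and k-means niche clustering) can be sketched in base R as follows. This is a minimal illustration, not the internals of st_NicheAnalysis: the abundance matrix, the three cell-type names, and the use of Pearson correlation via cor() are assumptions made for the example.

```r
# Hypothetical per-spot abundances (spots x features), standing in for
# deconvolution results stored in the metadata of st_obj
set.seed(1)
abundance <- matrix(runif(60 * 3), nrow = 60,
                    dimnames = list(paste0('spot', 1:60),
                                    c('HSC', 'Mono', 'Bcell')))

# Co-occurrence: pairwise Pearson correlation between features across spots
cooccurrence <- cor(abundance, method = 'pearson')

# Niche clustering: k-means on the per-spot feature profiles (kmeans.n = 4)
km <- kmeans(abundance, centers = 4, nstart = 10)
niche <- factor(km$cluster)

# Mean composition of each niche
niche.composition <- aggregate(as.data.frame(abundance),
                               by = list(niche = niche), FUN = mean)
print(round(cooccurrence, 2))
print(niche.composition)
```

Features with a high positive correlation tend to be abundant in the same spots, and each k-means cluster groups spots with a similar mix of features, which is what the niche composition figures summarize.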
5.10.3 An example: tmp <- read.csv('path/to/cell2loc_res.csv', row.names = 1) features <- colnames(tmp) if(!all(features %in% names(st_obj@meta.data))){ common.barcodes <- intersect(colnames(st_obj), rownames(tmp)) tmp <- tmp[common.barcodes, ] st_obj <- st_obj[, common.barcodes] st_obj <- AddMetaData(st_obj, metadata = tmp) } st_obj <- st_NicheAnalysis( st_obj, features = features, save_path = '.', coexistence.method = 'correlation', kmeans.n = 4, st_data_path = 'path/to/data', slice = 'slice1', species = 'human', condaenv = 'path/to/python' ) 5.10.4 Outputs: Figures showing the co-existence results. Figures showing the spatial distribution of each niche. Figures showing the composition of each niche. Figures showing the results of interactions using Commot. "]] +[["index.html", "HemaScope Tutorial 1 Introduction", " HemaScope Tutorial HemaScope team 2024-09-27 1 Introduction HemaScope is a specialized bioinformatics toolkit designed for analyzing both single-cell and spatial transcriptome sequencing data from hematopoietic cells, including myeloid and lymphoid lineages. We have developed an R package named HemaScopeR, a Shiny interface named HemaScopeShiny, and a cloud platform named HemaScopeCloud. This tutorial introduces how to install and use the R package and Shiny interface, as well as how to access and operate the cloud platform. 
"],["installation.html", "2 Installation 2.1 Create a new conda environment and activate it 2.2 Set the channels in conda 2.3 Install R and python 2.4 Install required R-packages 2.5 Install required Python-packages 2.6 The installed packages with versions", " 2 Installation 2.1 Create a new conda environment and activate it conda create --name HemaScope_env conda activate HemaScope_env 2.2 Set the channels in conda # Add the default channel conda config --add channels defaults # Add default channel URLs conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2 # Add custom channels conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2 conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch-lts conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/simpleitk conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/deepmodeling # Set to show channel URLs conda config --set show_channel_urls true 2.3 Install R and python R 4.3.3 and python 3.8.19 conda install R-base=4.3.3 conda install python=3.8.19 2.4 Install required R-packages From conda conda install -c conda-forge r-devtools=2.4.5 conda install -c conda-forge r-Seurat=4.3.0.1 conda install -c conda-forge r-Rfast2=0.1.5.1 conda install -c conda-forge r-hdf5r=1.3.10 conda install -c conda-forge r-ggpubr=0.6.0 conda install 
pwwang::r-seuratwrappers conda install -c bioconda bioconductor-monocle=2.28.0 conda install -c bioconda bioconductor-slingshot=2.8.0 conda install -c bioconda bioconductor-GSVA=1.48.2 conda install -c bioconda bioconductor-org.Mm.eg.db=3.17.0 conda install -c bioconda bioconductor-org.Hs.eg.db=3.17.0 conda install -c bioconda bioconductor-scran=1.28.1 conda install -c bioconda bioconductor-AUCell=1.22.0 conda install -c bioconda bioconductor-RcisTarget=1.20.0 conda install -c bioconda bioconductor-GENIE3=1.24.0 conda install -c bioconda bioconductor-biomaRt=2.56.1 conda install -c bioconda r-velocyto.r=0.6 #conda install -c bioconda bioconductor-limma=3.56.2 Enter the R language environment We suggest users do not manually update any already installed R packages during the installation of the following R packages. R From BiocManager # BiocManager(version = "1.30.23") should already be installed as a dependency of r-seuratwrappers. # If it is not installed, please run the following code to install it. 
# install.packages("BiocManager",version="1.30.23") BiocManager::install("ComplexHeatmap") BiocManager::install("scmap") BiocManager::install("clusterProfiler") install.packages("doMC") install.packages("doRNG") From CRAN remotes::install_version("shinyjs", version = "2.1.0") remotes::install_version("shiny", version = "1.8.0") remotes::install_version("shinyWidgets", version = "0.8.6") remotes::install_version("shinydashboard", version = "0.7.2") remotes::install_version("slickR", version = "0.6.0") remotes::install_version("phateR", version = "1.0.7") remotes::install_version("gelnet", version = "1.2.1") remotes::install_version("parallelDist", version = "0.2.6") remotes::install_version("kableExtra", version = "1.3.4") remotes::install_version("transport", version = "0.14-6") remotes::install_version("feather", version = "0.3.5") remotes::install_version("markdown", version = "1.13") From GitHub Tips: Sometimes network connection issues may occur, resulting in an error message indicating that GitHub cannot be reached. Please try installing again when the network conditions improve. Usage limitations: Sometimes an API rate limit error occurs, and a GitHub token is needed to raise the GitHub API rate limit. The steps to resolve this are as follows: Register for an account or log in to an existing account on the GitHub website. Then click on your profile picture in the top right corner, go to the dropdown menu and select “Settings.” Next, find “Developer settings” and click on it, then find “Personal access tokens (classic).” Click on it, then click “Create new token (classic).” Create a new token by first naming it anything you like. Then choose the expiration time for the token. Finally, check the “repo” box; the token will be used to download code repositories from GitHub. Click “Generate token.” Copy the generated token. After that, set the token as an environment variable in R. Since we are using conda, enter R by typing R in the terminal. 
Then, enter the command: usethis::edit_r_environ(). This opens the .Renviron file in a terminal editor (the following keys assume vi/vim). Press the i key to edit. Paste the token you copied into the file as follows: GITHUB_TOKEN="your_token". Then press Esc and type :wq! (force save). After that, quit R, then close and reopen the terminal so that the environment variable takes effect. Activate the conda environment and enter R again. devtools::install_github("sqjin/CellChat") devtools::install_github("immunogenomics/presto") devtools::install_github("aertslab/SCENIC@fde9774") devtools::install_github("pzhulab/abcCellmap@f44c14b") devtools::install_github("navinlabcode/copykat@d7d6569") devtools::install_github('chris-mcginnis-ucsf/DoubletFinder@8c7f76e') devtools::install_github("mojaveazure/seurat-disk@877d4e1") Install HemaScopeR from GitHub devtools::install_github(repo="ZhenyiWangTHU/HemaScopeR", dep = FALSE) Exit the R language environment quit() 2.5 Install required Python-packages Upgrade pip and set mirrors python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip config set global.extra-index-url http://mirrors.aliyun.com/pypi/simple/ Install required packages pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 distributed==2024.2.1 dask-expr==0.5.3 2.6 The installed packages with versions R packages with versions Package Version ------- ------- abcCellmap 0.1.0 abind 1.4-5 annotate 1.78.0 AnnotationDbi 1.64.1 ape 5.8 aplot 0.2.3 arrow 17.0.0 askpass 1.2.0 assertthat 0.2.1 AUCell 1.22.0 backports 1.5.0 base 4.3.3 base64enc 0.1-3 beachmat 2.16.0 BH 1.84.0-0 Biobase 2.60.0 BiocFileCache 2.8.0 BiocGenerics 0.46.0 BiocManager 1.30.23 BiocNeighbors 1.18.0 BiocParallel 1.34.2 
BiocSingular 1.16.0 BiocVersion 3.18.1 biocViews 1.68.1 biomaRt 2.56.1 Biostrings 2.68.1 bit 4.0.5 bit64 4.0.5 bitops 1.0-7 blob 1.2.4 bluster 1.10.0 boot 1.3-30 brew 1.0-10 brio 1.1.5 broom 1.0.6 bslib 0.7.0 cachem 1.1.0 callr 3.7.6 car 3.1-2 carData 3.0-5 caret 6.0-94 caTools 1.18.2 CellChat 2.0.1 cellranger 1.1.0 circlize 0.4.16 class 7.3-22 cli 3.6.3 clipr 0.8.0 clock 0.7.0 clue 0.3-65 cluster 2.1.6 clusterProfiler 4.10.1 coda 0.19-4.1 codetools 0.2-20 colorspace 2.1-0 combinat 0.0-8 commonmark 1.9.1 compiler 4.3.3 ComplexHeatmap 2.18.0 conquer 1.3.3 copykat 1.1.0 corrplot 0.92 cowplot 1.1.3 cpp11 0.4.7 crayon 1.5.3 credentials 2.0.1 crosstalk 1.2.1 curl 5.2.1 data.table 1.15.4 datasets 4.3.3 DBI 1.2.3 dbplyr 2.5.0 DDRTree 0.1.5 DelayedArray 0.26.6 DelayedMatrixStats 1.22.1 deldir 2.0-4 Deriv 4.1.3 desc 1.4.3 devtools 2.4.5 diagram 1.6.5 diffobj 0.3.5 digest 0.6.36 dlm 1.1-6 doMC 1.3.8 doRNG 1.8.6 doBy 4.6.22 docopt 0.7.1 doParallel 1.0.17 DOSE 3.28.2 dotCall64 1.1-1 DoubletFinder 2.0.3 downlit 0.4.4 downloader 0.4 dplyr 1.1.4 dqrng 0.3.2 dynamicTreeCut 1.63-1 e1071 1.7-14 edgeR 3.42.4 ellipsis 0.3.2 enrichplot 1.22.0 evaluate 0.24.0 expm 0.999-9 fansi 1.0.6 farver 2.1.2 fastDummies 1.7.3 fastICA 1.2-4 fastmap 1.2.0 fastmatch 1.1-4 feather 0.3.5 fgsea 1.28.0 fields 16.2 filelock 1.0.3 fitdistrplus 1.1-11 FNN 1.1.4 fontawesome 0.5.2 forcats 1.0.0 foreach 1.5.2 foreign 0.8-87 formatR 1.14 fs 1.6.4 futile.logger 1.4.3 futile.options 1.0.1 future 1.33.2 future.apply 1.11.2 gelnet 1.2.1 generics 0.1.3 GENIE3 1.24.0 GenomeInfoDb 1.36.1 GenomeInfoDbData 1.2.11 GenomicRanges 1.52.0 gert 2.0.1 GetoptLong 1.0.5 ggalluvial 0.12.5 ggforce 0.4.2 ggfun 0.1.5 ggnetwork 0.5.13 ggnewscale 0.4.10 ggplot2 3.5.1 ggplotify 0.1.2 ggpubr 0.6.0 ggraph 2.2.1 ggrepel 0.9.5 ggridges 0.5.6 ggsci 3.2.0 ggsignif 0.6.4 ggtree 3.10.1 gh 1.4.1 gitcreds 0.1.2 GlobalOptions 0.1.2 globals 0.16.3 glue 1.7.0 GO.db 3.18.0 goftest 1.2-3 googleVis 0.7.3 GOSemSim 2.28.1 gower 1.0.1 gplots 3.1.3.1 graph 
1.78.0 graphics 4.3.3 graphlayouts 1.1.1 grDevices 4.3.3 grid 4.3.3 gridBase 0.4-7 gridExtra 2.3 gridGraphics 0.5-1 GSEABase 1.62.0 gson 0.1.0 GSVA 1.48.2 gtable 0.3.5 gtools 3.9.5 hardhat 1.4.0 haven 2.5.4 HDF5Array 1.28.1 hdf5r 1.3.10 HDO.db 0.99.1 HemaScopeR 1.0.0 here 1.0.1 hexbin 1.28.3 highr 0.11 hms 1.1.3 HSMMSingleCell 1.20.0 htmltools 0.5.8.1 htmlwidgets 1.6.4 httpuv 1.6.15 httr 1.4.7 httr2 1.0.2 ica 1.0-3 igraph 2.0.3 ini 0.3.1 ipred 0.9-14 IRanges 2.34.1 irlba 2.3.5.1 isoband 0.2.7 iterators 1.0.14 jquerylib 0.1.4 jsonlite 1.8.8 kableExtra 1.3.4 KEGGREST 1.40.0 kernlab 0.9-32 KernSmooth 2.23-24 knitr 1.48 labeling 0.4.3 lambda.r 1.2.4 later 1.3.2 lattice 0.22-6 lava 1.7.3 lazyeval 0.2.2 leiden 0.4.3.1 leidenbase 0.1.27 lifecycle 1.0.4 limma 3.56.2 listenv 0.9.1 lme4 1.1-35.5 lmtest 0.9-40 locfit 1.5-9.9 lsei 1.3-0 lubridate 1.9.3 magrittr 2.0.3 maps 3.4.2 maptools 1.1-8 markdown 1.13 MASS 7.3-60.0.1 Matrix 1.6-5 MatrixGenerics 1.12.2 MatrixModels 0.5-3 matrixStats 1.3.0 mcmc 0.9-8 MCMCpack 1.7-0 memoise 2.0.1 metapod 1.8.0 methods 4.3.3 mgcv 1.9-1 microbenchmark 1.4.10 mime 0.12 miniUI 0.1.1.1 minqa 1.2.7 mixtools 2.0.0 ModelMetrics 1.2.2.2 modelr 0.1.11 monocle 2.28.0 munsell 0.5.1 network 1.18.2 nlme 3.1-165 nloptr 2.0.3 NMF 0.27 nnet 7.3-19 npsurv 0.5-0 numDeriv 2016.8-1.1 openssl 2.2.0 org.Hs.eg.db 3.17.0 org.Mm.eg.db 3.17.0 parallel 4.3.3 parallelDist 0.2.6 parallelly 1.37.1 patchwork 1.2.0 pbapply 1.7-2 pbkrtest 0.5.2 pcaMethods 1.92.0 phateR 1.0.7 pheatmap 1.0.12 pillar 1.9.0 pkgbuild 1.4.4 pkgconfig 2.0.3 pkgdown 2.1.0 pkgload 1.3.4 plogr 0.2.0 plotly 4.10.4 plyr 1.8.9 png 0.1-8 polyclip 1.10-6 polynom 1.4-1 praise 1.0.0 presto 1.0.0 prettyunits 1.2.0 princurve 2.1.6 pROC 1.18.5 processx 3.8.4 prodlim 2024.06.25 profvis 0.3.8 progress 1.2.3 progressr 0.14.0 promises 1.3.0 proxy 0.4-27 ps 1.7.7 purrr 1.0.2 qlcMatrix 0.9.8 quantreg 5.98 qvalue 2.34.0 R.methodsS3 1.8.2 R.oo 1.26.0 R.utils 2.12.3 R6 2.5.1 ragg 1.3.2 randomForest 4.7-1.1 RANN 2.6.1 
rappdirs 0.3.3 RBGL 1.76.0 RcisTarget 1.20.0 rcmdcheck 1.4.0 RColorBrewer 1.1-3 Rcpp 1.0.13 RcppAnnoy 0.0.22 RcppArmadillo 14.0.0-1 RcppEigen 0.3.4.0.0 RcppGSL 0.3.13 RcppHNSW 0.6.0 RcppParallel 5.1.6 RcppProgress 0.4.2 RcppTOML 0.2.2 RcppZiggurat 0.1.6 RCurl 1.98-1.16 readr 2.1.5 readxl 1.4.3 recipes 1.1.0 registry 0.5-1 rematch 2.0.0 rematch2 2.1.2 remotes 2.5.0 reshape2 1.4.4 reticulate 1.38.0 Rfast 2.1.0 Rfast2 0.1.5.1 rhdf5 2.44.0 rhdf5filters 1.12.1 Rhdf5lib 1.22.0 rio 1.1.1 rjson 0.2.21 rlang 1.1.4 rmarkdown 2.27 rngtools 1.5.2 ROCR 1.0-11 roxygen2 7.3.2 rpart 4.1.23 rprojroot 2.0.4 RSpectra 0.16-2 RSQLite 2.3.7 rstatix 0.7.2 rstudioapi 0.16.0 rsvd 1.0.5 Rtsne 0.17 RUnit 0.4.33 rversions 2.1.2 rvest 1.0.4 S4Arrays 1.0.4 S4Vectors 0.38.1 sass 0.4.9 ScaledMatrix 1.8.1 scales 1.3.0 scattermore 1.2 scatterpie 0.2.3 SCENIC 1.3.0 scmap 1.24.0 scran 1.28.1 sctransform 0.4.1 scuttle 1.10.1 segmented 2.1-0 selectr 0.4-2 sessioninfo 1.2.2 Seurat 4.3.0.1 SeuratDisk 0.0.0.9021 SeuratObject 5.0.2 SeuratWrappers 0.3.1 shadowtext 0.1.4 shape 1.4.6.1 shinyjs 2.1.0 shiny 1.8.0 shinyWidgets 0.8.6 shinydashboard 0.7.2 slickR 0.6.0 SingleCellExperiment 1.22.0 sitmo 2.0.2 slam 0.1-51 slingshot 2.8.0 sna 2.7-2 snow 0.4-4 sourcetools 0.1.7-1 sp 2.1-4 spam 2.10-0 SparseM 1.84 sparseMatrixStats 1.12.2 sparsesvd 0.2-2 spatstat.data 3.1-2 spatstat.explore 3.2-6 spatstat.geom 3.2-9 spatstat.random 3.2-3 spatstat.sparse 3.1-0 spatstat.univar 3.0-0 spatstat.utils 3.0-5 splines 4.3.3 SQUAREM 2021.1 statmod 1.5.0 statnet.common 4.9.0 stats 4.3.3 stats4 4.3.3 stringi 1.8.4 stringr 1.5.1 SummarizedExperiment 1.30.2 survival 3.7-0 svglite 2.1.3 sys 3.4.2 systemfonts 1.1.0 tcltk 4.3.3 tensor 1.5 testthat 3.2.1.1 textshaping 0.3.7 tibble 3.2.1 tidygraph 1.3.1 tidyr 1.3.1 tidyselect 1.2.1 tidytree 0.4.6 timechange 0.3.0 timeDate 4032.109 tinytex 0.51 tools 4.3.3 TrajectoryUtils 1.8.0 transport 0.14-6 treeio 1.26.0 tweenr 2.0.3 tzdb 0.4.0 urlchecker 1.0.1 usethis 2.2.3 utf8 1.2.4 utils 4.3.3 uwot 
0.1.16 vctrs 0.6.5 velocyto.R 0.6 VGAM 1.1-11 viridis 0.6.5 viridisLite 0.4.2 vroom 1.6.5 waldo 0.5.2 webshot 0.5.5 whisker 0.4.1 withr 3.0.0 writexl 1.5.0 xfun 0.46 XML 3.99-0.17 xml2 1.3.6 xopen 1.0.1 xtable 1.8-4 XVector 0.40.0 yaml 2.3.9 yulab.utils 0.1.4 zip 2.3.1 zlibbioc 1.46.0 zoo 1.8-12 Python packages with versions Package Version ------------------------ -------------- absl-py 2.1.0 access 1.1.9 affine 2.4.0 aiohttp 3.9.5 aiosignal 1.3.1 anndata 0.10.8 annotated-types 0.7.0 anyio 4.4.0 arboreto 0.1.6 argcomplete 3.4.0 array_api_compat 1.7.1 arrow 1.3.0 attrs 23.2.0 backoff 2.2.1 beautifulsoup4 4.12.3 blessed 1.20.0 bokeh 3.5.0 boto3 1.34.145 botocore 1.34.145 cell2location 0.1.3 certifi 2024.7.4 charset-normalizer 3.3.2 chex 0.1.7 click 8.1.7 click-plugins 1.1.1 cligj 0.7.2 cloudpickle 3.0.0 commot 0.0.3 contextlib2 21.6.0 contourpy 1.2.1 croniter 1.4.1 cycler 0.12.1 dask 2024.7.0 dask-expr 0.5.3 dateutils 0.6.12 decorator 4.4.2 deepdiff 7.0.1 Deprecated 1.2.14 deprecation 2.1.0 distributed 2024.2.1 dm-tree 0.1.8 dnspython 2.6.1 docrep 0.3.2 editor 1.6.6 email_validator 2.2.0 esda 2.4.3 etils 1.9.2 fastapi 0.111.1 fastapi-cli 0.0.4 filelock 3.15.4 fiona 1.9.6 flax 0.8.5 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.6.1 future 1.0.0 gensim 4.3.3 geopandas 0.13.2 giddy 2.3.5 graphtools 1.5.3 h11 0.14.0 h5py 3.11.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 idna 3.7 igraph 0.11.6 importlib_metadata 8.0.0 importlib_resources 6.4.0 inequality 1.0.0 inquirer 3.3.0 itsdangerous 2.2.0 jax 0.4.30 jaxlib 0.4.30 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 karateclub 1.2.2 kiwisolver 1.4.5 legacy-api-wrap 1.4 leidenalg 0.10.2 Levenshtein 0.25.1 libpysal 4.7.0 lightning 2.0.9.post0 lightning-cloud 0.5.70 lightning-utilities 0.11.5 llvmlite 0.43.0 locket 1.0.0 loompy 3.0.7 lz4 4.3.3 mapclassify 2.6.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 mdurl 0.1.2 mgwr 2.2.1 ml_collections 0.1.1 ml-dtypes 0.4.0 momepy 0.6.0 mpmath 1.3.0 msgpack 1.0.8 mudata 0.2.4 
multidict 6.0.5 multipledispatch 1.0.0 natsort 8.4.0 nest-asyncio 1.6.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 numpy-groupies 0.11.1 numpyro 0.15.1 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 opencv-python 4.10.0.84 opt-einsum 3.3.0 optax 0.2.1 orbax-checkpoint 0.5.21 ordered-set 4.1.0 packaging 24.1 pandas 2.0.3 partd 1.4.2 patsy 0.5.6 phate 1.0.11 pillow 10.4.0 pip 24.1.2 platformdirs 4.2.2 plotly 5.22.0 pointpats 2.4.0 POT 0.9.4 protobuf 5.27.2 psutil 6.0.0 PuLP 2.9.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pydantic 2.1.1 pydantic_core 2.4.0 Pygments 2.18.0 PyGSP 0.5.1 PyJWT 2.8.0 pynndescent 0.5.13 pyparsing 3.0.9 pyproj 3.6.1 pyro-api 0.1.2 pyro-ppl 1.9.1 pysal 24.1 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-igraph 0.11.6 python-Levenshtein 0.25.1 python-louvain 0.16 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytz 2024.1 PyYAML 6.0.1 quantecon 0.7.2 rapidfuzz 3.9.4 rasterio 1.3.10 rasterstats 0.19.0 readchar 4.1.0 requests 2.32.3 rich 13.7.1 Rtree 1.3.0 runs 1.2.2 s_gd2 1.8.1 s3transfer 0.10.2 scanpy 1.10.2 scikit-learn 1.5.1 scipy 1.13.1 scprep 1.2.3 scvelo 0.3.2 scvi-tools 1.1.5 seaborn 0.13.2 segregation 2.5 session_info 1.0.0 setuptools 71.0.1 shapely 2.0.5 shellingham 1.5.4 simplejson 3.19.2 six 1.16.0 smart-open 7.0.4 sniffio 1.3.1 snuggs 1.4.7 sortedcontainers 2.4.0 soupsieve 2.5 spaghetti 1.7.4 sparse 0.15.4 spglm 1.0.8 spint 1.0.7 splot 1.1.5.post1 spopt 0.5.0 spreg 1.4 spvcm 0.3.0 starlette 0.37.2 starsessions 1.3.0 statsmodels 0.14.1 stdlib-list 0.10.0 sympy 1.13.1 tasklogger 1.2.0 tblib 3.0.0 tenacity 8.5.0 tensorstore 0.1.63 texttable 1.7.0 threadpoolctl 3.5.0 tobler 0.11.2 toml 0.10.2 tomlkit 0.13.0 toolz 0.12.1 torch 2.3.1 
torchmetrics 1.4.0.post0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 triton 2.3.1 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 umap-learn 0.5.6 urllib3 2.2.2 uvicorn 0.30.1 uvloop 0.19.0 watchfiles 0.22.0 wcwidth 0.2.13 websocket-client 1.8.0 websockets 12.0 wheel 0.43.0 wrapt 1.16.0 xarray 2024.6.0 xmltodict 0.13.0 xmod 1.8.1 xyzservices 2024.6.0 yarl 1.9.4 yq 3.4.3 zict 3.0.0 zipp 3.19.2 "],["integrated-scrna-seq-pipeline.html", "3 Integrated scRNA-seq pipeline", " 3 Integrated scRNA-seq pipeline Load the R packages. # sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getopt library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Run the integrated scRNA-seq pipeline. 
scRNASeq_10x_pipeline( # input and output input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', './SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix'), project.names = c( 'SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423'), output.dir = './output/', pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python', # quality control and preprocessing gene.column = 2, min.cells = 10, min.feature = 200, mt.pattern = '^MT-', nFeature_RNA.limit = 200, percent.mt.limit = 20, scale.factor = 10000, nfeatures = 3000, ndims = 50, vars.to.regress = NULL, PCs = 1:35, resolution = 0.4, n.neighbors = 50, 
# remove doublets doublet.percentage = 0.04, doublerFinderwraper.PCs = 1:20, doublerFinderwraper.pN = 0.25, doublerFinderwraper.pK = 0.1, # phateR phate.knn = 50, phate.npca = 20, phate.t = 10, phate.ndim = 2, min.pct = 0.25, logfc.threshold = 0.25, # visualization ViolinPlot.cellTypeOrders = as.character(1:22), ViolinPlot.cellTypeColors = NULL, Org = 'hsa', loom.files.path = c( './SRR7881399/velocyto/SRR7881399.loom', './SRR7881400/velocyto/SRR7881400.loom', './SRR7881401/velocyto/SRR7881401.loom', './SRR7881402/velocyto/SRR7881402.loom', './SRR7881403/velocyto/SRR7881403.loom', './SRR7881404/velocyto/SRR7881404.loom', './SRR7881405/velocyto/SRR7881405.loom', './SRR7881406/velocyto/SRR7881406.loom', './SRR7881407/velocyto/SRR7881407.loom', './SRR7881408/velocyto/SRR7881408.loom', './SRR7881409/velocyto/SRR7881409.loom', './SRR7881410/velocyto/SRR7881410.loom', './SRR7881411/velocyto/SRR7881411.loom', './SRR7881412/velocyto/SRR7881412.loom', './SRR7881413/velocyto/SRR7881413.loom', './SRR7881414/velocyto/SRR7881414.loom', './SRR7881415/velocyto/SRR7881415.loom', './SRR7881416/velocyto/SRR7881416.loom', './SRR7881417/velocyto/SRR7881417.loom', './SRR7881418/velocyto/SRR7881418.loom', './SRR7881419/velocyto/SRR7881419.loom', './SRR7881420/velocyto/SRR7881420.loom', './SRR7881421/velocyto/SRR7881421.loom', './SRR7881422/velocyto/SRR7881422.loom', './SRR7881423/velocyto/SRR7881423.loom'), # cell cycle cellcycleCutoff = NULL, # cell chat sorting = FALSE, ncores = 10, # Verbose = FALSE, # activeEachStep Whether_load_previous_results = FALSE, Step1_Input_Data = TRUE, Step1_Input_Data.type = 'cellranger-count', Step2_Quality_Control = TRUE, Step2_Quality_Control.RemoveBatches = TRUE, Step2_Quality_Control.RemoveDoublets = TRUE, Step3_Clustering = TRUE, Step4_Identify_Cell_Types = TRUE, Step4_Use_Which_Labels = 'clustering', Step4_Cluster_Labels = NULL, Step4_Changed_Labels = NULL, Step4_run_sc_CNV = TRUE, Step5_Visualization = TRUE, Step6_Find_DEGs = TRUE, 
Step7_Assign_Cell_Cycle = TRUE, Step8_Calculate_Heterogeneity = TRUE, Step9_Violin_Plot_for_Marker_Genes = TRUE, Step10_Calculate_Lineage_Scores = TRUE, Step11_GSVA = TRUE, Step11_GSVA.identify.cellType.features=TRUE, Step11_GSVA.identify.diff.features=FALSE, Step11_GSVA.comparison.design=NULL, Step12_Construct_Trajectories = TRUE, Step12_Construct_Trajectories.clusters = c('3','6','9','10','11','14','15','19'), Step12_Construct_Trajectories.monocle = TRUE, Step12_Construct_Trajectories.slingshot = TRUE, Step12_Construct_Trajectories.scVelo = TRUE, Step13_TF_Analysis = TRUE, Step14_Cell_Cell_Interaction = TRUE, Step15_Generate_the_Report = TRUE ) "],["step-by-step-scrna-seq-pipeline.html", "4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data 4.2 Step 2. Quality Control 4.3 Step 3. Clustering 4.4 Step 4. Identify Cell Types 4.5 Step 5. Visualization 4.6 Step 6. Find DEGs 4.7 Step 7. Assign Cell Cycles 4.8 Step 8. Calculate Heterogeneity 4.9 Step 9. Violin Plot for Marker Genes 4.10 Step 10. Calculate Lineage Scores 4.11 Step 11. GSVA 4.12 Step 12. Construct Trajectories 4.13 Step 13. TF Analysis 4.14 Step 14. Cell-Cell Interaction", " 4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data Load the R packages. 
# sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getopt library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Set the paths for the input data, the output results, and the Python installation. input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', 
'./SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix') output.dir = './output/' pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python' Set the parameters for loading the data sets. project.names = c('SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423') gene.column = 2 min.cells = 10 min.feature = 200 mt.pattern = '^MT-' Step1_Input_Data.type = 'cellranger-count' Create folders for saving the results of the HemaScopeR analysis. wdir <- getwd() if(!is.null(pythonPath)){ reticulate::use_python(pythonPath) }else{print('Please set the path of Python.')} # recursive = TRUE also creates output.dir itself if it does not exist yet if (!file.exists(paste0(output.dir, '/HemaScopeR_results/'))) { dir.create(paste0(output.dir, '/HemaScopeR_results/'), recursive = TRUE) } output.dir <- paste0(output.dir,'/HemaScopeR_results/') if (!file.exists(paste0(output.dir, '/RDSfiles/'))) { dir.create(paste0(output.dir, '/RDSfiles/')) } previous_results_path <- paste0(output.dir, '/RDSfiles/') # if (Whether_load_previous_results) { # print('Loading the previous results...') # Load_previous_results(previous_results_path = previous_results_path) # } # Step1. Input data----------------------------------------------------------------------------- print('Step1. Input data.') if (!file.exists(paste0(output.dir, '/Step1.Input_data/'))) { dir.create(paste0(output.dir, '/Step1.Input_data/')) } Load the data sets. 
file.copy(from = input.data.dirs, to = paste0(output.dir,'/Step1.Input_data/'), recursive = TRUE) if(Step1_Input_Data.type == 'cellranger-count'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- Read10X(data.dir = input.data.dirs[i], gene.column = gene.column) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- Read10X(data.dir = input.data.dirs, gene.column = gene.column) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Seurat'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_object.temp <- readRDS(input.data.dirs[i]) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp) } }else{ sc_object <- readRDS(input.data.dirs) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Matrix'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- readRDS(input.data.dirs[i]) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- readRDS(input.data.dirs) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = 
min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else{ stop('Please input data generated by the cellranger-count software, or a Seurat object, or a gene expression matrix. HemaScopeR does not support other formats of input data.') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.2 Step 2. Quality Control Set the parameters for quality control. # quality control and preprocessing nFeature_RNA.limit = 200 percent.mt.limit = 20 scale.factor = 10000 nfeatures = 3000 ndims = 50 vars.to.regress = NULL PCs = 1:35 resolution = 0.4 n.neighbors = 50 # remove doublets doublet.percentage = 0.04 doublerFinderwraper.PCs = 1:20 doublerFinderwraper.pN = 0.25 doublerFinderwraper.pK = 0.1 Step2_Quality_Control.RemoveBatches = TRUE Step2_Quality_Control.RemoveDoublets = TRUE Create a folder for saving the results of quality control. print('Step2. Quality control.') if (!file.exists(paste0(output.dir, '/Step2.Quality_control/'))) { dir.create(paste0(output.dir, '/Step2.Quality_control/')) } Run the quality control process. 
if(length(input.data.dirs) > 1){ # preprocess and quality control for multiple scRNA-Seq data sets sc_object <- QC_multiple_scRNASeq(seuratObjects = input.data.list, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveBatches = Step2_Quality_Control.RemoveBatches, Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, ndims = ndims, vars.to.regress = vars.to.regress, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK ) }else{ # preprocess and quality control for single scRNA-Seq data set sc_object <- QC_single_scRNASeq(sc_object = sc_object, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, vars.to.regress = vars.to.regress, ndims = ndims, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.3 Step 3. Clustering Set the parameters for clustering. 
PCs = 1:35 resolution = 0.4 n.neighbors = 50 Create a folder for saving the results of Louvain clustering. print('Step3. Clustering.') if (!file.exists(paste0(output.dir, '/Step3.Clustering/'))) { dir.create(paste0(output.dir, '/Step3.Clustering/')) } Run Louvain clustering. if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){graph.name <- 'integrated_snn'}else{graph.name <- 'RNA_snn'} sc_object <- FindNeighbors(sc_object, dims = PCs, k.param = n.neighbors, force.recalc = TRUE) sc_object <- FindClusters(sc_object, resolution = resolution, graph.name = graph.name) sc_object@meta.data$seurat_clusters <- as.character(as.numeric(sc_object@meta.data$seurat_clusters)) # plot clustering pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.4 Step 4. Identify Cell Types Set the path for the database. databasePath = "~/HemaScopeR/database/" Set the parameters for cell type identification. Step4_Use_Which_Labels = 'clustering' Step4_Cluster_Labels = NULL Step4_Changed_Labels = NULL Org = 'hsa' ncores = 10 Create a folder for saving the results of cell type identification. print('Step4. Identify cell types automatically.') if (!file.exists(paste0(output.dir, '/Step4.Identify_Cell_Types/'))) { dir.create(paste0(output.dir, '/Step4.Identify_Cell_Types/')) } Run the cell type identification process and the copy number variation (CNV) analysis. sc_object <- run_cell_annotation(object = sc_object, assay = 'RNA', species = Org, output.dir = paste0(output.dir,'/Step4.Identify_Cell_Types/')) if(Org == 'hsa'){ load(paste0(databasePath,"/HematoMap.reference.rdata")) if(length(intersect(rownames(HematoMap.reference), rownames(sc_object))) < 1000){ HematoMap.reference <- RenameGenesSeurat(obj = HematoMap.reference, newnames = toupper(rownames(HematoMap.reference)), gene.use = rownames(HematoMap.reference), de.assay = "RNA", lassays = "RNA") } if(sc_object@active.assay == 'integrated'){ DefaultAssay(sc_object) <- 'RNA' sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, '/Step4.Identify_Cell_Types/')) DefaultAssay(sc_object) <- 'integrated' }else{ sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, 
'/Step4.Identify_Cell_Types/')) } } Set the cell labels. # set the cell labels if(Step4_Use_Which_Labels == 'clustering'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$seurat_clusters Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.1'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.2'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.3'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.4'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'HematoMap'){ if(Org == 'hsa'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$predicted.id Idents(sc_object) <- sc_object@meta.data$selectLabels }else{print("'HematoMap' is only applicable to human data ('Org' = 'hsa').")} }else if(Step4_Use_Which_Labels == 'changeLabels'){ if (!is.null(Step4_Cluster_Labels) && !is.null(Step4_Changed_Labels) && length(Step4_Cluster_Labels) == length(Step4_Changed_Labels)){ sc_object@meta.data$selectLabels <- plyr::mapvalues(sc_object@meta.data$seurat_clusters, from = as.character(Step4_Cluster_Labels), to = as.character(Step4_Changed_Labels), warn_missing = FALSE) Idents(sc_object) <- sc_object@meta.data$selectLabels }else{ print("Please input the 'Step4_Cluster_Labels' parameter as Seurat clustering labels, and the 'Step4_Changed_Labels' parameter as new labels. 
Please note that these two parameters should be of equal length.") } }else{ print('Please set the "Step4_Use_Which_Labels" parameter as "clustering", "abcCellmap.1", "abcCellmap.2", "abcCellmap.3", "abcCellmap.4", "HematoMap" or "changeLabels".') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } Run the CNV analysis. sc_CNV(sc_object=sc_object, save_path=paste0(output.dir,'/Step4.Identify_Cell_Types/'), assay = 'RNA', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = NULL, n.cores = ncores, species = Org) 4.5 Step 5. Visualization Create a folder for saving the visualization results. print('Step5. Visualization.') if (!file.exists(paste0(output.dir, '/Step5.Visualization/'))) { dir.create(paste0(output.dir, '/Step5.Visualization/')) } Compute the numbers and proportions of cells in each cell group. 
# statistical results cells_labels <- as.data.frame(cbind(rownames(sc_object@meta.data), as.character(sc_object@meta.data$selectLabels))) colnames(cells_labels) <- c('cell_id', 'cluster_id') cluster_counts <- cells_labels %>% group_by(cluster_id) %>% summarise(count = n()) total_cells <- nrow(cells_labels) cluster_counts <- cluster_counts %>% mutate(proportion = count / total_cells) cluster_counts <- as.data.frame(cluster_counts) cluster_counts$percentages <- scales::percent(cluster_counts$proportion, accuracy = 0.1) cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='proportion')] cluster_counts$cluster_id_count_percentages <- paste(cluster_counts$cluster_id, " (", cluster_counts$count, ' cells; ', cluster_counts$percentages, ")", sep='') cluster_counts <- cluster_counts[order(cluster_counts$count, decreasing = TRUE),] cluster_counts <- rbind(cluster_counts, c('Total', sum(cluster_counts$count), '100%', 'all cells')) sc_object@meta.data$cluster_id_count_percentages <- mapvalues(sc_object@meta.data$selectLabels, from=cluster_counts$cluster_id, to=cluster_counts$cluster_id_count_percentages, warn_missing=FALSE) colnames(sc_object@meta.data)[which(colnames(sc_object@meta.data) == 'cluster_id_count_percentages')] <- paste('Total ', nrow(sc_object@meta.data), ' cells', sep='') cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='cluster_id_count_percentages')] colnames(cluster_counts) <- c('Cell types', 'Cell counts', 'Percentages') # names(colorvector) <- mapvalues(names(colorvector), # from=cluster_counts$cluster_id, # to=cluster_counts$cluster_id_count_percentages, # warn_missing=FALSE) write.csv(cluster_counts, file=paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages.csv', sep=''), quote=FALSE, row.names=FALSE) The UMAP visualization. 
pdf(paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages_umap.pdf', sep=''), width = 14, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = paste('Total ', nrow(sc_object@meta.data), ' cells', sep=''), label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Set the parameters for phateR. phate.knn = 50 phate.npca = 20 phate.t = 10 phate.ndim = 2 Run phateR for dimensionality reduction and visualization. # run phateR if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} if(!is.null(pythonPath)){ run_phateR(sc_object = sc_object, output.dir = paste0(output.dir,'/Step5.Visualization/'), pythonPath = pythonPath, phate.knn = phate.knn, phate.npca = phate.npca, phate.t = phate.t, phate.ndim = phate.ndim) } Visualize the cell types using t-SNE and UMAP. # plot cell types pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.6 Step 6. Find DEGs Set the parameters for identifying differentially expressed genes. min.pct = 0.25 logfc.threshold = 0.25 Create a folder for the DEGs analysis. print('Step6. Find DEGs.') if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/')) } Identify DEGs using Wilcoxon Rank-Sum Test. sc_object.markers <- FindAllMarkers(sc_object, only.pos = TRUE, min.pct = min.pct, logfc.threshold = logfc.threshold) write.csv(sc_object.markers, file = paste0(paste0(output.dir, '/Step6.Find_DEGs/'),'sc_object.markerGenes.csv'), quote=FALSE) Set the parameters for GPTCelltype. your_openai_API_key = '' tissuename = 'human bone marrow' gptmodel = 'gpt-3.5' Use GPTCelltype to assist cell type annotation. GPT_annotation( marker.genes = sc_object.markers, your_openai_API_key = your_openai_API_key, tissuename = tissuename, gptmodel = gptmodel, output.dir = paste0(output.dir, '/Step6.Find_DEGs/')) Perform GO and KEGG enrichment. 
# GO enrichment if(Org=='mmu'){ OrgDb <- 'org.Mm.eg.db' }else if(Org=='hsa'){ OrgDb <- 'org.Hs.eg.db' }else{ stop("Org should be 'mmu' or 'hsa'.") } HemaScopeREnrichment(DEGs=sc_object.markers, OrgDb=OrgDb, output.dir=paste0(output.dir, '/Step6.Find_DEGs/')) sc_object.markers.top5 <- sc_object.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_log2FC) pdf(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.pdf'), width = 0.5*length(unique(sc_object.markers.top5$gene)), height = 0.5*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() png(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.png'), width = 20*length(unique(sc_object.markers.top5$gene)), height = 30*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() Create a folder for saving the results of gene network analysis. if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/')) } Perform gene network analysis. OpenXGR_SAG(sc_object.markers = sc_object.markers, output.dir = paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'), subnet.size = 10) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.7 Step 7. Assign Cell Cycles Create a folder for saving the results of cell cycle analysis. print('Step7. 
Assign cell cycles.') if (!file.exists(paste0(output.dir, '/Step7.Assign_cell_cycles/'))) { dir.create(paste0(output.dir, '/Step7.Assign_cell_cycles/')) } Set the parameters for the cell cycle analysis. cellcycleCutoff = NULL Run the cell cycle analysis. datasets.before.batch.removal <- readRDS(paste0(paste0(output.dir, '/RDSfiles/'),'datasets.before.batch.removal.rds')) sc_object <- cellCycle(sc_object=sc_object, counts_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "counts")%>%as.matrix(), data_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix(), cellcycleCutoff = cellcycleCutoff, cellTypeOrders = unique(sc_object@meta.data$selectLabels), output.dir=paste0(output.dir, '/Step7.Assign_cell_cycles/'), databasePath = databasePath, Org = Org) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.8 Step 8. Calculate Heterogeneity Create a folder for saving the results of heterogeneity calculation. print('Step8. Calculate heterogeneity.') if (!file.exists(paste0(output.dir, '/Step8.Calculate_heterogeneity/'))) { dir.create(paste0(output.dir, '/Step8.Calculate_heterogeneity/')) } Run heterogeneity calculation process. 
expression_matrix <- GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix() expression_matrix <- expression_matrix[,rownames(sc_object@meta.data)] cell_types_groups <- as.data.frame(cbind(sc_object@meta.data$selectLabels, sc_object@meta.data$datasetID)) colnames(cell_types_groups) <- c('clusters', 'datasetID') if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } heterogeneity(expression_matrix = expression_matrix, cell_types_groups = cell_types_groups, cellTypeOrders = cellTypes_orders, output.dir = paste0(output.dir, '/Step8.Calculate_heterogeneity/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.9 Step 9. Violin Plot for Marker Genes Create a folder for saving the violin plots of marker genes. print('Step9. Violin plot for marker genes.') if (!file.exists(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'))) { dir.create(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/')) } Run violin plot visualization. 
if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} dataMatrix <- GetAssayData(object = sc_object, slot = "scale.data") if(is.null(marker.genes)&(Org == 'mmu')){ # mpp genes are from 'The bone marrow microenvironment at single cell resolution' # the other genes are from 'single cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis' # the aliases of these genes were changed in GENCODE M16: Gpr64 -> Adgrg2, Sdpr -> Cavin2, Hbb-b1 -> Hbb-bs, Sfpi1 -> Spi1 HSC_lineage_signatures <- c('Slamf1', 'Itga2b', 'Kit', 'Ly6a', 'Bmi1', 'Gata2', 'Hlf', 'Meis1', 'Mpl', 'Mcl1', 'Gfi1', 'Gfi1b', 'Hoxb5') Mpp_genes <- c('Mki67', 'Mpo', 'Elane', 'Ctsg', 'Calr') Erythroid_lineage_signatures <- c('Klf1', 'Gata1', 'Mpl', 'Epor', 'Vwf', 'Zfpm1', 'Fhl1', 'Adgrg2', 'Cavin2','Gypa', 'Tfrc', 'Hbb-bs', 'Hbb-y') Lymphoid_lineage_signatures <- c('Tcf3', 'Ikzf1', 'Notch1', 'Flt3', 'Dntt', 'Btg2', 'Tcf7', 'Rag1', 'Ptprc', 'Ly6a', 'Blnk') Myeloid_lineage_signatures <- c('Gfi1', 'Spi1', 'Mpo', 'Csf2rb', 'Csf1r', 'Gfi1b', 'Hk3', 'Csf2ra', 'Csf3r', 'Sp1', 'Fcgr3') marker.genes <- c(HSC_lineage_signatures, Mpp_genes, Erythroid_lineage_signatures, Lymphoid_lineage_signatures, Myeloid_lineage_signatures) }else if(is.null(marker.genes)&(Org == 'hsa')){ HSPCs_lineage_signatures <- c('CD34','KIT','AVP','FLT3','MME','CD7','CD38','CSF1R','FCGR1A','MPO','ELANE','IL3RA') Myeloids_lineage_signatures <- c('LYZ','CD36','MPO','FCGR1A','CD4','CD14','CD300E','ITGAX','FCGR3A','FLT3','AXL', 'SIGLEC6','CLEC4C','IRF4','LILRA4','IL3RA','IRF8','IRF7','XCR1','CD1C','THBD', 'MRC1','CD34','KIT','ITGA2B','PF4','CD9','ENG','KLF','TFRC') B_cells_lineage_signatures <- c('CD79A','IGLL1','RAG1','RAG2','VPREB1','MME','IL7R','DNTT','MKI67','PCNA','TCL1A','MS4A1','IGHD','CD27','IGHG3') T_NK_cells_lineage_signatures <- 
c('CD3D','CD3E','CD8A','CCR7','IL7R','SELL','KLRG1','CD27','GNLY', 'NKG7','PDCD1','TNFRSF9','LAG3','CD160','CD4','CD40LG','IL2RA', 'FOXP3','DUSP4','IL2RB','KLRF1','FCGR3A','NCAM1','XCL1','MKI67','PCNA','KLRF') marker.genes <- c(HSPCs_lineage_signatures, Myeloids_lineage_signatures, B_cells_lineage_signatures, T_NK_cells_lineage_signatures) } if(is.null(ViolinPlot.cellTypeOrders)){ ViolinPlot.cellTypeOrders <- unique(sc_object@meta.data$selectLabels) } if(is.null(ViolinPlot.cellTypeColors)){ ViolinPlot.cellTypeColors <- viridis::viridis(length(unique(sc_object@meta.data$selectLabels))) } combinedViolinPlot(dataMatrix = dataMatrix, features = marker.genes, CellTypes = sc_object@meta.data$selectLabels, cellTypeOrders = ViolinPlot.cellTypeOrders, cellTypeColors = ViolinPlot.cellTypeColors, Org = Org, output.dir = paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.10 Step 10. Calculate Lineage Scores Create a folder for saving the results of lineage score calculation. print('Step10. Calculate lineage scores.') # we use normalized data here if (!file.exists(paste0(output.dir, '/Step10.Calculate_lineage_scores/'))) { dir.create(paste0(output.dir, '/Step10.Calculate_lineage_scores/')) } Run lineage score calculation. 
if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'mmu')){ lineage.genelist <- c(list(HSC_lineage_signatures), list(Mpp_genes), list(Erythroid_lineage_signatures), list(Lymphoid_lineage_signatures), list(Myeloid_lineage_signatures)) lineage.names <- c('HSC_lineage_signatures', 'Mpp_genes', 'Erythroid_lineage_signatures', 'Lymphoid_lineage_signatures', 'Myeloid_lineage_signatures') }else if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'hsa')){ lineage.genelist <- c(list(HSPCs_lineage_signatures), list(Myeloids_lineage_signatures), list(B_cells_lineage_signatures), list(T_NK_cells_lineage_signatures)) lineage.names <- c('HSPCs_lineage_signatures', 'Myeloids_lineage_signatures', 'B_cells_lineage_signatures', 'T_NK_cells_lineage_signatures') } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } lineageScores(expression_matrix = expression_matrix, cellTypes = sc_object@meta.data$selectLabels, cellTypes_orders = cellTypes_orders, cellTypes_colors = ViolinPlot.cellTypeColors, groups = sc_object@meta.data$datasetID, groups_orders = unique(sc_object@meta.data$datasetID), groups_colors = groups_colors, lineage.genelist = lineage.genelist, lineage.names = lineage.names, Org = Org, output.dir = paste0(output.dir, '/Step10.Calculate_lineage_scores/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.11 Step 11. GSVA Create a folder for saving the results of GSVA. print('Step11. GSVA.') if (!file.exists(paste0(output.dir, '/Step11.GSVA/'))) { dir.create(paste0(output.dir, '/Step11.GSVA/')) } Run GSVA. 
setwd(wdir) if(Org=='mmu'){ load(paste0(databasePath,"/mouse_c2_v5p2.rdata")) GSVA.genelist <- Mm.c2 assign('OrgDB', org.Mm.eg.db) }else if(Org=='hsa'){ load(paste0(databasePath,"/human_c2_v5p2.rdata")) GSVA.genelist <- Hs.c2 assign('OrgDB', org.Hs.eg.db) }else{ stop("Org should be 'mmu' or 'hsa'.") } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } run_GSVA(sc_object = sc_object, GSVA.genelist = GSVA.genelist, GSVA.cellTypes = sc_object@meta.data$selectLabels, GSVA.cellTypes.orders = cellTypes_orders, GSVA.cellGroups = sc_object@meta.data$datasetID, GSVA.identify.cellType.features = Step11_GSVA.identify.cellType.features, GSVA.identify.diff.features = Step11_GSVA.identify.diff.features, GSVA.comparison.design = Step11_GSVA.comparison.design, OrgDB = OrgDB, output.dir = paste0(output.dir, '/Step11.GSVA/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.12 Step 12. Construct Trajectories Load gene symbols and Ensembl IDs. 
DefaultAssay(sc_object) <- 'RNA' countsSlot <- GetAssayData(object = sc_object, slot = "counts") gene_metadata <- as.data.frame(rownames(countsSlot)) rownames(gene_metadata) <- gene_metadata[,1] if(Org == 'mmu'){ load(paste0(databasePath,"/mouseGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = mouseGeneSymbolandEnsembleID$geneName, to = mouseGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) }else if(Org == 'hsa'){ load(paste0(databasePath,"/humanGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = humanGeneSymbolandEnsembleID$geneName, to = humanGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) } colnames(gene_metadata) <- c('gene_short_name','ensembleID') Create folders for saving the results of trajectory construction. print('Step12. Construct trajectories.') if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) } Prepare the input data. if(is.null(Step12_Construct_Trajectories.clusters)){ sc_object.subset <- sc_object countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") }else{ sc_object.subset <- subset(sc_object, subset = selectLabels %in% Step12_Construct_Trajectories.clusters) countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") } Run monocle2. 
# monocle2 phenoData <- sc_object.subset@meta.data featureData <- gene_metadata run_monocle(cellData = countsSlot.subset, phenoData = phenoData, featureData = featureData, lowerDetectionLimit = 0.5, expressionFamily = VGAM::negbinomial.size(), cellTypes='selectLabels', monocle.orders=Step12_Construct_Trajectories.clusters, monocle.colors = ViolinPlot.cellTypeColors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) Run slingshot. # slingshot if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object.subset) <- 'integrated' }else{ DefaultAssay(sc_object.subset) <- 'RNA'} run_slingshot(slingshot.PCAembeddings = Embeddings(sc_object.subset, reduction = "pca")[, PCs], slingshot.cellTypes = sc_object.subset@meta.data$selectLabels, slingshot.start.clus = slingshot.start.clus, slingshot.end.clus = slingshot.end.clus, slingshot.colors = slingshot.colors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) Run scVelo. # scVelo if((!is.null(loom.files.path))&(!is.null(pythonPath))){ prepareDataForScvelo(sc_object = sc_object.subset, loom.files.path = loom.files.path, scvelo.reduction = 'pca', scvelo.column = 'selectLabels', output.dir = paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) reticulate::py_run_string(paste0("import os\\noutputDir = '", output.dir, "'")) reticulate::py_run_file(file.path(system.file(package = "HemaScopeR"), "python/sc_run_scvelo.py"), convert = FALSE) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.13 Step 13. TF Analysis Create folders for saving the results of TF analysis. print('Step13. 
TF analysis.') if (!file.exists(paste0(output.dir, '/Step13.TF_analysis/'))) { dir.create(paste0(output.dir, '/Step13.TF_analysis/')) } Run SCENIC to perform TF analysis. run_SCENIC(countMatrix = countsSlot, cellTypes = sc_object@meta.data$selectLabels, datasetID = sc_object@meta.data$datasetID, cellTypes_colors = Step13_TF_Analysis.cellTypes_colors, cellTypes_orders = unique(sc_object@meta.data$selectLabels), groups_colors = Step13_TF_Analysis.groups_colors, groups_orders = unique(sc_object@meta.data$datasetID), Org = Org, output.dir = paste0(output.dir, '/Step13.TF_analysis/'), pythonPath = pythonPath, databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.14 Step 14. Cell-Cell Interaction Create folders for saving the results of cell-cell interaction analysis. print('Step14. Cell-cell interaction.') if (!file.exists(paste0(output.dir, '/Step14.Cell_cell_interection/'))) { dir.create(paste0(output.dir, '/Step14.Cell_cell_interection/')) } Run CellChat to perform cell-cell interaction analysis. tempwd <- getwd() run_CellChat(data.input=countsSlot, labels = sc_object@meta.data$selectLabels, cell.orders = ViolinPlot.cellTypeOrders, cell.colors = ViolinPlot.cellTypeColors, sample.names = rownames(sc_object@meta.data), Org = Org, sorting = sorting, output.dir = paste0(output.dir, '/Step14.Cell_cell_interection/')) setwd(tempwd) Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } "],["stey-by-step-st-seq-pipeline.html", "5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading 5.2 Step 2. QC 5.3 Step 3. Clustering 5.4 Step 4. DEGs 5.5 Step 5. Spatially variable features 5.6 Step 6. Spatial interaction 5.7 Step 7. CNV analysis 5.8 Step 8. Deconvolution 5.9 Step 9. Cell cycle 5.10 Step 10. Niche analysis", " 5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading The st_Loading_Data function loads 10X Visium spatial transcriptomics data produced by Space Ranger. It reads the data from input.data.dir and returns it as a SeuratObject. 5.1.1 Function arguments: input.data.dir: The directory where the input data is stored. output.dir: The directory where the processed output will be saved. If not specified, the output is saved in the current working directory. Default is ‘.’. sampleName: A string naming the sample. Default is ‘Hema_ST’. rds.file: A boolean indicating whether the input data is an RDS file rather than typical Space Ranger output. Default is FALSE. filename: The name of the file to be loaded if the data is not in RDS format. Default is “filtered_feature_bc_matrix.h5”. assay: The specific assay to apply to the data. Default is ‘Spatial’. slice: The image slice identifier for the spatial data. Default is ‘slice1’. filter.matrix: A boolean indicating whether to load the filtered matrix. Default is TRUE. to.upper: A boolean indicating whether to convert feature names to uppercase. Default is FALSE. 5.1.2 Function behavior: Directory Creation: The function first checks whether output.dir exists; if not, it creates it. 
RDS File Handling: If rds.file is TRUE, it reads the RDS file, ensuring the specified assay and slice are present in the Seurat object. Non-RDS File Handling: If rds.file is FALSE, it loads the data using Load10X_Spatial from Seurat. Saving the Object: Uses SaveH5Seurat and Convert to save the Seurat object in rds and h5ad formats. File Copying: Copies any necessary files (filtered matrix, spatial image) to the output.dir. Return Value: Returns the processed Seurat object. 5.1.3 An example: st_obj <- st_Loading_Data( input.data.dir = 'path/to/data', output.dir = '.', sampleName = 'Hema_ST', rds.file = FALSE, filename = 'filtered_feature_bc_matrix.h5', assay = 'Spatial', slice = 'slice1', filter.matrix = TRUE, to.upper = FALSE ) 5.1.4 Outputs: Spatial transcriptome data in rds and h5ad formats 5.2 Step 2. QC The QC_Spatial function performs basic quality control on a SeuratObject containing 10X Visium data and returns the filtered SeuratObject. It provides options to set thresholds for the number of genes, nUMI (unique molecular identifiers), and spots expressing each gene. It also allows for the removal of mitochondrial genes based on species. 5.2.1 Function arguments: st_obj: A SeuratObject of 10X Visium data. output.dir: A character string specifying the path to store the results and figures. Default is the current working directory. min.gene: An integer representing the minimum number of genes detected in a spot. Default is 200. max.gene: An integer representing the maximum number of genes detected in a spot. Default is Inf (no upper limit). min.nUMI: An integer representing the minimum number of nUMI detected in a spot. Default is 500. max.nUMI: An integer representing the maximum number of nUMI detected in a spot. Default is Inf (no upper limit). min.spot: An integer representing the minimum number of spots expressing each gene. Default is 3. species: A character string representing the species of sample, either ‘human’ or ‘mouse’. 
bool.remove.mito: A boolean value indicating whether to remove mitochondrial genes. Default is TRUE. SpatialColors: A function that interpolates a set of given colors to create new color palettes and color ramps. Default is a color palette with reversed Spectral colors from RColorBrewer. 5.2.2 Function behavior: Plots and saves the spatial distribution of nUMI and nGene. Plots and saves violin plots for nUMI and nGene. Identifies and marks low-quality spots based on nUMI and nGene thresholds. Plots the spatial distribution of spot quality. Plots and saves a histogram for the number of spots expressing each gene. Plots the spatial distribution of mitochondrial genes. Saves the raw SeuratObject before filtering. Removes low-quality spots and genes expressed in fewer than min.spot spots. Optionally removes mitochondrial genes. Saves the filtered SeuratObject. Returns the filtered st_obj. 5.2.3 An example: st_obj <- QC_Spatial( st_obj = st_obj, output.dir = '.', min.gene = 200, max.gene = Inf, min.nUMI = 500, max.nUMI = Inf, min.spot = 3, species = 'human', bool.remove.mito = TRUE, SpatialColors = colorRampPalette(colors = rev(x = RColorBrewer::brewer.pal(n = 11, name = "Spectral"))) ) 5.2.4 Outputs: Figures showing the spatial distribution of nUMI and nGene. Violin plots of nUMI and nGene. Figures showing the quality. Histograms for the number of spots expressing each gene. Figures showing the spatial distribution of mitochondrial genes. Raw and filtered SeuratObject. 5.3 Step 3. Clustering The st_Clustering function is designed to perform clustering analysis on spatial transcriptomics data. It integrates several key steps including data normalization, dimensionality reduction, clustering, and visualization. The function saves the results and visualizations to output.dir. 5.3.1 Function arguments: st_obj: The input spatial transcriptomics Seurat object that contains the data to be clustered. output.dir: The directory where the output files will be saved. Default is the current directory (‘.’). 
normalization.method: The method used for data normalization. Default is ‘SCTransform’. npcs: The number of principal components to use in PCA. Default is 50. pcs.used: The principal components to use for clustering. Default is the first 10 PCs (1:10). resolution: The resolution parameter for the clustering algorithm. Default is 0.8. verbose: A logical flag to print progress messages. Default is FALSE. 5.3.2 Function behavior: Data Normalization and PCA: Depending on the normalization.method, the function either uses SCTransform or a standard normalization method followed by scaling and variable feature detection. Performs PCA on the normalized data. Clustering and Dimensionality Reduction: Finds nearest neighbors using the specified principal components (pcs.used). Identifies clusters using the specified resolution. Performs UMAP and t-SNE for visualization of the clusters. Visualization: Generates spatial, UMAP, and t-SNE plots of the clusters with customized color schemes. Saves these plots as images in the specified directory. Saving Results: Saves the updated st_obj as an RDS file. Exports the metadata of st_obj to a CSV file. Return Value: Returns the updated st_obj containing the clustering results. 5.3.3 An example: st_obj <- st_Clustering( st_obj = st_obj, output.dir = '.', normalization.method = 'SCTransform', npcs = 50, pcs.used = 1:10, resolution = 0.8, verbose = FALSE ) 5.3.4 Outputs: Figures showing the results of clustering. SeuratObject in rds format. 5.4 Step 4. DEGs The st_Find_DEGs function is designed to identify differentially expressed genes (DEGs) in spatial transcriptomics data. It performs differential expression analysis based on clustering results, visualizes the top markers, and saves the results to output.dir. 5.4.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for DEG analysis. output.dir: The directory where output files will be saved. Default is the current directory (‘.’). 
ident.label: The metadata label used for identifying clusters. Default is 'seurat_clusters'. only.pos: A logical flag to include only positive markers. Default is TRUE. min.pct: The minimum fraction of cells expressing the gene in either cluster. Default is 0.25. logfc.threshold: The log fold change threshold for considering a gene differentially expressed. Default is 0.25. test.use: The statistical test to use for differential expression analysis. Default is 'wilcox'. verbose: A logical flag to print progress messages. Default is FALSE. 5.4.2 Function behavior: Set Identifiers: Sets the cluster identifiers in the spatial transcriptomics object (st_obj) based on the specified ident.label. Find Differentially Expressed Genes (DEGs): Performs differential expression analysis using the specified parameters (only.pos, min.pct, logfc.threshold, test.use). Top Marker Genes: Selects the top 5 marker genes for each cluster based on the highest average log fold change. Visualization: Generates a dot plot for the top DEGs and saves the plot as an image in the specified directory. Saving Results: Saves the DEG results as a CSV file. Return Value: Returns the data frame containing the identified DEGs. 5.4.3 An example: st.markers <- st_Find_DEGs( st_obj = st_obj, output.dir = '.', ident.label = 'seurat_clusters', only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = 'wilcox', verbose = FALSE ) 5.4.4 Outputs: Dot plots showing markers. CSV file containing the information of markers. 5.5 Step 5. Spatially variable features The st_SpatiallyVariableFeatures function identifies and visualizes spatially variable features (SVFs) in spatial transcriptomics data. It integrates the identification of spatially variable features using a specified method, saves the results to a directory, and creates visualizations of the top spatially variable features. 5.5.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. 
output.dir: The directory where output files will be saved. Default is the current directory. assay: The assay to be used for finding spatially variable features. Default is 'SCT'. selection.method: The method used for selecting spatially variable features. Default is 'moransi'. n.top.show: The number of top spatially variable features to visualize. Default is 10. n.col: The number of columns for the visualization grid. Default is 5. verbose: A logical flag to print progress messages. Default is FALSE. 5.5.2 Function behavior: Identify Spatially Variable Features: Identifies spatially variable features using the specified method and assay. Suppresses warnings during the process. Save Metadata: Extracts metadata features and saves them as a CSV file in output.dir. Visualization: Selects the top n.top.show spatially variable features. Generates and saves a spatial feature plot of these features in the specified directory. Return Value: Returns the updated st_obj containing the identified spatially variable features. 5.5.3 An example: st_obj <- st_SpatiallyVariableFeatures( st_obj = st_obj, output.dir = '.', assay = st_obj@active.assay, selection.method = 'moransi', n.top.show = 10, n.col = 5, verbose = FALSE ) 5.5.4 Outputs: Figures showing SVFs. CSV file containing the information of SVFs. 5.6 Step 6. Spatial interaction The st_Interaction function is used to identify and visualize interactions between clusters based on spatial transcriptomics data. It utilizes Commot to analyze spatial interactions, identify pathway activities, and assess the strength and significance of interactions. 5.6.1 Function arguments: st_data_path: Path to the spatial transcriptomics data. metadata_path: Path to the metadata associated with the spatial transcriptomics data. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. label_key: Key in the metadata to identify cell clusters. Default is 'seurat_clusters'. 
save_path: The directory where output files will be saved. Default is the current directory. species: The species of the spatial transcriptomics data. Default is 'human'. signaling_type: Type of signaling interactions to consider. Default is 'Secreted Signaling'. database: Database to be used for the analysis. Default is 'CellChat'. min_cell_pct: Minimum percentage of cells to consider for interaction analysis. Default is 0.05. dis_thr: Distance threshold for defining interactions. Default is 500. n_permutations: Number of permutations for assessing significance. Default is 100. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.6.2 Function behavior: Commot Analysis: Uses Commot to perform interaction analysis, identifying interactions within and between clusters. Visualization: Generates visualizations of pathway interactions and ligand-receptor (LR) pair interactions within and between clusters, and saves them in save_path. 5.6.3 An example: st_Interaction( st_data_path = 'path/to/data', metadata_path = 'path/to/metadata', library_id = 'Hema_ST', label_key = 'seurat_clusters', save_path = '.', species = 'human', signaling_type = 'Secreted Signaling', database = 'CellChat', min_cell_pct = 0.05, dis_thr = 500, n_permutations = 100, pythonPath = 'path/to/python' ) 5.6.4 Outputs: Dot plots showing pathway interactions between and within clusters. Dot plots showing LR interactions between and within clusters. Information on each LR pair and pathway. 5.7 Step 7. CNV analysis The st_CNV function identifies and visualizes copy number variations (CNVs) in spatial transcriptomics data. It uses CopyKAT to perform the CNV analysis, saves the results, and generates visual representations of CNV states. 5.7.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. save_path: The directory where output files will be saved. assay: The assay to be used for CNV analysis. 
Default is 'Spatial'. LOW.DR: The lower threshold for the dropout rate in CopyKAT. Default is 0.05. UP.DR: The upper threshold for the dropout rate in CopyKAT. Default is 0.1. win.size: The window size for the CNV analysis. Default is 25. distance: The distance metric to be used for the analysis. Default is \"euclidean\". genome: The genome version to be used, ‘hg20’ or ‘mm10’. Default is \"hg20\". n.cores: The number of cores to be used for parallel processing. Default is 1. species: The species of the spatial transcriptomics data. Default is 'human'. 5.7.2 Function behavior: CopyKAT Analysis: Runs CopyKAT pipeline to perform CNV analysis using the provided parameters. Saving Results: Saves the CopyKAT results as an RDS file. Plotting: Generates plots of the CNV states and saves them in save_path. Updating Metadata: Updates the spatial transcriptomics object with CNV state metadata. Return Value: Returns the updated st_obj containing the CNV state information. 5.7.3 An example: st_obj <- st_CNV( st_obj = st_obj, save_path = '.', assay = 'Spatial', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = 'hg20', n.cores = 1, species = 'human' ) 5.7.4 Outputs: Figures showing the predicted CNV states. Figures showing the CNV heatmap. rds files of results of copykat. 5.8 Step 8. Deconvolution The st_Deconvolution function aims to perform spatial deconvolution analysis on spatial transcriptomics data to estimate the cell-type composition and abundance in different regions. The function utilizes cell2location to infer cell-type abundance and spatial distributions, allowing for the visualization and interpretation of spatially resolved cell populations within the tissue. 5.8.1 Function arguments: st.data.dir: Path to the spatial transcriptomics data. sc.h5ad.dir: Path to the single-cell RNA-seq data in h5ad format. Default is NULL. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. 
st_obj: Spatial transcriptomics object containing the data for analysis. Default is NULL. save_path: The directory where output files will be saved. Default is NULL. sc.labels.key: Key in the single-cell metadata to identify cell clusters. Default is 'seurat_clusters'. species: The species of the spatial transcriptomics data. Default is 'mouse'. sc.max.epoch: Maximum number of epochs used for single-cell deconvolution. Default is 1000. st.max.epoch: Maximum number of epochs used for spatial deconvolution. Default is 10000. use.gpu: Logical value indicating whether to use GPU for computation. Default is FALSE. use.Dataset: The dataset to be used for analysis, such as 'HematoMap' or 'LymphNode'. pythonPath: The path to the Python environment containing cell2location to use for the analysis. Default is ‘.’. 5.8.2 Function behavior: Deconvolution Analysis: Performs the spatial deconvolution analysis using the provided spatial transcriptomics and single-cell RNA-seq data. Post-Analysis Processing: Processes the deconvolution results and visualizes the spatial distribution of inferred cell types within the tissue. Returning Results: If a Seurat object is provided, the updated Seurat object with cell type information is returned. 5.8.3 An example: st_obj <- st_Deconvolution( st.data.dir = 'path/to/data', library_id = 'Hema_ST', sc.h5ad.dir = NULL, st_obj = st_obj, save_path = '.', sc.labels.key = 'seurat_clusters', species = 'human', sc.max.epoch = 1000, st.max.epoch = 10000, use.gpu = FALSE, use.Dataset = 'LymphNode', pythonPath = 'path/to/python' ) 5.8.4 Outputs: Figures showing the predicted abundance of each cell-type. The parameters of trained cell2location model. 5.9 Step 9. Cell cycle The st_Cell_cycle function is used to assess the cell cycle phase scores in spatial transcriptomics data. It calculates S phase and G2M phase scores based on the expression of designated cell cycle-related genes and visualizes these scores in spatial and dimensionality-reduced plots. 
5.9.1 Function arguments: st_obj: The input Seurat object containing the data for analysis. save_path: The directory where the output images will be saved. Default is the current directory. s.features: A list of genes associated with the S phase. Default is NULL (using genes from Seurat). g2m.features: A list of genes associated with the G2M phase. Default is NULL (using genes from Seurat). species: The species of the spatial transcriptomics data. Default is 'human'. FeatureColors.bi: A color palette for visualization. Default is a two-color ramp palette. 5.9.2 Function behavior: Gene Feature Assignment: Assigns S phase and G2M phase gene lists based on the specified species or provided input. Cell Cycle Scoring: Calculates the S phase and G2M phase scores in the data. Spatial Visualization: Generates spatial feature plots to visualize the S phase and G2M phase scores using the specified color palette and saves the plots as images. Dimensionality-Reduced Plot Visualization: If UMAP or tSNE dimensionality reduction is available in the st_obj, feature plots of the S phase and G2M phase scores are generated in the reduced space and saved as images. Return Value: Returns the updated st_obj containing the cell cycle phase scores. 5.9.3 An example: st_obj <- st_Cell_cycle( st_obj = st_obj, save_path = '.', s.features = NULL, g2m.features = NULL, species = 'human', FeatureColors.bi = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = 'RdYlBu'))) ) 5.9.4 Outputs: Figures showing S scores. Figures showing G2M scores. 5.10 Step 10. Niche analysis The st_NicheAnalysis function is designed to perform niche analysis on spatial transcriptomics data, enabling the exploration of spatial niches or microenvironments within the tissue. The function encompasses co-occurrence analysis, niche clustering, and niche interaction analysis to uncover the spatial relationships and characteristics of different cell populations or features. 
5.10.1 Function arguments: st_obj: The input SeuratObject containing the spatial transcriptomics data for analysis. features: A vector of features representing features (for example, cell types from deconvolution) for niche analysis. save_path: The directory where the analysis results and visualizations will be saved. Default is the current directory. coexistence.method: The method for co-occurrence analysis, accepting 'correlation' or 'Wasserstein'. Default is 'correlation'. kmeans.n: The number of clusters for niche clustering. Default is 4. st_data_path: A path containing the ‘spatial’ file and ‘filtered_feature_bc_matrix.h5’ file, required for niche interaction visualization. slice: The slice to be used for analysis. Default is 'slice1'. species: The species of the sample data. Default is 'mouse'. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.10.2 Function behavior: Co-occurrence Score Calculation: Calculates the co-occurrence scores between the specified features using the chosen coexistence method (‘correlation’ or ‘Wasserstein’). Niche Clustering: Utilizes k-means clustering to identify distinct spatial niches based on the expression profiles of the selected features and visualizes the clustering results. Niche Interaction Visualization: If the st_data_path is provided, performs niche interaction visualization using Commot, which is based on the provided spatial transcriptomics data and generates visualizations of niche interactions within the tissue. Return Value: Returns the updated st_obj with niche analysis results and visualizations. 
5.10.3 An example: tmp <- read.csv('path/to/cell2loc_res.csv', row.names = 1) features <- colnames(tmp) if(!all(features %in% names(st_obj@meta.data))){ common.barcodes <- intersect(colnames(st_obj), rownames(tmp)) tmp <- tmp[common.barcodes, ] st_obj <- st_obj[, common.barcodes] st_obj <- AddMetaData(st_obj, metadata = tmp) } st_obj <- st_NicheAnalysis( st_obj, features = features, save_path = '.', coexistence.method = 'correlation', kmeans.n = 4, st_data_path = 'path/to/data', slice = 'slice1', species = 'human', condaenv = 'path/to/python' ) 5.10.4 Outputs: Figures showing the co-existence results. Figures showing the spatial distribution of each niche. Figures showing the composition of each niche. Figures showing the results of interactions using Commot. "]]
diff --git a/docs/installation.html b/docs/installation.html
index 28eb725..0764ced 100644
--- a/docs/installation.html
+++ b/docs/installation.html
@@ -335,7 +335,9 @@

2.4 Install required R-packages

+BiocManager::install("clusterProfiler")
+install.packages("doMC")
+install.packages("doRNG")
  • From CRAN
@@ -359,7 +361,8 @@

2.4 Install required R-packages
Usage limitations: Sometimes a GitHub API rate limit error occurs; authenticating with a GitHub token raises the rate limit. The steps to resolve this are as follows: Register for an account or log in to an existing account on the GitHub website. Click your profile picture in the top right corner and select “Settings” from the dropdown menu. Next, open “Developer settings,” then “Personal access tokens (classic),” and click “Create new token (classic).” Name the token anything you like and choose its expiration time. Finally, check the “repo” box; the token will be used to download code repositories from GitHub. Click “Generate token” and copy the generated token.

Next, store the token in an R environment variable. Because R is installed through conda, start it by typing R in the terminal, then run usethis::edit_r_environ(). This opens the user-level .Renviron file in a terminal editor (the key presses below assume a vi-style editor). Press the i key to enter insert mode and add the token on its own line, using straight quotes: GITHUB_TOKEN="your_token".

Then press Esc and type :wq! to force-save and quit the editor. The variable takes effect only in a new session, so quit R, close and reopen the terminal, activate the conda environment, and start R again.
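The token setup above can also be done without an interactive editor. The following is a minimal sketch, assuming the token belongs in the user-level ~/.Renviron file (the file that usethis::edit_r_environ() opens by default and that R reads at startup). The helper name set_github_token is hypothetical and not part of HemaScopeR or usethis.

```shell
# Hypothetical helper (not part of any package): write or replace the
# GITHUB_TOKEN entry in an Renviron-style file.
#   $1 = path to the file, e.g. "$HOME/.Renviron"
#   $2 = the token copied from GitHub
set_github_token() {
    renviron_file="$1"
    token="$2"
    touch "$renviron_file"
    # Keep every line except a previous GITHUB_TOKEN entry.
    # grep exits non-zero when nothing is kept, hence the "|| true".
    grep -v '^GITHUB_TOKEN=' "$renviron_file" > "$renviron_file.tmp" || true
    printf 'GITHUB_TOKEN=%s\n' "$token" >> "$renviron_file.tmp"
    mv "$renviron_file.tmp" "$renviron_file"
    # The token is a credential, so restrict the file to the current user.
    chmod 600 "$renviron_file"
}

# Usage (placeholder token; replace it with the one copied from GitHub):
# set_github_token "$HOME/.Renviron" "your_token"
```

As with the editor-based route, the variable is only picked up by an R session started after the file has been changed.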

-
devtools::install_github("sqjin/CellChat@9e1e605")
+
devtools::install_github("sqjin/CellChat")
+devtools::install_github("immunogenomics/presto")
 devtools::install_github("aertslab/SCENIC@fde9774")
 devtools::install_github("pzhulab/abcCellmap@f44c14b")
 devtools::install_github("navinlabcode/copykat@d7d6569")
@@ -385,7 +388,7 @@ 

2.5 Install required Python-packa
  • Install required packages
-
pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2
+
pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 distributed==2024.2.1 dask-expr==0.5.3
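Because this change pins distributed and dask-expr to older releases, it may be worth confirming that the environment actually ended up with the pinned versions. The sketch below assumes the packages were installed with pip, so that they appear as name==version lines in `pip freeze` output; the helper name has_pin is hypothetical.

```shell
# Hypothetical helper: succeed if "name==version" appears as a whole line
# in `pip freeze`-style text read from standard input.
#   $1 = package name, $2 = expected version
has_pin() {
    # -F fixed string, -x whole-line match, -i ignore case, -q no output
    grep -Fqix -- "$1==$2"
}

# Usage inside the activated conda environment:
# pip freeze | has_pin distributed 2024.2.1 && echo "distributed pin OK"
# pip freeze | has_pin dask-expr 0.5.3 && echo "dask-expr pin OK"
```

Packages installed from a local path or a URL are listed by `pip freeze` in a different form and would not match this check.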

2.6 The installed packages with versions

@@ -436,7 +439,7 @@

2.6 The installed packages with v
 carData 3.0-5
 caret 6.0-94
 caTools 1.18.2
-CellChat 1.5.0
+CellChat 2.0.1
 cellranger 1.1.0
 circlize 0.4.16
 class 7.3-22
@@ -477,6 +480,8 @@

2.6 The installed packages with v
 diffobj 0.3.5
 digest 0.6.36
 dlm 1.1-6
+doMC 1.3.8
+doRNG 1.8.6
 doBy 4.6.22
 docopt 0.7.1
 doParallel 1.0.17
@@ -664,6 +669,7 @@

2.6 The installed packages with v
 polyclip 1.10-6
 polynom 1.4-1
 praise 1.0.0
+presto 1.0.0
 prettyunits 1.2.0
 princurve 2.1.6
 pROC 1.18.5
@@ -875,13 +881,13 @@

2.6 The installed packages with v
 croniter 1.4.1
 cycler 0.12.1
 dask 2024.7.0
-dask-expr 1.1.8
+dask-expr 0.5.3
 dateutils 0.6.12
 decorator 4.4.2
 deepdiff 7.0.1
 Deprecated 1.2.14
 deprecation 2.1.0
-distributed 2024.7.0
+distributed 2024.2.1
 dm-tree 0.1.8
 dnspython 2.6.1
 docrep 0.3.2
diff --git a/docs/search_index.json b/docs/search_index.json
index 2cd688c..d020b54 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1 +1 @@
-[["index.html", "HemaScope Tutorial 1 Introduction", " HemaScope Tutorial HemaScope team 2024-09-27 1 Introduction HemaScope is a specialized bioinformatics toolkit designed for analyzing both single-cell and spatial transcriptome sequencing data from hematopoietic cells, including myeloid and lymphoid lineages. We have developed an R package named HemaScopeR, a Shiny interface named HemaScopeShiny, and a cloud platform named HemaScopeCloud. This tutorial introduces how to install and use the R package and Shiny interface, as well as how to access and operate the cloud platform. 
"],["installation.html", "2 Installation 2.1 Create a new conda environment and activate it 2.2 Set the channels in conda 2.3 Install R and python 2.4 Install required R-packages 2.5 Install required Python-packages 2.6 The installed packages with versions", " 2 Installation 2.1 Create a new conda environment and activate it conda create --name HemaScope_env conda activate HemaScope_env 2.2 Set the channels in conda # Add the default channel conda config --add channels defaults # Add default channel URLs conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2 # Add custom channels conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2 conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch-lts conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/simpleitk conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/deepmodeling # Set to show channel URLs conda config --set show_channel_urls true 2.3 Install R and python R 4.3.3 and python 3.8.19 conda install R-base=4.3.3 conda install python=3.8.19 2.4 Install required R-packages From conda conda install -c conda-forge r-devtools=2.4.5 conda install -c conda-forge r-Seurat=4.3.0.1 conda install -c conda-forge r-Rfast2=0.1.5.1 conda install -c conda-forge r-hdf5r=1.3.10 conda install -c conda-forge r-ggpubr=0.6.0 conda install 
pwwang::r-seuratwrappers conda install -c bioconda bioconductor-monocle=2.28.0 conda install -c bioconda bioconductor-slingshot=2.8.0 conda install -c bioconda bioconductor-GSVA=1.48.2 conda install -c bioconda bioconductor-org.Mm.eg.db=3.17.0 conda install -c bioconda bioconductor-org.Hs.eg.db=3.17.0 conda install -c bioconda bioconductor-scran=1.28.1 conda install -c bioconda bioconductor-AUCell=1.22.0 conda install -c bioconda bioconductor-RcisTarget=1.20.0 conda install -c bioconda bioconductor-GENIE3=1.24.0 conda install -c bioconda bioconductor-biomaRt=2.56.1 conda install -c bioconda r-velocyto.r=0.6 #conda install -c bioconda bioconductor-limma=3.56.2 Enter the R language environment We suggest users do not manually update any already installed R packages during the installation of the following R packages. R From BiocManager # BiocManager(version = "1.30.23") should already be installed as a dependency of r-seuratwrappers. # If it is not installed, please run the following code to install it. 
# install.packages("BiocManager",version="1.30.23") BiocManager::install("ComplexHeatmap") BiocManager::install("scmap") BiocManager::install("clusterProfiler") From CRAN remotes::install_version("shinyjs", version = "2.1.0") remotes::install_version("shiny", version = "1.8.0") remotes::install_version("shinyWidgets", version = "0.8.6") remotes::install_version("shinydashboard", version = "0.7.2") remotes::install_version("slickR", version = "0.6.0") remotes::install_version("phateR", version = "1.0.7") remotes::install_version("gelnet", version = "1.2.1") remotes::install_version("parallelDist", version = "0.2.6") remotes::install_version("kableExtra", version = "1.3.4") remotes::install_version("transport", version = "0.14-6") remotes::install_version("feather", version = "0.3.5") remotes::install_version("markdown", version = "1.13") From GitHub tips: Sometimes network connection issues may occur, resulting in an error message indicating that GitHub cannot be connected. Please try installing again when the network conditions improve. Usage limitations: Sometimes an API rate limit error occurs, and a GitHub token is needed to provide the GitHub API rate limit. The steps to resolve this are as follows: Register for an account or log in to an existing account on the GitHub website. Then click on your profile picture in the top right corner, go to the dropdown menu and select “Settings.” Next, find “Developer settings” and click on it, then find “Personal access tokens (classic).” Click on it, then click “Create new token (classic).” Create a new token by first naming it anything you like. Then choose the expiration time for the token. Finally, check the “repo” box; the token will be used to download code repositories from GitHub. Click “Generate token.” Copy the generated token password. After that, set the token in the environment variable in R. Since we are using conda, enter R by typing R in the terminal. Then, enter the command: usethis::edit_r_environ(). 
This will open a file. Press the i key to edit. Paste the token you copied into the code area as follows: GITHUB_TOKEN=“your_token”. Then press Esc, type :wq! (force save). After that, you need to exit Linux and re-enter R. Close and reopen the terminal to apply the environment variable. Reopen Linux, activate the conda environment, and enter R again. devtools::install_github("sqjin/CellChat@9e1e605") devtools::install_github("aertslab/SCENIC@fde9774") devtools::install_github("pzhulab/abcCellmap@f44c14b") devtools::install_github("navinlabcode/copykat@d7d6569") devtools::install_github('chris-mcginnis-ucsf/DoubletFinder@8c7f76e') devtools::install_github("mojaveazure/seurat-disk@877d4e1") Install HemaScopeR from github devtools::install_github(repo="ZhenyiWangTHU/HemaScopeR", dep = FALSE) Exist the R language environment quit() 2.5 Install required Python-packages Upgrade pip and set mirrors python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip config set global.extra-index-url http://mirrors.aliyun.com/pypi/simple/ Install required packages pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 2.6 The installed packages with versions R packages with versions Package Version ------- ------- abcCellmap 0.1.0 abind 1.4-5 annotate 1.78.0 AnnotationDbi 1.64.1 ape 5.8 aplot 0.2.3 arrow 17.0.0 askpass 1.2.0 assertthat 0.2.1 AUCell 1.22.0 backports 1.5.0 base 4.3.3 base64enc 0.1-3 beachmat 2.16.0 BH 1.84.0-0 Biobase 2.60.0 BiocFileCache 2.8.0 BiocGenerics 0.46.0 BiocManager 1.30.23 BiocNeighbors 1.18.0 BiocParallel 1.34.2 BiocSingular 1.16.0 BiocVersion 3.18.1 biocViews 1.68.1 biomaRt 2.56.1 Biostrings 2.68.1 bit 4.0.5 bit64 4.0.5 bitops 1.0-7 blob 1.2.4 
bluster 1.10.0 boot 1.3-30 brew 1.0-10 brio 1.1.5 broom 1.0.6 bslib 0.7.0 cachem 1.1.0 callr 3.7.6 car 3.1-2 carData 3.0-5 caret 6.0-94 caTools 1.18.2 CellChat 1.5.0 cellranger 1.1.0 circlize 0.4.16 class 7.3-22 cli 3.6.3 clipr 0.8.0 clock 0.7.0 clue 0.3-65 cluster 2.1.6 clusterProfiler 4.10.1 coda 0.19-4.1 codetools 0.2-20 colorspace 2.1-0 combinat 0.0-8 commonmark 1.9.1 compiler 4.3.3 ComplexHeatmap 2.18.0 conquer 1.3.3 copykat 1.1.0 corrplot 0.92 cowplot 1.1.3 cpp11 0.4.7 crayon 1.5.3 credentials 2.0.1 crosstalk 1.2.1 curl 5.2.1 data.table 1.15.4 datasets 4.3.3 DBI 1.2.3 dbplyr 2.5.0 DDRTree 0.1.5 DelayedArray 0.26.6 DelayedMatrixStats 1.22.1 deldir 2.0-4 Deriv 4.1.3 desc 1.4.3 devtools 2.4.5 diagram 1.6.5 diffobj 0.3.5 digest 0.6.36 dlm 1.1-6 doBy 4.6.22 docopt 0.7.1 doParallel 1.0.17 DOSE 3.28.2 dotCall64 1.1-1 DoubletFinder 2.0.3 downlit 0.4.4 downloader 0.4 dplyr 1.1.4 dqrng 0.3.2 dynamicTreeCut 1.63-1 e1071 1.7-14 edgeR 3.42.4 ellipsis 0.3.2 enrichplot 1.22.0 evaluate 0.24.0 expm 0.999-9 fansi 1.0.6 farver 2.1.2 fastDummies 1.7.3 fastICA 1.2-4 fastmap 1.2.0 fastmatch 1.1-4 feather 0.3.5 fgsea 1.28.0 fields 16.2 filelock 1.0.3 fitdistrplus 1.1-11 FNN 1.1.4 fontawesome 0.5.2 forcats 1.0.0 foreach 1.5.2 foreign 0.8-87 formatR 1.14 fs 1.6.4 futile.logger 1.4.3 futile.options 1.0.1 future 1.33.2 future.apply 1.11.2 gelnet 1.2.1 generics 0.1.3 GENIE3 1.24.0 GenomeInfoDb 1.36.1 GenomeInfoDbData 1.2.11 GenomicRanges 1.52.0 gert 2.0.1 GetoptLong 1.0.5 ggalluvial 0.12.5 ggforce 0.4.2 ggfun 0.1.5 ggnetwork 0.5.13 ggnewscale 0.4.10 ggplot2 3.5.1 ggplotify 0.1.2 ggpubr 0.6.0 ggraph 2.2.1 ggrepel 0.9.5 ggridges 0.5.6 ggsci 3.2.0 ggsignif 0.6.4 ggtree 3.10.1 gh 1.4.1 gitcreds 0.1.2 GlobalOptions 0.1.2 globals 0.16.3 glue 1.7.0 GO.db 3.18.0 goftest 1.2-3 googleVis 0.7.3 GOSemSim 2.28.1 gower 1.0.1 gplots 3.1.3.1 graph 1.78.0 graphics 4.3.3 graphlayouts 1.1.1 grDevices 4.3.3 grid 4.3.3 gridBase 0.4-7 gridExtra 2.3 gridGraphics 0.5-1 GSEABase 1.62.0 gson 0.1.0 GSVA 1.48.2 
gtable 0.3.5 gtools 3.9.5 hardhat 1.4.0 haven 2.5.4 HDF5Array 1.28.1 hdf5r 1.3.10 HDO.db 0.99.1 HemaScopeR 1.0.0 here 1.0.1 hexbin 1.28.3 highr 0.11 hms 1.1.3 HSMMSingleCell 1.20.0 htmltools 0.5.8.1 htmlwidgets 1.6.4 httpuv 1.6.15 httr 1.4.7 httr2 1.0.2 ica 1.0-3 igraph 2.0.3 ini 0.3.1 ipred 0.9-14 IRanges 2.34.1 irlba 2.3.5.1 isoband 0.2.7 iterators 1.0.14 jquerylib 0.1.4 jsonlite 1.8.8 kableExtra 1.3.4 KEGGREST 1.40.0 kernlab 0.9-32 KernSmooth 2.23-24 knitr 1.48 labeling 0.4.3 lambda.r 1.2.4 later 1.3.2 lattice 0.22-6 lava 1.7.3 lazyeval 0.2.2 leiden 0.4.3.1 leidenbase 0.1.27 lifecycle 1.0.4 limma 3.56.2 listenv 0.9.1 lme4 1.1-35.5 lmtest 0.9-40 locfit 1.5-9.9 lsei 1.3-0 lubridate 1.9.3 magrittr 2.0.3 maps 3.4.2 maptools 1.1-8 markdown 1.13 MASS 7.3-60.0.1 Matrix 1.6-5 MatrixGenerics 1.12.2 MatrixModels 0.5-3 matrixStats 1.3.0 mcmc 0.9-8 MCMCpack 1.7-0 memoise 2.0.1 metapod 1.8.0 methods 4.3.3 mgcv 1.9-1 microbenchmark 1.4.10 mime 0.12 miniUI 0.1.1.1 minqa 1.2.7 mixtools 2.0.0 ModelMetrics 1.2.2.2 modelr 0.1.11 monocle 2.28.0 munsell 0.5.1 network 1.18.2 nlme 3.1-165 nloptr 2.0.3 NMF 0.27 nnet 7.3-19 npsurv 0.5-0 numDeriv 2016.8-1.1 openssl 2.2.0 org.Hs.eg.db 3.17.0 org.Mm.eg.db 3.17.0 parallel 4.3.3 parallelDist 0.2.6 parallelly 1.37.1 patchwork 1.2.0 pbapply 1.7-2 pbkrtest 0.5.2 pcaMethods 1.92.0 phateR 1.0.7 pheatmap 1.0.12 pillar 1.9.0 pkgbuild 1.4.4 pkgconfig 2.0.3 pkgdown 2.1.0 pkgload 1.3.4 plogr 0.2.0 plotly 4.10.4 plyr 1.8.9 png 0.1-8 polyclip 1.10-6 polynom 1.4-1 praise 1.0.0 prettyunits 1.2.0 princurve 2.1.6 pROC 1.18.5 processx 3.8.4 prodlim 2024.06.25 profvis 0.3.8 progress 1.2.3 progressr 0.14.0 promises 1.3.0 proxy 0.4-27 ps 1.7.7 purrr 1.0.2 qlcMatrix 0.9.8 quantreg 5.98 qvalue 2.34.0 R.methodsS3 1.8.2 R.oo 1.26.0 R.utils 2.12.3 R6 2.5.1 ragg 1.3.2 randomForest 4.7-1.1 RANN 2.6.1 rappdirs 0.3.3 RBGL 1.76.0 RcisTarget 1.20.0 rcmdcheck 1.4.0 RColorBrewer 1.1-3 Rcpp 1.0.13 RcppAnnoy 0.0.22 RcppArmadillo 14.0.0-1 RcppEigen 0.3.4.0.0 RcppGSL 0.3.13 
RcppHNSW 0.6.0 RcppParallel 5.1.6 RcppProgress 0.4.2 RcppTOML 0.2.2 RcppZiggurat 0.1.6 RCurl 1.98-1.16 readr 2.1.5 readxl 1.4.3 recipes 1.1.0 registry 0.5-1 rematch 2.0.0 rematch2 2.1.2 remotes 2.5.0 reshape2 1.4.4 reticulate 1.38.0 Rfast 2.1.0 Rfast2 0.1.5.1 rhdf5 2.44.0 rhdf5filters 1.12.1 Rhdf5lib 1.22.0 rio 1.1.1 rjson 0.2.21 rlang 1.1.4 rmarkdown 2.27 rngtools 1.5.2 ROCR 1.0-11 roxygen2 7.3.2 rpart 4.1.23 rprojroot 2.0.4 RSpectra 0.16-2 RSQLite 2.3.7 rstatix 0.7.2 rstudioapi 0.16.0 rsvd 1.0.5 Rtsne 0.17 RUnit 0.4.33 rversions 2.1.2 rvest 1.0.4 S4Arrays 1.0.4 S4Vectors 0.38.1 sass 0.4.9 ScaledMatrix 1.8.1 scales 1.3.0 scattermore 1.2 scatterpie 0.2.3 SCENIC 1.3.0 scmap 1.24.0 scran 1.28.1 sctransform 0.4.1 scuttle 1.10.1 segmented 2.1-0 selectr 0.4-2 sessioninfo 1.2.2 Seurat 4.3.0.1 SeuratDisk 0.0.0.9021 SeuratObject 5.0.2 SeuratWrappers 0.3.1 shadowtext 0.1.4 shape 1.4.6.1 shinyjs 2.1.0 shiny 1.8.0 shinyWidgets 0.8.6 shinydashboard 0.7.2 slickR 0.6.0 SingleCellExperiment 1.22.0 sitmo 2.0.2 slam 0.1-51 slingshot 2.8.0 sna 2.7-2 snow 0.4-4 sourcetools 0.1.7-1 sp 2.1-4 spam 2.10-0 SparseM 1.84 sparseMatrixStats 1.12.2 sparsesvd 0.2-2 spatstat.data 3.1-2 spatstat.explore 3.2-6 spatstat.geom 3.2-9 spatstat.random 3.2-3 spatstat.sparse 3.1-0 spatstat.univar 3.0-0 spatstat.utils 3.0-5 splines 4.3.3 SQUAREM 2021.1 statmod 1.5.0 statnet.common 4.9.0 stats 4.3.3 stats4 4.3.3 stringi 1.8.4 stringr 1.5.1 SummarizedExperiment 1.30.2 survival 3.7-0 svglite 2.1.3 sys 3.4.2 systemfonts 1.1.0 tcltk 4.3.3 tensor 1.5 testthat 3.2.1.1 textshaping 0.3.7 tibble 3.2.1 tidygraph 1.3.1 tidyr 1.3.1 tidyselect 1.2.1 tidytree 0.4.6 timechange 0.3.0 timeDate 4032.109 tinytex 0.51 tools 4.3.3 TrajectoryUtils 1.8.0 transport 0.14-6 treeio 1.26.0 tweenr 2.0.3 tzdb 0.4.0 urlchecker 1.0.1 usethis 2.2.3 utf8 1.2.4 utils 4.3.3 uwot 0.1.16 vctrs 0.6.5 velocyto.R 0.6 VGAM 1.1-11 viridis 0.6.5 viridisLite 0.4.2 vroom 1.6.5 waldo 0.5.2 webshot 0.5.5 whisker 0.4.1 withr 3.0.0 writexl 1.5.0 xfun 0.46 
XML 3.99-0.17 xml2 1.3.6 xopen 1.0.1 xtable 1.8-4 XVector 0.40.0 yaml 2.3.9 yulab.utils 0.1.4 zip 2.3.1 zlibbioc 1.46.0 zoo 1.8-12 Python packages with versions Package Version ------------------------ -------------- absl-py 2.1.0 access 1.1.9 affine 2.4.0 aiohttp 3.9.5 aiosignal 1.3.1 anndata 0.10.8 annotated-types 0.7.0 anyio 4.4.0 arboreto 0.1.6 argcomplete 3.4.0 array_api_compat 1.7.1 arrow 1.3.0 attrs 23.2.0 backoff 2.2.1 beautifulsoup4 4.12.3 blessed 1.20.0 bokeh 3.5.0 boto3 1.34.145 botocore 1.34.145 cell2location 0.1.3 certifi 2024.7.4 charset-normalizer 3.3.2 chex 0.1.7 click 8.1.7 click-plugins 1.1.1 cligj 0.7.2 cloudpickle 3.0.0 commot 0.0.3 contextlib2 21.6.0 contourpy 1.2.1 croniter 1.4.1 cycler 0.12.1 dask 2024.7.0 dask-expr 1.1.8 dateutils 0.6.12 decorator 4.4.2 deepdiff 7.0.1 Deprecated 1.2.14 deprecation 2.1.0 distributed 2024.7.0 dm-tree 0.1.8 dnspython 2.6.1 docrep 0.3.2 editor 1.6.6 email_validator 2.2.0 esda 2.4.3 etils 1.9.2 fastapi 0.111.1 fastapi-cli 0.0.4 filelock 3.15.4 fiona 1.9.6 flax 0.8.5 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.6.1 future 1.0.0 gensim 4.3.3 geopandas 0.13.2 giddy 2.3.5 graphtools 1.5.3 h11 0.14.0 h5py 3.11.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 idna 3.7 igraph 0.11.6 importlib_metadata 8.0.0 importlib_resources 6.4.0 inequality 1.0.0 inquirer 3.3.0 itsdangerous 2.2.0 jax 0.4.30 jaxlib 0.4.30 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 karateclub 1.2.2 kiwisolver 1.4.5 legacy-api-wrap 1.4 leidenalg 0.10.2 Levenshtein 0.25.1 libpysal 4.7.0 lightning 2.0.9.post0 lightning-cloud 0.5.70 lightning-utilities 0.11.5 llvmlite 0.43.0 locket 1.0.0 loompy 3.0.7 lz4 4.3.3 mapclassify 2.6.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 mdurl 0.1.2 mgwr 2.2.1 ml_collections 0.1.1 ml-dtypes 0.4.0 momepy 0.6.0 mpmath 1.3.0 msgpack 1.0.8 mudata 0.2.4 multidict 6.0.5 multipledispatch 1.0.0 natsort 8.4.0 nest-asyncio 1.6.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 numpy-groupies 0.11.1 numpyro 0.15.1 nvidia-cublas-cu12 
12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 opencv-python 4.10.0.84 opt-einsum 3.3.0 optax 0.2.1 orbax-checkpoint 0.5.21 ordered-set 4.1.0 packaging 24.1 pandas 2.0.3 partd 1.4.2 patsy 0.5.6 phate 1.0.11 pillow 10.4.0 pip 24.1.2 platformdirs 4.2.2 plotly 5.22.0 pointpats 2.4.0 POT 0.9.4 protobuf 5.27.2 psutil 6.0.0 PuLP 2.9.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pydantic 2.1.1 pydantic_core 2.4.0 Pygments 2.18.0 PyGSP 0.5.1 PyJWT 2.8.0 pynndescent 0.5.13 pyparsing 3.0.9 pyproj 3.6.1 pyro-api 0.1.2 pyro-ppl 1.9.1 pysal 24.1 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-igraph 0.11.6 python-Levenshtein 0.25.1 python-louvain 0.16 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytz 2024.1 PyYAML 6.0.1 quantecon 0.7.2 rapidfuzz 3.9.4 rasterio 1.3.10 rasterstats 0.19.0 readchar 4.1.0 requests 2.32.3 rich 13.7.1 Rtree 1.3.0 runs 1.2.2 s_gd2 1.8.1 s3transfer 0.10.2 scanpy 1.10.2 scikit-learn 1.5.1 scipy 1.13.1 scprep 1.2.3 scvelo 0.3.2 scvi-tools 1.1.5 seaborn 0.13.2 segregation 2.5 session_info 1.0.0 setuptools 71.0.1 shapely 2.0.5 shellingham 1.5.4 simplejson 3.19.2 six 1.16.0 smart-open 7.0.4 sniffio 1.3.1 snuggs 1.4.7 sortedcontainers 2.4.0 soupsieve 2.5 spaghetti 1.7.4 sparse 0.15.4 spglm 1.0.8 spint 1.0.7 splot 1.1.5.post1 spopt 0.5.0 spreg 1.4 spvcm 0.3.0 starlette 0.37.2 starsessions 1.3.0 statsmodels 0.14.1 stdlib-list 0.10.0 sympy 1.13.1 tasklogger 1.2.0 tblib 3.0.0 tenacity 8.5.0 tensorstore 0.1.63 texttable 1.7.0 threadpoolctl 3.5.0 tobler 0.11.2 toml 0.10.2 tomlkit 0.13.0 toolz 0.12.1 torch 2.3.1 torchmetrics 1.4.0.post0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 triton 2.3.1 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 
umap-learn 0.5.6 urllib3 2.2.2 uvicorn 0.30.1 uvloop 0.19.0 watchfiles 0.22.0 wcwidth 0.2.13 websocket-client 1.8.0 websockets 12.0 wheel 0.43.0 wrapt 1.16.0 xarray 2024.6.0 xmltodict 0.13.0 xmod 1.8.1 xyzservices 2024.6.0 yarl 1.9.4 yq 3.4.3 zict 3.0.0 zipp 3.19.2 "],["integrated-scrna-seq-pipeline.html", "3 Integrated scRNA-seq pipeline", " 3 Integrated scRNA-seq pipeline Load the R packages. # sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getpot library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Run the integrated scRNA-seq pipeline. 
scRNASeq_10x_pipeline( # input and output input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', './SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix'), project.names = c( 'SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423'), output.dir = './output/', pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python', # quality control and preprocessing gene.column = 2, min.cells = 10, min.feature = 200, mt.pattern = '^MT-', nFeature_RNA.limit = 200, percent.mt.limit = 20, scale.factor = 10000, nfeatures = 3000, ndims = 50, vars.to.regress = NULL, PCs = 1:35, resolution = 0.4, n.neighbors = 50, 
# remove doublets doublet.percentage = 0.04, doublerFinderwraper.PCs = 1:20, doublerFinderwraper.pN = 0.25, doublerFinderwraper.pK = 0.1, # phateR phate.knn = 50, phate.npca = 20, phate.t = 10, phate.ndim = 2, min.pct = 0.25, logfc.threshold = 0.25, # visualization ViolinPlot.cellTypeOrders = as.character(1:22), ViolinPlot.cellTypeColors = NULL, Org = 'hsa', loom.files.path = c( './SRR7881399/velocyto/SRR7881399.loom', './SRR7881400/velocyto/SRR7881400.loom', './SRR7881401/velocyto/SRR7881401.loom', './SRR7881402/velocyto/SRR7881402.loom', './SRR7881403/velocyto/SRR7881403.loom', './SRR7881404/velocyto/SRR7881404.loom', './SRR7881405/velocyto/SRR7881405.loom', './SRR7881406/velocyto/SRR7881406.loom', './SRR7881407/velocyto/SRR7881407.loom', './SRR7881408/velocyto/SRR7881408.loom', './SRR7881409/velocyto/SRR7881409.loom', './SRR7881410/velocyto/SRR7881410.loom', './SRR7881411/velocyto/SRR7881411.loom', './SRR7881412/velocyto/SRR7881412.loom', './SRR7881413/velocyto/SRR7881413.loom', './SRR7881414/velocyto/SRR7881414.loom', './SRR7881415/velocyto/SRR7881415.loom', './SRR7881416/velocyto/SRR7881416.loom', './SRR7881417/velocyto/SRR7881417.loom', './SRR7881418/velocyto/SRR7881418.loom', './SRR7881419/velocyto/SRR7881419.loom', './SRR7881420/velocyto/SRR7881420.loom', './SRR7881421/velocyto/SRR7881421.loom', './SRR7881422/velocyto/SRR7881422.loom', './SRR7881423/velocyto/SRR7881423.loom'), # cell cycle cellcycleCutoff = NULL, # cell chat sorting = FALSE, ncores = 10, # Verbose = FALSE, # activeEachStep Whether_load_previous_results = FALSE, Step1_Input_Data = TRUE, Step1_Input_Data.type = 'cellranger-count', Step2_Quality_Control = TRUE, Step2_Quality_Control.RemoveBatches = TRUE, Step2_Quality_Control.RemoveDoublets = TRUE, Step3_Clustering = TRUE, Step4_Identify_Cell_Types = TRUE, Step4_Use_Which_Labels = 'clustering', Step4_Cluster_Labels = NULL, Step4_Changed_Labels = NULL, Step4_run_sc_CNV = TRUE, Step5_Visualization = TRUE, Step6_Find_DEGs = TRUE, 
Step7_Assign_Cell_Cycle = TRUE, Step8_Calculate_Heterogeneity = TRUE, Step9_Violin_Plot_for_Marker_Genes = TRUE, Step10_Calculate_Lineage_Scores = TRUE, Step11_GSVA = TRUE, Step11_GSVA.identify.cellType.features=TRUE, Step11_GSVA.identify.diff.features=FALSE, Step11_GSVA.comparison.design=NULL, Step12_Construct_Trajectories = TRUE, Step12_Construct_Trajectories.clusters = c('3','6','9','10','11','14','15','19'), Step12_Construct_Trajectories.monocle = TRUE, Step12_Construct_Trajectories.slingshot = TRUE, Step12_Construct_Trajectories.scVelo = TRUE, Step13_TF_Analysis = TRUE, Step14_Cell_Cell_Interaction = TRUE, Step15_Generate_the_Report = TRUE ) "],["step-by-step-scrna-seq-pipeline.html", "4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data 4.2 Step 2. Quality Control 4.3 Step 3. Clustering 4.4 Step 4. Identify Cell Types 4.5 Step 5. Visualization 4.6 Step 6. Find DEGs 4.7 Step 7. Assign Cell Cycles 4.8 Step 8. Calculate Heterogeneity 4.9 Step 9. Violin Plot for Marker Genes 4.10 Step 10. Calculate Lineage Scores 4.11 Step 11. GSVA 4.12 Step 12. Construct Trajectories 4.13 Step 13. TF Analysis 4.14 Step 14. Cell-Cell Interaction", " 4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data Load the R packages. 
# sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getopt library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Set the paths for the input data, the output results, and the Python installation. input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', 
'./SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix') output.dir = './output/' pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python' Set the parameters for loading the data sets. project.names = c('SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423') gene.column = 2 min.cells = 10 min.feature = 200 mt.pattern = '^MT-' Step1_Input_Data.type = 'cellranger-count' Create folders for saving the results of HemaScopeR analysis. wdir <- getwd() if(is.null(pythonPath)==FALSE){ reticulate::use_python(pythonPath) }else{print('Please set the path of Python.')} if (!file.exists(paste0(output.dir, '/HemaScopeR_results/'))) { dir.create(paste0(output.dir, '/HemaScopeR_results/')) } output.dir <- paste0(output.dir,'/HemaScopeR_results/') if (!file.exists(paste0(output.dir, '/RDSfiles/'))) { dir.create(paste0(output.dir, '/RDSfiles/')) } previous_results_path <- paste0(output.dir, '/RDSfiles/') # if (Whether_load_previous_results) { # print('Loading the previous results...') # Load_previous_results(previous_results_path = previous_results_path) # } # Step1. Input data----------------------------------------------------------------------------- print('Step1. Input data.') if (!file.exists(paste0(output.dir, '/Step1.Input_data/'))) { dir.create(paste0(output.dir, '/Step1.Input_data/')) } Load the data sets. 
file.copy(from = input.data.dirs, to = paste0(output.dir,'/Step1.Input_data/'), recursive = TRUE) if(Step1_Input_Data.type == 'cellranger-count'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- Read10X(data.dir = input.data.dirs[i], gene.column = gene.column) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- Read10X(data.dir = input.data.dirs, gene.column = gene.column) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Seurat'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_object.temp <- readRDS(input.data.dirs[i]) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp) } }else{ sc_object <- readRDS(input.data.dirs) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Matrix'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- readRDS(input.data.dirs[i]) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- readRDS(input.data.dirs) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = 
min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else{ stop('Please input data generated by the cellranger-count software, or a Seurat object, or a gene expression matrix. HemaScopeR does not support other formats of input data.') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.2 Step 2. Quality Control Set the parameters for quality control. # quality control and preprocessing nFeature_RNA.limit = 200 percent.mt.limit = 20 scale.factor = 10000 nfeatures = 3000 ndims = 50 vars.to.regress = NULL PCs = 1:35 resolution = 0.4 n.neighbors = 50 # remove doublets doublet.percentage = 0.04 doublerFinderwraper.PCs = 1:20 doublerFinderwraper.pN = 0.25 doublerFinderwraper.pK = 0.1 Step2_Quality_Control.RemoveBatches = TRUE Step2_Quality_Control.RemoveDoublets = TRUE Create a folder for saving the results of quality control. print('Step2. Quality control.') if (!file.exists(paste0(output.dir, '/Step2.Quality_control/'))) { dir.create(paste0(output.dir, '/Step2.Quality_control/')) } Run the quality control process. 
if(length(input.data.dirs) > 1){ # preprocess and quality control for multiple scRNA-Seq data sets sc_object <- QC_multiple_scRNASeq(seuratObjects = input.data.list, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveBatches = Step2_Quality_Control.RemoveBatches, Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, ndims = ndims, vars.to.regress = vars.to.regress, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK ) }else{ # preprocess and quality control for single scRNA-Seq data set sc_object <- QC_single_scRNASeq(sc_object = sc_object, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, vars.to.regress = vars.to.regress, ndims = ndims, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.3 Step 3. Clustering Set the parameters for clustering. 
PCs = 1:35 resolution = 0.4 n.neighbors = 50 Create a folder for saving the results of Louvain clustering. print('Step3. Clustering.') if (!file.exists(paste0(output.dir, '/Step3.Clustering/'))) { dir.create(paste0(output.dir, '/Step3.Clustering/')) } Run Louvain clustering. if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){graph.name <- 'integrated_snn'}else{graph.name <- 'RNA_snn'} sc_object <- FindNeighbors(sc_object, dims = PCs, k.param = n.neighbors, force.recalc = TRUE) sc_object <- FindClusters(sc_object, resolution = resolution, graph.name = graph.name) sc_object@meta.data$seurat_clusters <- as.character(as.numeric(sc_object@meta.data$seurat_clusters)) # plot clustering pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.4 Step 4. Identify Cell Types Set the path for the database. databasePath = "~/HemaScopeR/database/" Set the parameters for cell type identification. Step4_Use_Which_Labels = 'clustering' Step4_Cluster_Labels = NULL Step4_Changed_Labels = NULL Org = 'hsa' ncores = 10 Create a folder for saving the results of cell type identification. print('Step4. Identify cell types automatically.') if (!file.exists(paste0(output.dir, '/Step4.Identify_Cell_Types/'))) { dir.create(paste0(output.dir, '/Step4.Identify_Cell_Types/')) } Run the cell type identification process and the copy number variation (CNV) analysis. sc_object <- run_cell_annotation(object = sc_object, assay = 'RNA', species = Org, output.dir = paste0(output.dir,'/Step4.Identify_Cell_Types/')) if(Org == 'hsa'){ load(paste0(databasePath,"/HematoMap.reference.rdata")) if(length(intersect(rownames(HematoMap.reference), rownames(sc_object))) < 1000){ HematoMap.reference <- RenameGenesSeurat(obj = HematoMap.reference, newnames = toupper(rownames(HematoMap.reference)), gene.use = rownames(HematoMap.reference), de.assay = "RNA", lassays = "RNA") } if(sc_object@active.assay == 'integrated'){ DefaultAssay(sc_object) <- 'RNA' sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, '/Step4.Identify_Cell_Types/')) DefaultAssay(sc_object) <- 'integrated' }else{ sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, 
'/Step4.Identify_Cell_Types/')) } } Set the cell labels. # set the cell labels if(Step4_Use_Which_Labels == 'clustering'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$seurat_clusters Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.1'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.2'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.3'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.4'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'HematoMap'){ if(Org == 'hsa'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$predicted.id Idents(sc_object) <- sc_object@meta.data$selectLabels }else{print("'HematoMap' is only applicable to human data ('Org' = 'hsa').")} }else if(Step4_Use_Which_Labels == 'changeLabels'){ if (!is.null(Step4_Cluster_Labels) && !is.null(Step4_Changed_Labels) && length(Step4_Cluster_Labels) == length(Step4_Changed_Labels)){ sc_object@meta.data$selectLabels <- plyr::mapvalues(sc_object@meta.data$seurat_clusters, from = as.character(Step4_Cluster_Labels), to = as.character(Step4_Changed_Labels), warn_missing = FALSE) Idents(sc_object) <- sc_object@meta.data$selectLabels }else{ print("Please input the 'Step4_Cluster_Labels' parameter as Seurat clustering labels, and the 'Step4_Changed_Labels' parameter as new labels. 
Please note that these two parameters should be of equal length.") } }else{ print('Please set the "Step4_Use_Which_Labels" parameter as "clustering", "abcCellmap.1", "abcCellmap.2", "abcCellmap.3", "abcCellmap.4", "HematoMap" or "changeLabels".') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } Run the CNV analysis. sc_CNV(sc_object=sc_object, save_path=paste0(output.dir,'/Step4.Identify_Cell_Types/'), assay = 'RNA', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = NULL, n.cores = ncores, species = Org) 4.5 Step 5. Visualization Create a folder for saving the visualization results. print('Step5. Visualization.') if (!file.exists(paste0(output.dir, '/Step5.Visualization/'))) { dir.create(paste0(output.dir, '/Step5.Visualization/')) } Compute the number and proportion of cells in each cell group. 
# statistical results cells_labels <- as.data.frame(cbind(rownames(sc_object@meta.data), as.character(sc_object@meta.data$selectLabels))) colnames(cells_labels) <- c('cell_id', 'cluster_id') cluster_counts <- cells_labels %>% group_by(cluster_id) %>% summarise(count = n()) total_cells <- nrow(cells_labels) cluster_counts <- cluster_counts %>% mutate(proportion = count / total_cells) cluster_counts <- as.data.frame(cluster_counts) cluster_counts$percentages <- scales::percent(cluster_counts$proportion, accuracy = 0.1) cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='proportion')] cluster_counts$cluster_id_count_percentages <- paste(cluster_counts$cluster_id, " (", cluster_counts$count, ' cells; ', cluster_counts$percentages, ")", sep='') cluster_counts <- cluster_counts[order(cluster_counts$count, decreasing = TRUE),] cluster_counts <- rbind(cluster_counts, c('Total', sum(cluster_counts$count), '100%', 'all cells')) sc_object@meta.data$cluster_id_count_percentages <- mapvalues(sc_object@meta.data$selectLabels, from=cluster_counts$cluster_id, to=cluster_counts$cluster_id_count_percentages, warn_missing=FALSE) colnames(sc_object@meta.data)[which(colnames(sc_object@meta.data) == 'cluster_id_count_percentages')] <- paste('Total ', nrow(sc_object@meta.data), ' cells', sep='') cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='cluster_id_count_percentages')] colnames(cluster_counts) <- c('Cell types', 'Cell counts', 'Percentages') # names(colorvector) <- mapvalues(names(colorvector), # from=cluster_counts$cluster_id, # to=cluster_counts$cluster_id_count_percentages, # warn_missing=FALSE) write.csv(cluster_counts, file=paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages.csv', sep=''), quote=FALSE, row.names=FALSE) The UMAP visualization. 
pdf(paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages_umap.pdf', sep=''), width = 14, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = paste('Total ', nrow(sc_object@meta.data), ' cells', sep=''), label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Set the parameters for phateR. phate.knn = 50 phate.npca = 20 phate.t = 10 phate.ndim = 2 Run phateR for dimensionality reduction and visualization. # run phateR if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} if(!is.null(pythonPath)){ run_phateR(sc_object = sc_object, output.dir = paste0(output.dir,'/Step5.Visualization/'), pythonPath = pythonPath, phate.knn = phate.knn, phate.npca = phate.npca, phate.t = phate.t, phate.ndim = phate.ndim) } Visualize the cell types using UMAP and t-SNE. # plot cell types pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.6 Step 6. Find DEGs Set the parameters for identifying differentially expressed genes (DEGs). min.pct = 0.25 logfc.threshold = 0.25 Create a folder for saving the results of the DEG analysis. print('Step6. Find DEGs.') if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/')) } Identify DEGs using the Wilcoxon rank-sum test. sc_object.markers <- FindAllMarkers(sc_object, only.pos = TRUE, min.pct = min.pct, logfc.threshold = logfc.threshold) write.csv(sc_object.markers, file = paste0(paste0(output.dir, '/Step6.Find_DEGs/'),'sc_object.markerGenes.csv'), quote=FALSE) Set the parameters for GPTCelltype. your_openai_API_key = '' tissuename = 'human bone marrow' gptmodel = 'gpt-3.5' Use GPTCelltype to assist cell type annotation. GPT_annotation( marker.genes = sc_object.markers, your_openai_API_key = your_openai_API_key, tissuename = tissuename, gptmodel = gptmodel, output.dir = paste0(output.dir, '/Step6.Find_DEGs/')) Perform GO and KEGG enrichment. 
# GO enrichment if(Org=='mmu'){ OrgDb <- 'org.Mm.eg.db' }else if(Org=='hsa'){ OrgDb <- 'org.Hs.eg.db' }else{ stop("Org should be 'mmu' or 'hsa'.") } HemaScopeREnrichment(DEGs=sc_object.markers, OrgDb=OrgDb, output.dir=paste0(output.dir, '/Step6.Find_DEGs/')) sc_object.markers.top5 <- sc_object.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_log2FC) pdf(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.pdf'), width = 0.5*length(unique(sc_object.markers.top5$gene)), height = 0.5*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() png(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.png'), width = 20*length(unique(sc_object.markers.top5$gene)), height = 30*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() Create a folder for saving the results of gene network analysis. if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/')) } Perform gene network analysis. OpenXGR_SAG(sc_object.markers = sc_object.markers, output.dir = paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'), subnet.size = 10) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.7 Step 7. Assign Cell Cycles Create a folder for saving the results of cell cycle analysis. print('Step7. 
Assign cell cycles.') if (!file.exists(paste0(output.dir, '/Step7.Assign_cell_cycles/'))) { dir.create(paste0(output.dir, '/Step7.Assign_cell_cycles/')) } Set the parameters for the cell cycle analysis. cellcycleCutoff = NULL Run the cell cycle analysis. datasets.before.batch.removal <- readRDS(paste0(paste0(output.dir, '/RDSfiles/'),'datasets.before.batch.removal.rds')) sc_object <- cellCycle(sc_object=sc_object, counts_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "counts")%>%as.matrix(), data_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix(), cellcycleCutoff = cellcycleCutoff, cellTypeOrders = unique(sc_object@meta.data$selectLabels), output.dir=paste0(output.dir, '/Step7.Assign_cell_cycles/'), databasePath = databasePath, Org = Org) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.8 Step 8. Calculate Heterogeneity Create a folder for saving the results of the heterogeneity calculation. print('Step8. Calculate heterogeneity.') if (!file.exists(paste0(output.dir, '/Step8.Calculate_heterogeneity/'))) { dir.create(paste0(output.dir, '/Step8.Calculate_heterogeneity/')) } Run the heterogeneity calculation. 
expression_matrix <- GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix() expression_matrix <- expression_matrix[,rownames(sc_object@meta.data)] cell_types_groups <- as.data.frame(cbind(sc_object@meta.data$selectLabels, sc_object@meta.data$datasetID)) colnames(cell_types_groups) <- c('clusters', 'datasetID') if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } heterogeneity(expression_matrix = expression_matrix, cell_types_groups = cell_types_groups, cellTypeOrders = cellTypes_orders, output.dir = paste0(output.dir, '/Step8.Calculate_heterogeneity/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.9 Step 9. Violin Plot for Marker Genes Create a folder for saving the violin plots of marker genes. print('Step9. Violin plot for marker genes.') if (!file.exists(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'))) { dir.create(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/')) } Run violin plot visualization. 
marker.genes = NULL # not set earlier in this chapter; NULL selects the default marker genes defined below if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} dataMatrix <- GetAssayData(object = sc_object, slot = "scale.data") if(is.null(marker.genes)&(Org == 'mmu')){ # mpp genes are from 'The bone marrow microenvironment at single cell resolution' # the other genes are from 'single cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis' # the aliases of these genes were changed in GENCODE M16: Gpr64 -> Adgrg2, Sdpr -> Cavin2, Hbb-b1 -> Hbb-bs, Sfpi1 -> Spi1 HSC_lineage_signatures <- c('Slamf1', 'Itga2b', 'Kit', 'Ly6a', 'Bmi1', 'Gata2', 'Hlf', 'Meis1', 'Mpl', 'Mcl1', 'Gfi1', 'Gfi1b', 'Hoxb5') Mpp_genes <- c('Mki67', 'Mpo', 'Elane', 'Ctsg', 'Calr') Erythroid_lineage_signatures <- c('Klf1', 'Gata1', 'Mpl', 'Epor', 'Vwf', 'Zfpm1', 'Fhl1', 'Adgrg2', 'Cavin2','Gypa', 'Tfrc', 'Hbb-bs', 'Hbb-y') Lymphoid_lineage_signatures <- c('Tcf3', 'Ikzf1', 'Notch1', 'Flt3', 'Dntt', 'Btg2', 'Tcf7', 'Rag1', 'Ptprc', 'Ly6a', 'Blnk') Myeloid_lineage_signatures <- c('Gfi1', 'Spi1', 'Mpo', 'Csf2rb', 'Csf1r', 'Gfi1b', 'Hk3', 'Csf2ra', 'Csf3r', 'Sp1', 'Fcgr3') marker.genes <- c(HSC_lineage_signatures, Mpp_genes, Erythroid_lineage_signatures, Lymphoid_lineage_signatures, Myeloid_lineage_signatures) }else if(is.null(marker.genes)&(Org == 'hsa')){ HSPCs_lineage_signatures <- c('CD34','KIT','AVP','FLT3','MME','CD7','CD38','CSF1R','FCGR1A','MPO','ELANE','IL3RA') Myeloids_lineage_signatures <- c('LYZ','CD36','MPO','FCGR1A','CD4','CD14','CD300E','ITGAX','FCGR3A','FLT3','AXL', 'SIGLEC6','CLEC4C','IRF4','LILRA4','IL3RA','IRF8','IRF7','XCR1','CD1C','THBD', 'MRC1','CD34','KIT','ITGA2B','PF4','CD9','ENG','KLF','TFRC') B_cells_lineage_signatures <- c('CD79A','IGLL1','RAG1','RAG2','VPREB1','MME','IL7R','DNTT','MKI67','PCNA','TCL1A','MS4A1','IGHD','CD27','IGHG3') T_NK_cells_lineage_signatures <- 
c('CD3D','CD3E','CD8A','CCR7','IL7R','SELL','KLRG1','CD27','GNLY', 'NKG7','PDCD1','TNFRSF9','LAG3','CD160','CD4','CD40LG','IL2RA', 'FOXP3','DUSP4','IL2RB','KLRF1','FCGR3A','NCAM1','XCL1','MKI67','PCNA','KLRF') marker.genes <- c(HSPCs_lineage_signatures, Myeloids_lineage_signatures, B_cells_lineage_signatures, T_NK_cells_lineage_signatures) } if(is.null(ViolinPlot.cellTypeOrders)){ ViolinPlot.cellTypeOrders <- unique(sc_object@meta.data$selectLabels) } if(is.null(ViolinPlot.cellTypeColors)){ ViolinPlot.cellTypeColors <- viridis::viridis(length(unique(sc_object@meta.data$selectLabels))) } combinedViolinPlot(dataMatrix = dataMatrix, features = marker.genes, CellTypes = sc_object@meta.data$selectLabels, cellTypeOrders = ViolinPlot.cellTypeOrders, cellTypeColors = ViolinPlot.cellTypeColors, Org = Org, output.dir = paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.10 Step 10. Calculate Lineage Scores Create a folder for saving the results of lineage score calculation. print('Step10. Calculate lineage scores.') # we use normalized data here if (!file.exists(paste0(output.dir, '/Step10.Calculate_lineage_scores/'))) { dir.create(paste0(output.dir, '/Step10.Calculate_lineage_scores/')) } Run lineage score calculation. 
if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'mmu')){ lineage.genelist <- c(list(HSC_lineage_signatures), list(Mpp_genes), list(Erythroid_lineage_signatures), list(Lymphoid_lineage_signatures), list(Myeloid_lineage_signatures)) lineage.names <- c('HSC_lineage_signatures', 'Mpp_genes', 'Erythroid_lineage_signatures', 'Lymphoid_lineage_signatures', 'Myeloid_lineage_signatures') }else if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'hsa')){ lineage.genelist <- c(list(HSPCs_lineage_signatures), list(Myeloids_lineage_signatures), list(B_cells_lineage_signatures), list(T_NK_cells_lineage_signatures)) lineage.names <- c('HSPCs_lineage_signatures', 'Myeloids_lineage_signatures', 'B_cells_lineage_signatures', 'T_NK_cells_lineage_signatures') } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } lineageScores(expression_matrix = expression_matrix, cellTypes = sc_object@meta.data$selectLabels, cellTypes_orders = cellTypes_orders, cellTypes_colors = ViolinPlot.cellTypeColors, groups = sc_object@meta.data$datasetID, groups_orders = unique(sc_object@meta.data$datasetID), groups_colors = groups_colors, lineage.genelist = lineage.genelist, lineage.names = lineage.names, Org = Org, output.dir = paste0(output.dir, '/Step10.Calculate_lineage_scores/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.11 Step 11. GSVA Create a folder for saving the results of GSVA. print('Step11. GSVA.') if (!file.exists(paste0(output.dir, '/Step11.GSVA/'))) { dir.create(paste0(output.dir, '/Step11.GSVA/')) } Run GSVA. 
setwd(wdir) if(Org=='mmu'){ load(paste0(databasePath,"/mouse_c2_v5p2.rdata")) GSVA.genelist <- Mm.c2 assign('OrgDB', org.Mm.eg.db) }else if(Org=='hsa'){ load(paste0(databasePath,"/human_c2_v5p2.rdata")) GSVA.genelist <- Hs.c2 assign('OrgDB', org.Hs.eg.db) }else{ stop("Org should be 'mmu' or 'hsa'.") } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } run_GSVA(sc_object = sc_object, GSVA.genelist = GSVA.genelist, GSVA.cellTypes = sc_object@meta.data$selectLabels, GSVA.cellTypes.orders = cellTypes_orders, GSVA.cellGroups = sc_object@meta.data$datasetID, GSVA.identify.cellType.features = Step11_GSVA.identify.cellType.features, GSVA.identify.diff.features = Step11_GSVA.identify.diff.features, GSVA.comparison.design = Step11_GSVA.comparison.design, OrgDB = OrgDB, output.dir = paste0(output.dir, '/Step11.GSVA/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.12 Step 12. Construct Trajectories Load gene symbols and ensemble IDs. 
DefaultAssay(sc_object) <- 'RNA' countsSlot <- GetAssayData(object = sc_object, slot = "counts") gene_metadata <- as.data.frame(rownames(countsSlot)) rownames(gene_metadata) <- gene_metadata[,1] if(Org == 'mmu'){ load(paste0(databasePath,"/mouseGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = mouseGeneSymbolandEnsembleID$geneName, to = mouseGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) }else if(Org == 'hsa'){ load(paste0(databasePath,"/humanGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = humanGeneSymbolandEnsembleID$geneName, to = humanGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) } colnames(gene_metadata) <- c('gene_short_name','ensembleID') Create folders for saving the results of trajectory construction. print('Step12. Construct trajectories.') if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) } Prepare the input data. if(is.null(Step12_Construct_Trajectories.clusters)){ sc_object.subset <- sc_object countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") }else{ sc_object.subset <- subset(sc_object, subset = selectLabels %in% Step12_Construct_Trajectories.clusters) countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") } Run monocle2. 
# monocle2 phenoData <- sc_object.subset@meta.data featureData <- gene_metadata run_monocle(cellData = countsSlot.subset, phenoData = phenoData, featureData = featureData, lowerDetectionLimit = 0.5, expressionFamily = VGAM::negbinomial.size(), cellTypes='selectLabels', monocle.orders=Step12_Construct_Trajectories.clusters, monocle.colors = ViolinPlot.cellTypeColors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) Run slingshot. # slingshot if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object.subset) <- 'integrated' }else{ DefaultAssay(sc_object.subset) <- 'RNA'} run_slingshot(slingshot.PCAembeddings = Embeddings(sc_object.subset, reduction = "pca")[, PCs], slingshot.cellTypes = sc_object.subset@meta.data$selectLabels, slingshot.start.clus = slingshot.start.clus, slingshot.end.clus = slingshot.end.clus, slingshot.colors = slingshot.colors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) Run scVelo. # scVelo if((!is.null(loom.files.path))&(!is.null(pythonPath))){ prepareDataForScvelo(sc_object = sc_object.subset, loom.files.path = loom.files.path, scvelo.reduction = 'pca', scvelo.column = 'selectLabels', output.dir = paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) reticulate::py_run_string(paste0("import os\\noutputDir = '", output.dir, "'")) reticulate::py_run_file(file.path(system.file(package = "HemaScopeR"), "python/sc_run_scvelo.py"), convert = FALSE) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.13 Step 13. TF Analysis Create folders for saving the results of TF analysis. print('Step13. 
TF analysis.') if (!file.exists(paste0(output.dir, '/Step13.TF_analysis/'))) { dir.create(paste0(output.dir, '/Step13.TF_analysis/')) } Run SCENIC to perform TF analysis. run_SCENIC(countMatrix = countsSlot, cellTypes = sc_object@meta.data$selectLabels, datasetID = sc_object@meta.data$datasetID, cellTypes_colors = Step13_TF_Analysis.cellTypes_colors, cellTypes_orders = unique(sc_object@meta.data$selectLabels), groups_colors = Step13_TF_Analysis.groups_colors, groups_orders = unique(sc_object@meta.data$datasetID), Org = Org, output.dir = paste0(output.dir, '/Step13.TF_analysis/'), pythonPath = pythonPath, databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.14 Step 14. Cell-Cell Interaction Create folders for saving the results of cell-cell interaction analysis. print('Step14. Cell-cell interaction.') if (!file.exists(paste0(output.dir, '/Step14.Cell_cell_interection/'))) { dir.create(paste0(output.dir, '/Step14.Cell_cell_interection/')) } Run CellChat to perform cell-cell interaction analysis. tempwd <- getwd() run_CellChat(data.input=countsSlot, labels = sc_object@meta.data$selectLabels, cell.orders = ViolinPlot.cellTypeOrders, cell.colors = ViolinPlot.cellTypeColors, sample.names = rownames(sc_object@meta.data), Org = Org, sorting = sorting, output.dir = paste0(output.dir, '/Step14.Cell_cell_interection/')) setwd(tempwd) Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } "],["stey-by-step-st-seq-pipeline.html", "5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading 5.2 Step 2. QC 5.3 Step 3. Clustering 5.4 Step 4. DEGs 5.5 Step 5. Spatially variable features 5.6 Step 6. Spatial interaction 5.7 Step 7. CNV analysis 5.8 Step 8. Deconvolution 5.9 Step 9. Cell cycle 5.10 Step 10. Niche analysis", " 5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading The st_Loading_Data function loads 10X Visium spatial transcriptomics data produced by Space Ranger. It reads the data from input.data.dir and returns it as a SeuratObject. 5.1.1 Function arguments: input.data.dir: The directory where the input data is stored. output.dir: The directory where the processed output will be saved. If not specified, the output is saved in the current working directory. Default is ‘.’. sampleName: A string naming the sample. Default is ‘Hema_ST’. rds.file: A boolean indicating whether the input data is an RDS file rather than the typical output of Space Ranger. Default is FALSE. filename: The name of the file to be loaded if the data is not in RDS format. Default is “filtered_feature_bc_matrix.h5”. assay: The name of the assay in which to store the data. Default is ‘Spatial’. slice: The image slice identifier for the spatial data. Default is ‘slice1’. filter.matrix: A boolean indicating whether to load the filtered matrix. Default is TRUE. to.upper: A boolean indicating whether to convert feature names to upper case. Default is FALSE. 5.1.2 Function behavior: Directory Creation: The function first checks if the output.dir exists; if not, it creates it. 
RDS File Handling: If rds.file is TRUE, it reads the RDS file, ensuring the specified assay and slice are present in the Seurat object. Non-RDS File Handling: If rds.file is FALSE, it loads the data using Load10X_Spatial from Seurat. Saving the Object: Uses SaveH5Seurat and Convert to save the Seurat object in rds and h5ad formats. File Copying: Copies any necessary files (filtered matrix, spatial image) to the output.dir. Return Value: Returns the processed Seurat object. 5.1.3 An example: st_obj <- st_Loading_Data( input.data.dir = 'path/to/data', output.dir = '.', sampleName = 'Hema_ST', rds.file = FALSE, filename = 'filtered_feature_bc_matrix.h5', assay = 'Spatial', slice = 'slice1', filter.matrix = TRUE, to.upper = FALSE ) 5.1.4 Outputs: Spatial transcriptome data in rds and h5ad formats. 5.2 Step 2. QC The QC_Spatial function performs basic quality control on a SeuratObject containing 10X Visium data and returns the filtered SeuratObject. It provides options to set thresholds for the number of genes, nUMI (number of unique molecular identifiers), and spots expressing each gene. It also allows for the removal of mitochondrial genes based on species. 5.2.1 Function arguments: st_obj: A SeuratObject of 10X Visium data. output.dir: A character string specifying the path to store the results and figures. Default is the current working directory. min.gene: An integer representing the minimum number of genes detected in a spot. Default is 200. max.gene: An integer representing the maximum number of genes detected in a spot. Default is Inf (no upper limit). min.nUMI: An integer representing the minimum nUMI detected in a spot. Default is 500. max.nUMI: An integer representing the maximum nUMI detected in a spot. Default is Inf (no upper limit). min.spot: An integer representing the minimum number of spots expressing each gene. Default is 3. species: A character string representing the species of the sample, either ‘human’ or ‘mouse’. 
bool.remove.mito: A boolean value indicating whether to remove mitochondrial genes. Default is TRUE. SpatialColors: A function that interpolates a set of given colors to create new color palettes and color ramps. Default is a color palette with reversed Spectral colors from RColorBrewer. 5.2.2 Function behavior: Plots and saves the spatial distribution of nUMI and nGene. Plots and saves violin plots for nUMI and nGene. Identifies and marks low-quality spots based on nUMI and nGene thresholds. Plots the spatial distribution of quality. Plots and saves a histogram for the number of spots expressing each gene. Plots the spatial distribution of mitochondrial genes. Saves the raw SeuratObject before filtering. Removes low-quality spots and genes expressed in fewer than min.spot spots. Optionally removes mitochondrial genes. Saves the filtered SeuratObject. Returns the filtered st_obj. 5.2.3 An example: st_obj <- QC_Spatial( st_obj = st_obj, output.dir = '.', min.gene = 200, max.gene = Inf, min.nUMI = 500, max.nUMI = Inf, min.spot = 3, species = 'human', bool.remove.mito = TRUE, SpatialColors = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = "Spectral"))) ) 5.2.4 Outputs: Figures showing the spatial distribution of nUMI and nGene. Violin plots of nUMI and nGene. Figures showing the quality. Histograms for the number of spots expressing each gene. Figures showing the spatial distribution of mitochondrial genes. Raw and filtered SeuratObject. 5.3 Step 3. Clustering The st_Clustering function is designed to perform clustering analysis on spatial transcriptomics data. It integrates several key steps including data normalization, dimensionality reduction, clustering, and visualization. The function saves the results and visualizations to output.dir. 5.3.1 Function arguments: st_obj: The input spatial transcriptomics Seurat object that contains the data to be clustered. output.dir: The directory where the output files will be saved. Default is the current directory (‘.’). 
normalization.method: The method used for data normalization. Default is ‘SCTransform’. npcs: The number of principal components to use in PCA. Default is 50. pcs.used: The principal components to use for clustering. Default is the first 10 PCs (1:10). resolution: The resolution parameter for the clustering algorithm. Default is 0.8. verbose: A logical flag to print progress messages. Default is FALSE. 5.3.2 Function behavior: Data Normalization and PCA: Depending on the normalization.method, the function either uses SCTransform or a standard normalization method followed by scaling and variable feature detection. Performs PCA on the normalized data. Clustering and Dimensionality Reduction: Finds nearest neighbors using the specified principal components (pcs.used). Identifies clusters using the specified resolution. Performs UMAP and t-SNE for visualization of the clusters. Visualization: Generates spatial, UMAP, and t-SNE plots of the clusters with customized color schemes. Saves these plots as images in the specified directory. Saving Results: Saves the updated st_obj as an RDS file. Exports the metadata of st_obj to a CSV file. Return Value: Returns the updated st_obj containing the clustering results. 5.3.3 An example: st_obj <- st_Clustering( st_obj = st_obj, output.dir = '.', normalization.method = 'SCTransform', npcs = 50, pcs.used = 1:10, resolution = 0.8, verbose = FALSE ) 5.3.4 Outputs: Figures showing the results of clustering. SeuratObject in rds format. 5.4 Step 4. DEGs The st_Find_DEGs function is designed to identify differentially expressed genes (DEGs) in spatial transcriptomics data. It performs differential expression analysis based on clustering results, visualizes the top markers, and saves the results to output.dir. 5.4.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for DEG analysis. output.dir: The directory where output files will be saved. Default is the current directory (‘.’). 
ident.label: The metadata label used for identifying clusters. Default is 'seurat_clusters'. only.pos: A logical flag to include only positive markers. Default is TRUE. min.pct: The minimum fraction of cells expressing the gene in either cluster. Default is 0.25. logfc.threshold: The log fold change threshold for considering a gene differentially expressed. Default is 0.25. test.use: The statistical test to use for differential expression analysis. Default is 'wilcox'. verbose: A logical flag to print progress messages. Default is FALSE. 5.4.2 Function behavior: Set Identifiers: Sets the cluster identifiers in the spatial transcriptomics object (st_obj) based on the specified ident.label. Find Differentially Expressed Genes (DEGs): Performs differential expression analysis using the specified parameters (only.pos, min.pct, logfc.threshold, test.use). Top Marker Genes: Selects the top 5 marker genes for each cluster based on the highest average log fold change. Visualization: Generates a dot plot for the top DEGs and saves the plot as an image in the specified directory. Saving Results: Saves the DEG results as a CSV file. Return Value: Returns the data frame containing the identified DEGs. 5.4.3 An example: st.markers <- st_Find_DEGs( st_obj = st_obj, output.dir = '.', ident.label = 'seurat_clusters', only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = 'wilcox', verbose = FALSE ) 5.4.4 Outputs: Dot plots showing markers. CSV file containing the information of markers. 5.5 Step 5. Spatially variable features The st_SpatiallyVariableFeatures function identifies and visualizes spatially variable features (SVFs) in spatial transcriptomics data. It integrates the identification of spatially variable features using a specified method, saves the results to a directory, and creates visualizations of the top spatially variable features. 5.5.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. 
output.dir: The directory where output files will be saved. Default is the current directory. assay: The assay to be used for finding spatially variable features. Default is 'SCT'. selection.method: The method used for selecting spatially variable features. Default is 'moransi'. n.top.show: The number of top spatially variable features to visualize. Default is 10. n.col: The number of columns for the visualization grid. Default is 5. verbose: A logical flag to print progress messages. Default is FALSE. 5.5.2 Function behavior: Identify Spatially Variable Features: Identifies spatially variable features using the specified method and assay. Suppresses warnings during the process. Save Metadata: Extracts metadata features and saves them as a CSV file in output.dir. Visualization: Selects the top n.top.show spatially variable features. Generates and saves a spatial feature plot of these features in the specified directory. Return Value: Returns the updated st_obj containing the identified spatially variable features. 5.5.3 An example: st_obj <- st_SpatiallyVariableFeatures( st_obj = st_obj, output.dir = '.', assay = st_obj@active.assay, selection.method = 'moransi', n.top.show = 10, n.col = 5, verbose = FALSE ) 5.5.4 Outputs: Figures showing SVFs. CSV file containing the information of SVFs. 5.6 Step 6. Spatial interaction The st_Interaction function is used to identify and visualize interactions between clusters based on spatial transcriptomics data. It utilizes Commot to analyze spatial interactions, identify pathway activities, and assess the strength and significance of interactions. 5.6.1 Function arguments: st_data_path: Path to the spatial transcriptomics data. metadata_path: Path to the metadata associated with the spatial transcriptomics data. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. label_key: Key in the metadata to identify cell clusters. Default is 'seurat_clusters'. 
save_path: The directory where output files will be saved. Default is the current directory. species: The species of the spatial transcriptomics data. Default is 'human'. signaling_type: Type of signaling interactions to consider. Default is 'Secreted Signaling'. database: Database to be used for the analysis. Default is 'CellChat'. min_cell_pct: Minimum percentage of cells to consider for interaction analysis. Default is 0.05. dis_thr: Distance threshold for defining interactions. Default is 500. n_permutations: Number of permutations for assessing significance. Default is 100. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.6.2 Function behavior: Commot Analysis: Uses Commot to perform interaction analysis, identifying interactions within and between clusters. Visualization: Generates visualizations of pathway interactions and interactions between ligand-receptors (LRs) within and between clusters, and saves them in save_path. 5.6.3 An example: st_Interaction( st_data_path = 'path/to/data', metadata_path = 'path/to/metadata', library_id = 'Hema_ST', label_key = 'seurat_clusters', save_path = '.', species = 'human', signaling_type = 'Secreted Signaling', database = 'CellChat', min_cell_pct = 0.05, dis_thr = 500, n_permutations = 100, pythonPath = 'path/to/python' ) 5.6.4 Outputs: Dot plot showing pathway interaction between and within clusters. Dot plot showing LRs interaction between and within clusters. The information of each LR and pathway. 5.7 Step 7. CNV analysis The st_CNV function identifies and visualizes copy number variations (CNVs) in spatial transcriptomics data. It uses CopyKAT to perform the CNV analysis, saves the results, and generates visual representations of CNV states. 5.7.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. save_path: The directory where output files will be saved. assay: The assay to be used for CNV analysis. 
Default is 'Spatial'. LOW.DR: The lower threshold for the dropout rate in CopyKAT. Default is 0.05. UP.DR: The upper threshold for the dropout rate in CopyKAT. Default is 0.1. win.size: The window size for the CNV analysis. Default is 25. distance: The distance metric to be used for the analysis. Default is \"euclidean\". genome: The genome version to be used, ‘hg20’ or ‘mm10’. Default is \"hg20\". n.cores: The number of cores to be used for parallel processing. Default is 1. species: The species of the spatial transcriptomics data. Default is 'human'. 5.7.2 Function behavior: CopyKAT Analysis: Runs CopyKAT pipeline to perform CNV analysis using the provided parameters. Saving Results: Saves the CopyKAT results as an RDS file. Plotting: Generates plots of the CNV states and saves them in save_path. Updating Metadata: Updates the spatial transcriptomics object with CNV state metadata. Return Value: Returns the updated st_obj containing the CNV state information. 5.7.3 An example: st_obj <- st_CNV( st_obj = st_obj, save_path = '.', assay = 'Spatial', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = 'hg20', n.cores = 1, species = 'human' ) 5.7.4 Outputs: Figures showing the predicted CNV states. Figures showing the CNV heatmap. rds files of results of copykat. 5.8 Step 8. Deconvolution The st_Deconvolution function aims to perform spatial deconvolution analysis on spatial transcriptomics data to estimate the cell-type composition and abundance in different regions. The function utilizes cell2location to infer cell-type abundance and spatial distributions, allowing for the visualization and interpretation of spatially resolved cell populations within the tissue. 5.8.1 Function arguments: st.data.dir: Path to the spatial transcriptomics data. sc.h5ad.dir: Path to the single-cell RNA-seq data in h5ad format. Default is NULL. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. 
st_obj: Spatial transcriptomics object containing the data for analysis. Default is NULL. save_path: The directory where output files will be saved. Default is NULL. sc.labels.key: Key in the single-cell metadata to identify cell clusters. Default is 'seurat_clusters'. species: The species of the spatial transcriptomics data. Default is 'mouse'. sc.max.epoch: Maximum number of epochs used for single-cell deconvolution. Default is 1000. st.max.epoch: Maximum number of epochs used for spatial deconvolution. Default is 10000. use.gpu: Logical value indicating whether to use GPU for computation. Default is FALSE. use.Dataset: The dataset to be used for analysis, such as 'HematoMap' or 'LymphNode'. pythonPath: The path to the Python environment containing cell2location to use for the analysis. Default is ‘.’. 5.8.2 Function behavior: Deconvolution Analysis: Performs the spatial deconvolution analysis using the provided spatial transcriptomics and single-cell RNA-seq data. Post-Analysis Processing: Processes the deconvolution results and visualizes the spatial distribution of inferred cell types within the tissue. Returning Results: If a Seurat object is provided, the updated Seurat object with cell type information is returned. 5.8.3 An example: st_obj <- st_Deconvolution( st.data.dir = 'path/to/data', library_id = 'Hema_ST', sc.h5ad.dir = NULL, st_obj = st_obj, save_path = '.', sc.labels.key = 'seurat_clusters', species = 'human', sc.max.epoch = 1000, st.max.epoch = 10000, use.gpu = FALSE, use.Dataset = 'LymphNode', pythonPath = 'path/to/python' ) 5.8.4 Outputs: Figures showing the predicted abundance of each cell-type. The parameters of trained cell2location model. 5.9 Step 9. Cell cycle The st_Cell_cycle function is used to assess the cell cycle phase scores in spatial transcriptomics data. It calculates S phase and G2M phase scores based on the expression of designated cell cycle-related genes and visualizes these scores in spatial and dimensionality-reduced plots. 
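As a rough illustration of how S and G2M scores translate into a phase call, the following base R sketch works on a toy matrix. It is not the st_Cell_cycle implementation: the score here is simply the mean expression of a gene set minus the mean over all genes, whereas Seurat's scoring uses expression-matched control genes. The matrix, the two-gene feature sets, and the helper `set_score` are all hypothetical.

```r
# Toy log-normalized expression matrix: genes in rows, spots in columns
# (simulated values, for illustration only).
set.seed(42)
genes <- c('MCM2', 'PCNA', 'TOP2A', 'MKI67', 'ACTB', 'GAPDH', 'CD34', 'KIT')
expr <- matrix(rexp(length(genes) * 6), nrow = length(genes),
               dimnames = list(genes, paste0('spot', 1:6)))

# Small illustrative subsets of S-phase and G2M-phase genes.
s.features   <- c('MCM2', 'PCNA')
g2m.features <- c('TOP2A', 'MKI67')

# Simplified score (an assumption, not Seurat's method):
# mean of the gene set minus the mean of all genes, per spot.
set_score <- function(mat, set) {
  colMeans(mat[intersect(set, rownames(mat)), , drop = FALSE]) - colMeans(mat)
}
S.Score   <- set_score(expr, s.features)
G2M.Score <- set_score(expr, g2m.features)

# Phase call: G1 when both scores are negative, otherwise the larger score wins.
Phase <- ifelse(S.Score < 0 & G2M.Score < 0, 'G1',
                ifelse(S.Score > G2M.Score, 'S', 'G2M'))
print(data.frame(S.Score, G2M.Score, Phase))
```

In a real analysis the scores would come from the Seurat object itself; the sketch only shows why a spot with two negative scores is treated as non-cycling.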
5.9.1 Function arguments: st_obj: The input Seurat object containing the data for analysis. save_path: The directory where the output images will be saved. Default is the current directory. s.features: A list of genes associated with the S phase. Default is NULL (using genes from Seurat). g2m.features: A list of genes associated with the G2M phase. Default is NULL (using genes from Seurat). species: The species of the spatial transcriptomics data. Default is 'human'. FeatureColors.bi: A color palette for visualization. Default is a two-color ramp palette. 5.9.2 Function behavior: Gene Feature Assignment: Assigns S phase and G2M phase gene lists based on the specified species or provided input. Cell Cycle Scoring: Calculates the S phase and G2M phase scores in the data. Spatial Visualization: Generates spatial feature plots to visualize the S phase and G2M phase scores using the specified color palette and saves the plots as images. Dimensionality-Reduced Plot Visualization: If UMAP or tSNE dimensionality reduction is available in the st_obj, feature plots of the S phase and G2M phase scores are generated in the reduced space and saved as images. Return Value: Returns the updated st_obj containing the cell cycle phase scores. 5.9.3 An example: st_obj <- st_Cell_cycle( st_obj = st_obj, save_path = '.', s.features = NULL, g2m.features = NULL, species = 'human', FeatureColors.bi = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = 'RdYlBu'))) ) 5.9.4 Outputs: Figures showing S scores. Figures showing G2M scores. 5.10 Step 10. Niche analysis The st_NicheAnalysis function is designed to perform niche analysis on spatial transcriptomics data, enabling the exploration of spatial niches or microenvironments within the tissue. The function encompasses co-occurrence analysis, niche clustering, and niche interaction analysis to uncover the spatial relationships and characteristics of different cell populations or features. 
5.10.1 Function arguments: st_obj: The input SeuratObject containing the spatial transcriptomics data for analysis. features: A vector of feature names (for example, cell types from deconvolution) to use for niche analysis. save_path: The directory where the analysis results and visualizations will be saved. Default is the current directory. coexistence.method: The method for co-occurrence analysis, accepting 'correlation' or 'Wasserstein'. Default is 'correlation'. kmeans.n: The number of clusters for niche clustering. Default is 4. st_data_path: A path containing the ‘spatial’ folder and the ‘filtered_feature_bc_matrix.h5’ file, required for niche interaction visualization. slice: The slice to be used for analysis. Default is 'slice1'. species: The species of the sample data. Default is 'mouse'. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.10.2 Function behavior: Co-occurrence Score Calculation: Calculates the co-occurrence scores between the specified features using the chosen coexistence method (‘correlation’ or ‘Wasserstein’). Niche Clustering: Uses k-means clustering to identify distinct spatial niches based on the profiles of the selected features and visualizes the clustering results. Niche Interaction Visualization: If st_data_path is provided, runs Commot on the provided spatial transcriptomics data and generates visualizations of niche interactions within the tissue. Return Value: Returns the updated st_obj with niche analysis results and visualizations. 
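The co-occurrence and niche-clustering steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the HemaScopeR implementation: the abundance matrix is simulated, Pearson correlation is assumed for the 'correlation' option, and the names `abund`, `cooc`, `niche`, and `composition` are hypothetical.

```r
# Simulated spot-by-feature abundance matrix (hypothetical data standing in
# for deconvolution results stored in the metadata of st_obj).
set.seed(1)
n_spots <- 200
abund <- cbind(
  HSC      = rgamma(n_spots, shape = 2),
  Myeloid  = rgamma(n_spots, shape = 2),
  Lymphoid = rgamma(n_spots, shape = 2)
)
rownames(abund) <- paste0('spot', seq_len(n_spots))

# Co-occurrence as pairwise Pearson correlation between features
# (an assumption about what the 'correlation' option computes).
cooc <- cor(abund, method = 'pearson')

# Niche clustering: k-means on scaled abundances, with k = 4 to match
# the documented default of kmeans.n.
km <- kmeans(scale(abund), centers = 4, nstart = 10)
niche <- factor(km$cluster)

# Mean feature abundance within each niche.
composition <- aggregate(as.data.frame(abund),
                         by = list(niche = niche), FUN = mean)
print(round(cooc, 2))
print(composition)
```

Scaling before k-means keeps a highly abundant feature from dominating the distance; whether the package scales its inputs is not stated in the text above.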
5.10.3 An example: tmp <- read.csv('path/to/cell2loc_res.csv', row.names = 1) features <- colnames(tmp) if(!all(features %in% names(st_obj@meta.data))){ common.barcodes <- intersect(colnames(st_obj), rownames(tmp)) tmp <- tmp[common.barcodes, ] st_obj <- st_obj[, common.barcodes] st_obj <- AddMetaData(st_obj, metadata = tmp) } st_obj <- st_NicheAnalysis( st_obj, features = features, save_path = '.', coexistence.method = 'correlation', kmeans.n = 4, st_data_path = 'path/to/data', slice = 'slice1', species = 'human', condaenv = 'path/to/python' ) 5.10.4 Outputs: Figures showing the co-existence results. Figures showing the spatial distribution of each niche. Figures showing the composition of each niche. Figures showing the results of interactions using Commot. "]] +[["index.html", "HemaScope Tutorial 1 Introduction", " HemaScope Tutorial HemaScope team 2024-09-27 1 Introduction HemaScope is a specialized bioinformatics toolkit designed for analyzing both single-cell and spatial transcriptome sequencing data from hematopoietic cells, including myeloid and lymphoid lineages. We have developed an R package named HemaScopeR, a Shiny interface named HemaScopeShiny, and a cloud platform named HemaScopeCloud. This tutorial introduces how to install and use the R package and Shiny interface, as well as how to access and operate the cloud platform. 
"],["installation.html", "2 Installation 2.1 Create a new conda environment and activate it 2.2 Set the channels in conda 2.3 Install R and python 2.4 Install required R-packages 2.5 Install required Python-packages 2.6 The installed packages with versions", " 2 Installation 2.1 Create a new conda environment and activate it conda create --name HemaScope_env conda activate HemaScope_env 2.2 Set the channels in conda # Add the default channel conda config --add channels defaults # Add default channel URLs conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r conda config --add default_channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2 # Add custom channels conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2 conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch-lts conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/simpleitk conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/deepmodeling # Set to show channel URLs conda config --set show_channel_urls true 2.3 Install R and python R 4.3.3 and python 3.8.19 conda install R-base=4.3.3 conda install python=3.8.19 2.4 Install required R-packages From conda conda install -c conda-forge r-devtools=2.4.5 conda install -c conda-forge r-Seurat=4.3.0.1 conda install -c conda-forge r-Rfast2=0.1.5.1 conda install -c conda-forge r-hdf5r=1.3.10 conda install -c conda-forge r-ggpubr=0.6.0 conda install 
pwwang::r-seuratwrappers conda install -c bioconda bioconductor-monocle=2.28.0 conda install -c bioconda bioconductor-slingshot=2.8.0 conda install -c bioconda bioconductor-GSVA=1.48.2 conda install -c bioconda bioconductor-org.Mm.eg.db=3.17.0 conda install -c bioconda bioconductor-org.Hs.eg.db=3.17.0 conda install -c bioconda bioconductor-scran=1.28.1 conda install -c bioconda bioconductor-AUCell=1.22.0 conda install -c bioconda bioconductor-RcisTarget=1.20.0 conda install -c bioconda bioconductor-GENIE3=1.24.0 conda install -c bioconda bioconductor-biomaRt=2.56.1 conda install -c bioconda r-velocyto.r=0.6 #conda install -c bioconda bioconductor-limma=3.56.2 Enter the R language environment We suggest users do not manually update any already installed R packages during the installation of the following R packages. R From BiocManager # BiocManager(version = "1.30.23") should already be installed as a dependency of r-seuratwrappers. # If it is not installed, please run the following code to install it. 
# remotes::install_version("BiocManager", version = "1.30.23") BiocManager::install("ComplexHeatmap") BiocManager::install("scmap") BiocManager::install("clusterProfiler") install.packages("doMC") install.packages("doRNG") From CRAN remotes::install_version("shinyjs", version = "2.1.0") remotes::install_version("shiny", version = "1.8.0") remotes::install_version("shinyWidgets", version = "0.8.6") remotes::install_version("shinydashboard", version = "0.7.2") remotes::install_version("slickR", version = "0.6.0") remotes::install_version("phateR", version = "1.0.7") remotes::install_version("gelnet", version = "1.2.1") remotes::install_version("parallelDist", version = "0.2.6") remotes::install_version("kableExtra", version = "1.3.4") remotes::install_version("transport", version = "0.14-6") remotes::install_version("feather", version = "0.3.5") remotes::install_version("markdown", version = "1.13") From GitHub tips: Sometimes network connection issues may occur, resulting in an error message indicating that GitHub cannot be reached. Please try installing again when the network conditions improve. Usage limitations: Sometimes an API rate limit error occurs, and a GitHub token is needed to raise the GitHub API rate limit. The steps to resolve this are as follows: Register for an account or log in to an existing account on the GitHub website. Then click on your profile picture in the top right corner, go to the dropdown menu and select “Settings.” Next, find “Developer settings” and click on it, then find “Personal access tokens (classic).” Click on it, then click “Create new token (classic).” Create a new token by first naming it anything you like. Then choose the expiration time for the token. Finally, check the “repo” box; the token will be used to download code repositories from GitHub. Click “Generate token.” Copy the generated token. After that, set the token in the environment variable in R. Since we are using conda, enter R by typing R in the terminal. 
Then, enter the command: usethis::edit_r_environ(). This will open a file. Press the i key to edit. Paste the token you copied into the code area as follows: GITHUB_TOKEN="your_token". Then press Esc, type :wq! (force save). After that, you need to exit Linux and re-enter R. Close and reopen the terminal to apply the environment variable. Reopen Linux, activate the conda environment, and enter R again. devtools::install_github("sqjin/CellChat") devtools::install_github("immunogenomics/presto") devtools::install_github("aertslab/SCENIC@fde9774") devtools::install_github("pzhulab/abcCellmap@f44c14b") devtools::install_github("navinlabcode/copykat@d7d6569") devtools::install_github('chris-mcginnis-ucsf/DoubletFinder@8c7f76e') devtools::install_github("mojaveazure/seurat-disk@877d4e1") Install HemaScopeR from GitHub devtools::install_github(repo="ZhenyiWangTHU/HemaScopeR", dep = FALSE) Exit the R language environment quit() 2.5 Install required Python-packages Upgrade pip and set mirrors python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple pip config set global.extra-index-url http://mirrors.aliyun.com/pypi/simple/ Install required packages pip install stereopy==1.3.1 anndata==0.9.2 arboreto==0.1.6 cell2location==0.1.3 commot==0.0.3 karateclub==1.2.2 matplotlib==3.7.1 networkx==3.1 numpy==1.23.5 pandas==1.5.3 phate==1.0.11 pot==0.9.1 scanpy==1.9.6 scipy==1.10.1 scvelo==0.3.2 scvi-tools==0.20.3 seaborn==0.12.2 distributed==2024.2.1 dask-expr==0.5.3 2.6 The installed packages with versions R packages with versions Package Version ------- ------- abcCellmap 0.1.0 abind 1.4-5 annotate 1.78.0 AnnotationDbi 1.64.1 ape 5.8 aplot 0.2.3 arrow 17.0.0 askpass 1.2.0 assertthat 0.2.1 AUCell 1.22.0 backports 1.5.0 base 4.3.3 base64enc 0.1-3 beachmat 2.16.0 BH 1.84.0-0 Biobase 2.60.0 BiocFileCache 2.8.0 BiocGenerics 0.46.0 BiocManager 1.30.23 BiocNeighbors 1.18.0 BiocParallel 1.34.2 
BiocSingular 1.16.0 BiocVersion 3.18.1 biocViews 1.68.1 biomaRt 2.56.1 Biostrings 2.68.1 bit 4.0.5 bit64 4.0.5 bitops 1.0-7 blob 1.2.4 bluster 1.10.0 boot 1.3-30 brew 1.0-10 brio 1.1.5 broom 1.0.6 bslib 0.7.0 cachem 1.1.0 callr 3.7.6 car 3.1-2 carData 3.0-5 caret 6.0-94 caTools 1.18.2 CellChat 2.0.1 cellranger 1.1.0 circlize 0.4.16 class 7.3-22 cli 3.6.3 clipr 0.8.0 clock 0.7.0 clue 0.3-65 cluster 2.1.6 clusterProfiler 4.10.1 coda 0.19-4.1 codetools 0.2-20 colorspace 2.1-0 combinat 0.0-8 commonmark 1.9.1 compiler 4.3.3 ComplexHeatmap 2.18.0 conquer 1.3.3 copykat 1.1.0 corrplot 0.92 cowplot 1.1.3 cpp11 0.4.7 crayon 1.5.3 credentials 2.0.1 crosstalk 1.2.1 curl 5.2.1 data.table 1.15.4 datasets 4.3.3 DBI 1.2.3 dbplyr 2.5.0 DDRTree 0.1.5 DelayedArray 0.26.6 DelayedMatrixStats 1.22.1 deldir 2.0-4 Deriv 4.1.3 desc 1.4.3 devtools 2.4.5 diagram 1.6.5 diffobj 0.3.5 digest 0.6.36 dlm 1.1-6 doMC 1.3.8 doRNG 1.8.6 doBy 4.6.22 docopt 0.7.1 doParallel 1.0.17 DOSE 3.28.2 dotCall64 1.1-1 DoubletFinder 2.0.3 downlit 0.4.4 downloader 0.4 dplyr 1.1.4 dqrng 0.3.2 dynamicTreeCut 1.63-1 e1071 1.7-14 edgeR 3.42.4 ellipsis 0.3.2 enrichplot 1.22.0 evaluate 0.24.0 expm 0.999-9 fansi 1.0.6 farver 2.1.2 fastDummies 1.7.3 fastICA 1.2-4 fastmap 1.2.0 fastmatch 1.1-4 feather 0.3.5 fgsea 1.28.0 fields 16.2 filelock 1.0.3 fitdistrplus 1.1-11 FNN 1.1.4 fontawesome 0.5.2 forcats 1.0.0 foreach 1.5.2 foreign 0.8-87 formatR 1.14 fs 1.6.4 futile.logger 1.4.3 futile.options 1.0.1 future 1.33.2 future.apply 1.11.2 gelnet 1.2.1 generics 0.1.3 GENIE3 1.24.0 GenomeInfoDb 1.36.1 GenomeInfoDbData 1.2.11 GenomicRanges 1.52.0 gert 2.0.1 GetoptLong 1.0.5 ggalluvial 0.12.5 ggforce 0.4.2 ggfun 0.1.5 ggnetwork 0.5.13 ggnewscale 0.4.10 ggplot2 3.5.1 ggplotify 0.1.2 ggpubr 0.6.0 ggraph 2.2.1 ggrepel 0.9.5 ggridges 0.5.6 ggsci 3.2.0 ggsignif 0.6.4 ggtree 3.10.1 gh 1.4.1 gitcreds 0.1.2 GlobalOptions 0.1.2 globals 0.16.3 glue 1.7.0 GO.db 3.18.0 goftest 1.2-3 googleVis 0.7.3 GOSemSim 2.28.1 gower 1.0.1 gplots 3.1.3.1 graph 
1.78.0 graphics 4.3.3 graphlayouts 1.1.1 grDevices 4.3.3 grid 4.3.3 gridBase 0.4-7 gridExtra 2.3 gridGraphics 0.5-1 GSEABase 1.62.0 gson 0.1.0 GSVA 1.48.2 gtable 0.3.5 gtools 3.9.5 hardhat 1.4.0 haven 2.5.4 HDF5Array 1.28.1 hdf5r 1.3.10 HDO.db 0.99.1 HemaScopeR 1.0.0 here 1.0.1 hexbin 1.28.3 highr 0.11 hms 1.1.3 HSMMSingleCell 1.20.0 htmltools 0.5.8.1 htmlwidgets 1.6.4 httpuv 1.6.15 httr 1.4.7 httr2 1.0.2 ica 1.0-3 igraph 2.0.3 ini 0.3.1 ipred 0.9-14 IRanges 2.34.1 irlba 2.3.5.1 isoband 0.2.7 iterators 1.0.14 jquerylib 0.1.4 jsonlite 1.8.8 kableExtra 1.3.4 KEGGREST 1.40.0 kernlab 0.9-32 KernSmooth 2.23-24 knitr 1.48 labeling 0.4.3 lambda.r 1.2.4 later 1.3.2 lattice 0.22-6 lava 1.7.3 lazyeval 0.2.2 leiden 0.4.3.1 leidenbase 0.1.27 lifecycle 1.0.4 limma 3.56.2 listenv 0.9.1 lme4 1.1-35.5 lmtest 0.9-40 locfit 1.5-9.9 lsei 1.3-0 lubridate 1.9.3 magrittr 2.0.3 maps 3.4.2 maptools 1.1-8 markdown 1.13 MASS 7.3-60.0.1 Matrix 1.6-5 MatrixGenerics 1.12.2 MatrixModels 0.5-3 matrixStats 1.3.0 mcmc 0.9-8 MCMCpack 1.7-0 memoise 2.0.1 metapod 1.8.0 methods 4.3.3 mgcv 1.9-1 microbenchmark 1.4.10 mime 0.12 miniUI 0.1.1.1 minqa 1.2.7 mixtools 2.0.0 ModelMetrics 1.2.2.2 modelr 0.1.11 monocle 2.28.0 munsell 0.5.1 network 1.18.2 nlme 3.1-165 nloptr 2.0.3 NMF 0.27 nnet 7.3-19 npsurv 0.5-0 numDeriv 2016.8-1.1 openssl 2.2.0 org.Hs.eg.db 3.17.0 org.Mm.eg.db 3.17.0 parallel 4.3.3 parallelDist 0.2.6 parallelly 1.37.1 patchwork 1.2.0 pbapply 1.7-2 pbkrtest 0.5.2 pcaMethods 1.92.0 phateR 1.0.7 pheatmap 1.0.12 pillar 1.9.0 pkgbuild 1.4.4 pkgconfig 2.0.3 pkgdown 2.1.0 pkgload 1.3.4 plogr 0.2.0 plotly 4.10.4 plyr 1.8.9 png 0.1-8 polyclip 1.10-6 polynom 1.4-1 praise 1.0.0 presto 1.0.0 prettyunits 1.2.0 princurve 2.1.6 pROC 1.18.5 processx 3.8.4 prodlim 2024.06.25 profvis 0.3.8 progress 1.2.3 progressr 0.14.0 promises 1.3.0 proxy 0.4-27 ps 1.7.7 purrr 1.0.2 qlcMatrix 0.9.8 quantreg 5.98 qvalue 2.34.0 R.methodsS3 1.8.2 R.oo 1.26.0 R.utils 2.12.3 R6 2.5.1 ragg 1.3.2 randomForest 4.7-1.1 RANN 2.6.1 
rappdirs 0.3.3 RBGL 1.76.0 RcisTarget 1.20.0 rcmdcheck 1.4.0 RColorBrewer 1.1-3 Rcpp 1.0.13 RcppAnnoy 0.0.22 RcppArmadillo 14.0.0-1 RcppEigen 0.3.4.0.0 RcppGSL 0.3.13 RcppHNSW 0.6.0 RcppParallel 5.1.6 RcppProgress 0.4.2 RcppTOML 0.2.2 RcppZiggurat 0.1.6 RCurl 1.98-1.16 readr 2.1.5 readxl 1.4.3 recipes 1.1.0 registry 0.5-1 rematch 2.0.0 rematch2 2.1.2 remotes 2.5.0 reshape2 1.4.4 reticulate 1.38.0 Rfast 2.1.0 Rfast2 0.1.5.1 rhdf5 2.44.0 rhdf5filters 1.12.1 Rhdf5lib 1.22.0 rio 1.1.1 rjson 0.2.21 rlang 1.1.4 rmarkdown 2.27 rngtools 1.5.2 ROCR 1.0-11 roxygen2 7.3.2 rpart 4.1.23 rprojroot 2.0.4 RSpectra 0.16-2 RSQLite 2.3.7 rstatix 0.7.2 rstudioapi 0.16.0 rsvd 1.0.5 Rtsne 0.17 RUnit 0.4.33 rversions 2.1.2 rvest 1.0.4 S4Arrays 1.0.4 S4Vectors 0.38.1 sass 0.4.9 ScaledMatrix 1.8.1 scales 1.3.0 scattermore 1.2 scatterpie 0.2.3 SCENIC 1.3.0 scmap 1.24.0 scran 1.28.1 sctransform 0.4.1 scuttle 1.10.1 segmented 2.1-0 selectr 0.4-2 sessioninfo 1.2.2 Seurat 4.3.0.1 SeuratDisk 0.0.0.9021 SeuratObject 5.0.2 SeuratWrappers 0.3.1 shadowtext 0.1.4 shape 1.4.6.1 shinyjs 2.1.0 shiny 1.8.0 shinyWidgets 0.8.6 shinydashboard 0.7.2 slickR 0.6.0 SingleCellExperiment 1.22.0 sitmo 2.0.2 slam 0.1-51 slingshot 2.8.0 sna 2.7-2 snow 0.4-4 sourcetools 0.1.7-1 sp 2.1-4 spam 2.10-0 SparseM 1.84 sparseMatrixStats 1.12.2 sparsesvd 0.2-2 spatstat.data 3.1-2 spatstat.explore 3.2-6 spatstat.geom 3.2-9 spatstat.random 3.2-3 spatstat.sparse 3.1-0 spatstat.univar 3.0-0 spatstat.utils 3.0-5 splines 4.3.3 SQUAREM 2021.1 statmod 1.5.0 statnet.common 4.9.0 stats 4.3.3 stats4 4.3.3 stringi 1.8.4 stringr 1.5.1 SummarizedExperiment 1.30.2 survival 3.7-0 svglite 2.1.3 sys 3.4.2 systemfonts 1.1.0 tcltk 4.3.3 tensor 1.5 testthat 3.2.1.1 textshaping 0.3.7 tibble 3.2.1 tidygraph 1.3.1 tidyr 1.3.1 tidyselect 1.2.1 tidytree 0.4.6 timechange 0.3.0 timeDate 4032.109 tinytex 0.51 tools 4.3.3 TrajectoryUtils 1.8.0 transport 0.14-6 treeio 1.26.0 tweenr 2.0.3 tzdb 0.4.0 urlchecker 1.0.1 usethis 2.2.3 utf8 1.2.4 utils 4.3.3 uwot 
0.1.16 vctrs 0.6.5 velocyto.R 0.6 VGAM 1.1-11 viridis 0.6.5 viridisLite 0.4.2 vroom 1.6.5 waldo 0.5.2 webshot 0.5.5 whisker 0.4.1 withr 3.0.0 writexl 1.5.0 xfun 0.46 XML 3.99-0.17 xml2 1.3.6 xopen 1.0.1 xtable 1.8-4 XVector 0.40.0 yaml 2.3.9 yulab.utils 0.1.4 zip 2.3.1 zlibbioc 1.46.0 zoo 1.8-12 Python packages with versions Package Version ------------------------ -------------- absl-py 2.1.0 access 1.1.9 affine 2.4.0 aiohttp 3.9.5 aiosignal 1.3.1 anndata 0.10.8 annotated-types 0.7.0 anyio 4.4.0 arboreto 0.1.6 argcomplete 3.4.0 array_api_compat 1.7.1 arrow 1.3.0 attrs 23.2.0 backoff 2.2.1 beautifulsoup4 4.12.3 blessed 1.20.0 bokeh 3.5.0 boto3 1.34.145 botocore 1.34.145 cell2location 0.1.3 certifi 2024.7.4 charset-normalizer 3.3.2 chex 0.1.7 click 8.1.7 click-plugins 1.1.1 cligj 0.7.2 cloudpickle 3.0.0 commot 0.0.3 contextlib2 21.6.0 contourpy 1.2.1 croniter 1.4.1 cycler 0.12.1 dask 2024.7.0 dask-expr 0.5.3 dateutils 0.6.12 decorator 4.4.2 deepdiff 7.0.1 Deprecated 1.2.14 deprecation 2.1.0 distributed 2024.2.1 dm-tree 0.1.8 dnspython 2.6.1 docrep 0.3.2 editor 1.6.6 email_validator 2.2.0 esda 2.4.3 etils 1.9.2 fastapi 0.111.1 fastapi-cli 0.0.4 filelock 3.15.4 fiona 1.9.6 flax 0.8.5 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.6.1 future 1.0.0 gensim 4.3.3 geopandas 0.13.2 giddy 2.3.5 graphtools 1.5.3 h11 0.14.0 h5py 3.11.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 idna 3.7 igraph 0.11.6 importlib_metadata 8.0.0 importlib_resources 6.4.0 inequality 1.0.0 inquirer 3.3.0 itsdangerous 2.2.0 jax 0.4.30 jaxlib 0.4.30 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 karateclub 1.2.2 kiwisolver 1.4.5 legacy-api-wrap 1.4 leidenalg 0.10.2 Levenshtein 0.25.1 libpysal 4.7.0 lightning 2.0.9.post0 lightning-cloud 0.5.70 lightning-utilities 0.11.5 llvmlite 0.43.0 locket 1.0.0 loompy 3.0.7 lz4 4.3.3 mapclassify 2.6.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.1 mdurl 0.1.2 mgwr 2.2.1 ml_collections 0.1.1 ml-dtypes 0.4.0 momepy 0.6.0 mpmath 1.3.0 msgpack 1.0.8 mudata 0.2.4 
multidict 6.0.5 multipledispatch 1.0.0 natsort 8.4.0 nest-asyncio 1.6.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 numpy-groupies 0.11.1 numpyro 0.15.1 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 opencv-python 4.10.0.84 opt-einsum 3.3.0 optax 0.2.1 orbax-checkpoint 0.5.21 ordered-set 4.1.0 packaging 24.1 pandas 2.0.3 partd 1.4.2 patsy 0.5.6 phate 1.0.11 pillow 10.4.0 pip 24.1.2 platformdirs 4.2.2 plotly 5.22.0 pointpats 2.4.0 POT 0.9.4 protobuf 5.27.2 psutil 6.0.0 PuLP 2.9.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pydantic 2.1.1 pydantic_core 2.4.0 Pygments 2.18.0 PyGSP 0.5.1 PyJWT 2.8.0 pynndescent 0.5.13 pyparsing 3.0.9 pyproj 3.6.1 pyro-api 0.1.2 pyro-ppl 1.9.1 pysal 24.1 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-igraph 0.11.6 python-Levenshtein 0.25.1 python-louvain 0.16 python-multipart 0.0.9 pytorch-lightning 2.3.3 pytz 2024.1 PyYAML 6.0.1 quantecon 0.7.2 rapidfuzz 3.9.4 rasterio 1.3.10 rasterstats 0.19.0 readchar 4.1.0 requests 2.32.3 rich 13.7.1 Rtree 1.3.0 runs 1.2.2 s_gd2 1.8.1 s3transfer 0.10.2 scanpy 1.10.2 scikit-learn 1.5.1 scipy 1.13.1 scprep 1.2.3 scvelo 0.3.2 scvi-tools 1.1.5 seaborn 0.13.2 segregation 2.5 session_info 1.0.0 setuptools 71.0.1 shapely 2.0.5 shellingham 1.5.4 simplejson 3.19.2 six 1.16.0 smart-open 7.0.4 sniffio 1.3.1 snuggs 1.4.7 sortedcontainers 2.4.0 soupsieve 2.5 spaghetti 1.7.4 sparse 0.15.4 spglm 1.0.8 spint 1.0.7 splot 1.1.5.post1 spopt 0.5.0 spreg 1.4 spvcm 0.3.0 starlette 0.37.2 starsessions 1.3.0 statsmodels 0.14.1 stdlib-list 0.10.0 sympy 1.13.1 tasklogger 1.2.0 tblib 3.0.0 tenacity 8.5.0 tensorstore 0.1.63 texttable 1.7.0 threadpoolctl 3.5.0 tobler 0.11.2 toml 0.10.2 tomlkit 0.13.0 toolz 0.12.1 torch 2.3.1 
torchmetrics 1.4.0.post0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 triton 2.3.1 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing_extensions 4.12.2 tzdata 2024.1 umap-learn 0.5.6 urllib3 2.2.2 uvicorn 0.30.1 uvloop 0.19.0 watchfiles 0.22.0 wcwidth 0.2.13 websocket-client 1.8.0 websockets 12.0 wheel 0.43.0 wrapt 1.16.0 xarray 2024.6.0 xmltodict 0.13.0 xmod 1.8.1 xyzservices 2024.6.0 yarl 1.9.4 yq 3.4.3 zict 3.0.0 zipp 3.19.2 "],["integrated-scrna-seq-pipeline.html", "3 Integrated scRNA-seq pipeline", " 3 Integrated scRNA-seq pipeline Load the R packages. # sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getopt library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Run the integrated scRNA-seq pipeline. 
scRNASeq_10x_pipeline( # input and output input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', './SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix'), project.names = c( 'SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423'), output.dir = './output/', pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python', # quality control and preprocessing gene.column = 2, min.cells = 10, min.feature = 200, mt.pattern = '^MT-', nFeature_RNA.limit = 200, percent.mt.limit = 20, scale.factor = 10000, nfeatures = 3000, ndims = 50, vars.to.regress = NULL, PCs = 1:35, resolution = 0.4, n.neighbors = 50, 
# remove doublets doublet.percentage = 0.04, doublerFinderwraper.PCs = 1:20, doublerFinderwraper.pN = 0.25, doublerFinderwraper.pK = 0.1, # phateR phate.knn = 50, phate.npca = 20, phate.t = 10, phate.ndim = 2, min.pct = 0.25, logfc.threshold = 0.25, # visualization ViolinPlot.cellTypeOrders = as.character(1:22), ViolinPlot.cellTypeColors = NULL, Org = 'hsa', loom.files.path = c( './SRR7881399/velocyto/SRR7881399.loom', './SRR7881400/velocyto/SRR7881400.loom', './SRR7881401/velocyto/SRR7881401.loom', './SRR7881402/velocyto/SRR7881402.loom', './SRR7881403/velocyto/SRR7881403.loom', './SRR7881404/velocyto/SRR7881404.loom', './SRR7881405/velocyto/SRR7881405.loom', './SRR7881406/velocyto/SRR7881406.loom', './SRR7881407/velocyto/SRR7881407.loom', './SRR7881408/velocyto/SRR7881408.loom', './SRR7881409/velocyto/SRR7881409.loom', './SRR7881410/velocyto/SRR7881410.loom', './SRR7881411/velocyto/SRR7881411.loom', './SRR7881412/velocyto/SRR7881412.loom', './SRR7881413/velocyto/SRR7881413.loom', './SRR7881414/velocyto/SRR7881414.loom', './SRR7881415/velocyto/SRR7881415.loom', './SRR7881416/velocyto/SRR7881416.loom', './SRR7881417/velocyto/SRR7881417.loom', './SRR7881418/velocyto/SRR7881418.loom', './SRR7881419/velocyto/SRR7881419.loom', './SRR7881420/velocyto/SRR7881420.loom', './SRR7881421/velocyto/SRR7881421.loom', './SRR7881422/velocyto/SRR7881422.loom', './SRR7881423/velocyto/SRR7881423.loom'), # cell cycle cellcycleCutoff = NULL, # cell chat sorting = FALSE, ncores = 10, # Verbose = FALSE, # activeEachStep Whether_load_previous_results = FALSE, Step1_Input_Data = TRUE, Step1_Input_Data.type = 'cellranger-count', Step2_Quality_Control = TRUE, Step2_Quality_Control.RemoveBatches = TRUE, Step2_Quality_Control.RemoveDoublets = TRUE, Step3_Clustering = TRUE, Step4_Identify_Cell_Types = TRUE, Step4_Use_Which_Labels = 'clustering', Step4_Cluster_Labels = NULL, Step4_Changed_Labels = NULL, Step4_run_sc_CNV = TRUE, Step5_Visualization = TRUE, Step6_Find_DEGs = TRUE, 
Step7_Assign_Cell_Cycle = TRUE, Step8_Calculate_Heterogeneity = TRUE, Step9_Violin_Plot_for_Marker_Genes = TRUE, Step10_Calculate_Lineage_Scores = TRUE, Step11_GSVA = TRUE, Step11_GSVA.identify.cellType.features=TRUE, Step11_GSVA.identify.diff.features=FALSE, Step11_GSVA.comparison.design=NULL, Step12_Construct_Trajectories = TRUE, Step12_Construct_Trajectories.clusters = c('3','6','9','10','11','14','15','19'), Step12_Construct_Trajectories.monocle = TRUE, Step12_Construct_Trajectories.slingshot = TRUE, Step12_Construct_Trajectories.scVelo = TRUE, Step13_TF_Analysis = TRUE, Step14_Cell_Cell_Interaction = TRUE, Step15_Generate_the_Report = TRUE ) "],["step-by-step-scrna-seq-pipeline.html", "4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data 4.2 Step 2. Quality Control 4.3 Step 3. Clustering 4.4 Step 4. Identify Cell Types 4.5 Step 5. Visualization 4.6 Step 6. Find DEGs 4.7 Step 7. Assign Cell Cycles 4.8 Step 8. Calculate Heterogeneity 4.9 Step 9. Violin Plot for Marker Genes 4.10 Step 10. Calculate Lineage Scores 4.11 Step 11. GSVA 4.12 Step 12. Construct Trajectories 4.13 Step 13. TF Analysis 4.14 Step 14. Cell-Cell Interaction", " 4 Step-by-step scRNA-seq Pipeline 4.1 Step 1. Load the R packages and the input data Load the R packages. 
# sc libraries library(Seurat) library(phateR) library(DoubletFinder) library(monocle) library(slingshot) library(URD) library(GSVA) library(limma) library(plyr) library(dplyr) library(org.Mm.eg.db) library(org.Hs.eg.db) library(CellChat) library(velocyto.R) library(SeuratWrappers) library(stringr) library(scran) library(ggpubr) library(viridis) library(pheatmap) library(parallel) library(reticulate) library(SCENIC) library(feather) library(AUCell) library(RcisTarget) library(Matrix) library(foreach) library(doParallel) library(clusterProfiler) library(OpenXGR) # st libraries library(RColorBrewer) library(Rfast2) library(SeuratDisk) library(abcCellmap) library(biomaRt) library(copykat) library(gelnet) library(ggplot2) library(parallelDist) library(patchwork) library(markdown) # getopt library(getopt) library(tools) # HemaScopeR library(HemaScopeR) Set the paths for the input data, the output results, and the Python installation. input.data.dirs = c('./SRR7881399/outs/filtered_feature_bc_matrix', './SRR7881400/outs/filtered_feature_bc_matrix', './SRR7881401/outs/filtered_feature_bc_matrix', './SRR7881402/outs/filtered_feature_bc_matrix', './SRR7881403/outs/filtered_feature_bc_matrix', './SRR7881404/outs/filtered_feature_bc_matrix', './SRR7881405/outs/filtered_feature_bc_matrix', './SRR7881406/outs/filtered_feature_bc_matrix', './SRR7881407/outs/filtered_feature_bc_matrix', './SRR7881408/outs/filtered_feature_bc_matrix', './SRR7881409/outs/filtered_feature_bc_matrix', './SRR7881410/outs/filtered_feature_bc_matrix', './SRR7881411/outs/filtered_feature_bc_matrix', './SRR7881412/outs/filtered_feature_bc_matrix', './SRR7881413/outs/filtered_feature_bc_matrix', './SRR7881414/outs/filtered_feature_bc_matrix', './SRR7881415/outs/filtered_feature_bc_matrix', './SRR7881416/outs/filtered_feature_bc_matrix', './SRR7881417/outs/filtered_feature_bc_matrix', './SRR7881418/outs/filtered_feature_bc_matrix', './SRR7881419/outs/filtered_feature_bc_matrix', 
'./SRR7881420/outs/filtered_feature_bc_matrix', './SRR7881421/outs/filtered_feature_bc_matrix', './SRR7881422/outs/filtered_feature_bc_matrix', './SRR7881423/outs/filtered_feature_bc_matrix') output.dir = './output/' pythonPath = '/home/anaconda3/envs/HemaScopeR/bin/python' Set the parameters for loading the data sets. project.names = c('SRR7881399', 'SRR7881400', 'SRR7881401', 'SRR7881402', 'SRR7881403', 'SRR7881404', 'SRR7881405', 'SRR7881406', 'SRR7881407', 'SRR7881408', 'SRR7881409', 'SRR7881410', 'SRR7881411', 'SRR7881412', 'SRR7881413', 'SRR7881414', 'SRR7881415', 'SRR7881416', 'SRR7881417', 'SRR7881418', 'SRR7881419', 'SRR7881420', 'SRR7881421', 'SRR7881422', 'SRR7881423') gene.column = 2 min.cells = 10 min.feature = 200 mt.pattern = '^MT-' Step1_Input_Data.type = 'cellranger-count' Create folders for saving the results of HemaScopeR analysis. wdir <- getwd() if(is.null(pythonPath)==FALSE){ reticulate::use_python(pythonPath) }else{print('Please set the path of Python.')} if (!file.exists(paste0(output.dir, '/HemaScopeR_results/'))) { dir.create(paste0(output.dir, '/HemaScopeR_results/')) } output.dir <- paste0(output.dir,'/HemaScopeR_results/') if (!file.exists(paste0(output.dir, '/RDSfiles/'))) { dir.create(paste0(output.dir, '/RDSfiles/')) } previous_results_path <- paste0(output.dir, '/RDSfiles/') # if (Whether_load_previous_results) { # print('Loading the previous results...') # Load_previous_results(previous_results_path = previous_results_path) # } # Step1. Input data----------------------------------------------------------------------------- print('Step1. Input data.') if (!file.exists(paste0(output.dir, '/Step1.Input_data/'))) { dir.create(paste0(output.dir, '/Step1.Input_data/')) } Load the data sets. 
file.copy(from = input.data.dirs, to = paste0(output.dir,'/Step1.Input_data/'), recursive = TRUE) if(Step1_Input_Data.type == 'cellranger-count'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- Read10X(data.dir = input.data.dirs[i], gene.column = gene.column) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- Read10X(data.dir = input.data.dirs, gene.column = gene.column) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Seurat'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_object.temp <- readRDS(input.data.dirs[i]) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp) } }else{ sc_object <- readRDS(input.data.dirs) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else if(Step1_Input_Data.type == 'Matrix'){ if(length(input.data.dirs) > 1){ input.data.list <- c() for (i in 1:length(input.data.dirs)) { sc_data.temp <- readRDS(input.data.dirs[i]) sc_object.temp <- CreateSeuratObject(counts = sc_data.temp, project = project.names[i], min.cells = min.cells, min.feature = min.feature) sc_object.temp[["percent.mt"]] <- PercentageFeatureSet(sc_object.temp, pattern = mt.pattern) input.data.list <- c(input.data.list, sc_object.temp)} }else{ sc_data <- readRDS(input.data.dirs) sc_object <- CreateSeuratObject(counts = sc_data, project = project.names, min.cells = min.cells, min.feature = 
min.feature) sc_object[["percent.mt"]] <- PercentageFeatureSet(sc_object, pattern = mt.pattern) } }else{ stop('Please input data generated by the cellranger-count software, or a Seurat object, or a gene expression matrix. HemaScopeR does not support other formats of input data.') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.2 Step 2. Quality Control Set the parameters for quality control. # quality control and preprocessing nFeature_RNA.limit = 200 percent.mt.limit = 20 scale.factor = 10000 nfeatures = 3000 ndims = 50 vars.to.regress = NULL PCs = 1:35 resolution = 0.4 n.neighbors = 50 # remove doublets doublet.percentage = 0.04 doublerFinderwraper.PCs = 1:20 doublerFinderwraper.pN = 0.25 doublerFinderwraper.pK = 0.1 Step2_Quality_Control.RemoveBatches = TRUE Step2_Quality_Control.RemoveDoublets = TRUE Create a folder for saving the results of quality control. print('Step2. Quality control.') if (!file.exists(paste0(output.dir, '/Step2.Quality_control/'))) { dir.create(paste0(output.dir, '/Step2.Quality_control/')) } Run the quality control process. 
if(length(input.data.dirs) > 1){ # preprocess and quality control for multiple scRNA-Seq data sets sc_object <- QC_multiple_scRNASeq(seuratObjects = input.data.list, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveBatches = Step2_Quality_Control.RemoveBatches, Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, ndims = ndims, vars.to.regress = vars.to.regress, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK ) }else{ # preprocess and quality control for single scRNA-Seq data set sc_object <- QC_single_scRNASeq(sc_object = sc_object, datasetID = project.names, output.dir = paste0(output.dir,'/Step2.Quality_control/'), Step2_Quality_Control.RemoveDoublets = Step2_Quality_Control.RemoveDoublets, nFeature_RNA.limit = nFeature_RNA.limit, percent.mt.limit = percent.mt.limit, scale.factor = scale.factor, nfeatures = nfeatures, vars.to.regress = vars.to.regress, ndims = ndims, PCs = PCs, resolution = resolution, n.neighbors = n.neighbors, percentage = doublet.percentage, doublerFinderwraper.PCs = doublerFinderwraper.PCs, doublerFinderwraper.pN = doublerFinderwraper.pN, doublerFinderwraper.pK = doublerFinderwraper.pK) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.3 Step 3. Clustering Set the parameters for clustering. 
PCs = 1:35 resolution = 0.4 n.neighbors = 50 Create a folder for saving the results of Louvain clustering. print('Step3. Clustering.') if (!file.exists(paste0(output.dir, '/Step3.Clustering/'))) { dir.create(paste0(output.dir, '/Step3.Clustering/')) } Run Louvain clustering. if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){graph.name <- 'integrated_snn'}else{graph.name <- 'RNA_snn'} sc_object <- FindNeighbors(sc_object, dims = PCs, k.param = n.neighbors, force.recalc = TRUE) sc_object <- FindClusters(sc_object, resolution = resolution, graph.name = graph.name) sc_object@meta.data$seurat_clusters <- as.character(as.numeric(sc_object@meta.data$seurat_clusters)) # plot clustering pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','tsne_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step3.Clustering/'), '/sc_object ','umap_cluster.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "seurat_clusters", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.4 Step 4. Identify Cell Types Set the path for the database. databasePath = "~/HemaScopeR/database/" Set the parameters for cell type identification. Step4_Use_Which_Labels = 'clustering' Step4_Cluster_Labels = NULL Step4_Changed_Labels = NULL Org = 'hsa' ncores = 10 Create a folder for saving the results of cell type identification. print('Step4. Identify cell types automatically.') if (!file.exists(paste0(output.dir, '/Step4.Identify_Cell_Types/'))) { dir.create(paste0(output.dir, '/Step4.Identify_Cell_Types/')) } Run the cell type identification process and the copy number variation (CNV) analysis. sc_object <- run_cell_annotation(object = sc_object, assay = 'RNA', species = Org, output.dir = paste0(output.dir,'/Step4.Identify_Cell_Types/')) if(Org == 'hsa'){ load(paste0(databasePath,"/HematoMap.reference.rdata")) if(length(intersect(rownames(HematoMap.reference), rownames(sc_object))) < 1000){ HematoMap.reference <- RenameGenesSeurat(obj = HematoMap.reference, newnames = toupper(rownames(HematoMap.reference)), gene.use = rownames(HematoMap.reference), de.assay = "RNA", lassays = "RNA") } if(sc_object@active.assay == 'integrated'){ DefaultAssay(sc_object) <- 'RNA' sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, '/Step4.Identify_Cell_Types/')) DefaultAssay(sc_object) <- 'integrated' }else{ sc_object <- mapDataToRef(ref_object = HematoMap.reference, ref_labels = HematoMap.reference@meta.data$CellType, query_object = sc_object, PCs = PCs, output.dir = paste0(output.dir, 
'/Step4.Identify_Cell_Types/')) } } Set the cell labels. # set the cell labels if(Step4_Use_Which_Labels == 'clustering'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$seurat_clusters Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.1'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.2'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.RNACluster Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.3'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$Seurat.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'abcCellmap.4'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$scmap.Immunophenotype Idents(sc_object) <- sc_object@meta.data$selectLabels }else if(Step4_Use_Which_Labels == 'HematoMap'){ if(Org == 'hsa'){ sc_object@meta.data$selectLabels <- sc_object@meta.data$predicted.id Idents(sc_object) <- sc_object@meta.data$selectLabels }else{print("'HematoMap' is only applicable to human data ('Org' = 'hsa').")} }else if(Step4_Use_Which_Labels == 'changeLabels'){ if (!is.null(Step4_Cluster_Labels) && !is.null(Step4_Changed_Labels) && length(Step4_Cluster_Labels) == length(Step4_Changed_Labels)){ sc_object@meta.data$selectLabels <- plyr::mapvalues(sc_object@meta.data$seurat_clusters, from = as.character(Step4_Cluster_Labels), to = as.character(Step4_Changed_Labels), warn_missing = FALSE) Idents(sc_object) <- sc_object@meta.data$selectLabels }else{ print("Please input the 'Step4_Cluster_Labels' parameter as Seurat clustering labels, and the 'Step4_Changed_Labels' parameter as new labels. 
Please note that these two parameters should be of equal length.") } }else{ print('Please set the "Step4_Use_Which_Labels" parameter as "clustering", "abcCellmap.1", "abcCellmap.2", "abcCellmap.3", "abcCellmap.4", "HematoMap" or "changeLabels".') } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } Run the CNV analysis. sc_CNV(sc_object=sc_object, save_path=paste0(output.dir,'/Step4.Identify_Cell_Types/'), assay = 'RNA', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = NULL, n.cores = ncores, species = Org) 4.5 Step 5. Visualization Create a folder for saving the visualization results. print('Step5. Visualization.') if (!file.exists(paste0(output.dir, '/Step5.Visualization/'))) { dir.create(paste0(output.dir, '/Step5.Visualization/')) } Compute the numbers and proportions of cell groups. 
# statistical results cells_labels <- as.data.frame(cbind(rownames(sc_object@meta.data), as.character(sc_object@meta.data$selectLabels))) colnames(cells_labels) <- c('cell_id', 'cluster_id') cluster_counts <- cells_labels %>% group_by(cluster_id) %>% summarise(count = n()) total_cells <- nrow(cells_labels) cluster_counts <- cluster_counts %>% mutate(proportion = count / total_cells) cluster_counts <- as.data.frame(cluster_counts) cluster_counts$percentages <- scales::percent(cluster_counts$proportion, accuracy = 0.1) cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='proportion')] cluster_counts$cluster_id_count_percentages <- paste(cluster_counts$cluster_id, " (", cluster_counts$count, ' cells; ', cluster_counts$percentages, ")", sep='') cluster_counts <- cluster_counts[order(cluster_counts$count, decreasing = TRUE),] cluster_counts <- rbind(cluster_counts, c('Total', sum(cluster_counts$count), '100%', 'all cells')) sc_object@meta.data$cluster_id_count_percentages <- mapvalues(sc_object@meta.data$selectLabels, from=cluster_counts$cluster_id, to=cluster_counts$cluster_id_count_percentages, warn_missing=FALSE) colnames(sc_object@meta.data)[which(colnames(sc_object@meta.data) == 'cluster_id_count_percentages')] <- paste('Total ', nrow(sc_object@meta.data), ' cells', sep='') cluster_counts <- cluster_counts[,-which(colnames(cluster_counts)=='cluster_id_count_percentages')] colnames(cluster_counts) <- c('Cell types', 'Cell counts', 'Percentages') # names(colorvector) <- mapvalues(names(colorvector), # from=cluster_counts$cluster_id, # to=cluster_counts$cluster_id_count_percentages, # warn_missing=FALSE) write.csv(cluster_counts, file=paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages.csv', sep=''), quote=FALSE, row.names=FALSE) The UMAP visualization. 
pdf(paste(paste0(output.dir, '/Step5.Visualization/'), '/cell types_cell counts_percentages_umap.pdf', sep=''), width = 14, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = paste('Total ', nrow(sc_object@meta.data), ' cells', sep=''), label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Set the parameters for phateR. phate.knn = 50 phate.npca = 20 phate.t = 10 phate.ndim = 2 Run phateR for dimensional reduction and visualization. # run phateR if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} if(!is.null(pythonPath)){ run_phateR(sc_object = sc_object, output.dir = paste0(output.dir,'/Step5.Visualization/'), pythonPath = pythonPath, phate.knn = phate.knn, phate.npca = phate.npca, phate.t = phate.t, phate.ndim = phate.ndim) } Perform visualization using UMAP and TSNE. # plot cell types pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() pdf(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.pdf'), width = 6, height = 6) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','tsne cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "tsne", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() png(paste0(paste0(output.dir,'/Step5.Visualization/'), '/sc_object ','umap cell types.png'), width = 600, height = 600) print(DimPlot(sc_object, reduction = "umap", group.by = "ident", label = FALSE, pt.size = 0.1, raster = FALSE)) dev.off() Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.6 Step 6. Find DEGs Set the parameters for identifying differentially expressed genes. min.pct = 0.25 logfc.threshold = 0.25 Create a folder for the DEGs analysis. print('Step6. Find DEGs.') if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/')) } Identify DEGs using Wilcoxon Rank-Sum Test. sc_object.markers <- FindAllMarkers(sc_object, only.pos = TRUE, min.pct = min.pct, logfc.threshold = logfc.threshold) write.csv(sc_object.markers, file = paste0(paste0(output.dir, '/Step6.Find_DEGs/'),'sc_object.markerGenes.csv'), quote=FALSE) Set the parameters for GPTCelltype. your_openai_API_key = '' tissuename = 'human bone marrow' gptmodel = 'gpt-3.5' Use GPTCelltype to assist cell type annotation. GPT_annotation( marker.genes = sc_object.markers, your_openai_API_key = your_openai_API_key, tissuename = tissuename, gptmodel = gptmodel, output.dir = paste0(output.dir, '/Step6.Find_DEGs/')) Perform GO and KEGG enrichment. 
# GO enrichment if(Org=='mmu'){ OrgDb <- 'org.Mm.eg.db' }else if(Org=='hsa'){ OrgDb <- 'org.Hs.eg.db' }else{ stop("Org should be 'mmu' or 'hsa'.") } HemaScopeREnrichment(DEGs=sc_object.markers, OrgDb=OrgDb, output.dir=paste0(output.dir, '/Step6.Find_DEGs/')) sc_object.markers.top5 <- sc_object.markers %>% group_by(cluster) %>% top_n(n = 5, wt = avg_log2FC) pdf(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.pdf'), width = 0.5*length(unique(sc_object.markers.top5$gene)), height = 0.5*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() png(paste0(paste0(output.dir, '/Step6.Find_DEGs/'), 'sc_object_markerGenesTop5.png'), width = 20*length(unique(sc_object.markers.top5$gene)), height = 30*length(unique(Idents(sc_object)))) print(DotPlot(sc_object, features = unique(sc_object.markers.top5$gene), cols=c("lightgrey",'red'))+theme(axis.text.x =element_text(angle = 45, vjust = 1, hjust = 1))) dev.off() Create a folder for saving the results of gene network analysis. if (!file.exists(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'))) { dir.create(paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/')) } Perform gene network analysis. OpenXGR_SAG(sc_object.markers = sc_object.markers, output.dir = paste0(output.dir, '/Step6.Find_DEGs/OpenXGR/'), subnet.size = 10) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.7 Step 7. Assign Cell Cycles Create a folder for saving the results of cell cycle analysis. print('Step7. 
Assign cell cycles.') if (!file.exists(paste0(output.dir, '/Step7.Assign_cell_cycles/'))) { dir.create(paste0(output.dir, '/Step7.Assign_cell_cycles/')) } Set the parameters for the cell cycle analysis. cellcycleCutoff = NULL Run the cell cycle analysis. datasets.before.batch.removal <- readRDS(paste0(paste0(output.dir, '/RDSfiles/'),'datasets.before.batch.removal.rds')) sc_object <- cellCycle(sc_object=sc_object, counts_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "counts")%>%as.matrix(), data_matrix = GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix(), cellcycleCutoff = cellcycleCutoff, cellTypeOrders = unique(sc_object@meta.data$selectLabels), output.dir=paste0(output.dir, '/Step7.Assign_cell_cycles/'), databasePath = databasePath, Org = Org) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.8 Step 8. Calculate Heterogeneity Create a folder for saving the results of heterogeneity calculation. print('Step8. Calculate heterogeneity.') if (!file.exists(paste0(output.dir, '/Step8.Calculate_heterogeneity/'))) { dir.create(paste0(output.dir, '/Step8.Calculate_heterogeneity/')) } Run heterogeneity calculation process. 
expression_matrix <- GetAssayData(object = datasets.before.batch.removal, slot = "data")%>%as.matrix() expression_matrix <- expression_matrix[,rownames(sc_object@meta.data)] cell_types_groups <- as.data.frame(cbind(sc_object@meta.data$selectLabels, sc_object@meta.data$datasetID)) colnames(cell_types_groups) <- c('clusters', 'datasetID') if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } heterogeneity(expression_matrix = expression_matrix, cell_types_groups = cell_types_groups, cellTypeOrders = cellTypes_orders, output.dir = paste0(output.dir, '/Step8.Calculate_heterogeneity/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.9 Step 9. Violin Plot for Marker Genes Create a folder for saving the violin plots of marker genes. print('Step9. Violin plot for marker genes.') if (!file.exists(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'))) { dir.create(paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/')) } Run violin plot visualization. 
if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object) <- 'integrated' }else{ DefaultAssay(sc_object) <- 'RNA'} dataMatrix <- GetAssayData(object = sc_object, slot = "scale.data") if(is.null(marker.genes)&(Org == 'mmu')){ # mpp genes are from 'The bone marrow microenvironment at single cell resolution' # the other genes are from 'single cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis' # the aliases of these genes were changed in GENCODE M16: Gpr64 -> Adgrg2, Sdpr -> Cavin2, Hbb-b1 -> Hbb-bs, Sfpi1 -> Spi1 HSC_lineage_signatures <- c('Slamf1', 'Itga2b', 'Kit', 'Ly6a', 'Bmi1', 'Gata2', 'Hlf', 'Meis1', 'Mpl', 'Mcl1', 'Gfi1', 'Gfi1b', 'Hoxb5') Mpp_genes <- c('Mki67', 'Mpo', 'Elane', 'Ctsg', 'Calr') Erythroid_lineage_signatures <- c('Klf1', 'Gata1', 'Mpl', 'Epor', 'Vwf', 'Zfpm1', 'Fhl1', 'Adgrg2', 'Cavin2','Gypa', 'Tfrc', 'Hbb-bs', 'Hbb-y') Lymphoid_lineage_signatures <- c('Tcf3', 'Ikzf1', 'Notch1', 'Flt3', 'Dntt', 'Btg2', 'Tcf7', 'Rag1', 'Ptprc', 'Ly6a', 'Blnk') Myeloid_lineage_signatures <- c('Gfi1', 'Spi1', 'Mpo', 'Csf2rb', 'Csf1r', 'Gfi1b', 'Hk3', 'Csf2ra', 'Csf3r', 'Sp1', 'Fcgr3') marker.genes <- c(HSC_lineage_signatures, Mpp_genes, Erythroid_lineage_signatures, Lymphoid_lineage_signatures, Myeloid_lineage_signatures) }else if(is.null(marker.genes)&(Org == 'hsa')){ HSPCs_lineage_signatures <- c('CD34','KIT','AVP','FLT3','MME','CD7','CD38','CSF1R','FCGR1A','MPO','ELANE','IL3RA') Myeloids_lineage_signatures <- c('LYZ','CD36','MPO','FCGR1A','CD4','CD14','CD300E','ITGAX','FCGR3A','FLT3','AXL', 'SIGLEC6','CLEC4C','IRF4','LILRA4','IL3RA','IRF8','IRF7','XCR1','CD1C','THBD', 'MRC1','CD34','KIT','ITGA2B','PF4','CD9','ENG','KLF','TFRC') B_cells_lineage_signatures <- c('CD79A','IGLL1','RAG1','RAG2','VPREB1','MME','IL7R','DNTT','MKI67','PCNA','TCL1A','MS4A1','IGHD','CD27','IGHG3') T_NK_cells_lineage_signatures <- 
c('CD3D','CD3E','CD8A','CCR7','IL7R','SELL','KLRG1','CD27','GNLY', 'NKG7','PDCD1','TNFRSF9','LAG3','CD160','CD4','CD40LG','IL2RA', 'FOXP3','DUSP4','IL2RB','KLRF1','FCGR3A','NCAM1','XCL1','MKI67','PCNA','KLRF') marker.genes <- c(HSPCs_lineage_signatures, Myeloids_lineage_signatures, B_cells_lineage_signatures, T_NK_cells_lineage_signatures) } if(is.null(ViolinPlot.cellTypeOrders)){ ViolinPlot.cellTypeOrders <- unique(sc_object@meta.data$selectLabels) } if(is.null(ViolinPlot.cellTypeColors)){ ViolinPlot.cellTypeColors <- viridis::viridis(length(unique(sc_object@meta.data$selectLabels))) } combinedViolinPlot(dataMatrix = dataMatrix, features = marker.genes, CellTypes = sc_object@meta.data$selectLabels, cellTypeOrders = ViolinPlot.cellTypeOrders, cellTypeColors = ViolinPlot.cellTypeColors, Org = Org, output.dir = paste0(output.dir, '/Step9.Violin_plot_for_marker_genes/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.10 Step 10. Calculate Lineage Scores Create a folder for saving the results of lineage score calculation. print('Step10. Calculate lineage scores.') # we use normalized data here if (!file.exists(paste0(output.dir, '/Step10.Calculate_lineage_scores/'))) { dir.create(paste0(output.dir, '/Step10.Calculate_lineage_scores/')) } Run lineage score calculation. 
if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'mmu')){ lineage.genelist <- c(list(HSC_lineage_signatures), list(Mpp_genes), list(Erythroid_lineage_signatures), list(Lymphoid_lineage_signatures), list(Myeloid_lineage_signatures)) lineage.names <- c('HSC_lineage_signatures', 'Mpp_genes', 'Erythroid_lineage_signatures', 'Lymphoid_lineage_signatures', 'Myeloid_lineage_signatures') }else if(is.null(lineage.genelist)&is.null(lineage.names)&(Org == 'hsa')){ lineage.genelist <- c(list(HSPCs_lineage_signatures), list(Myeloids_lineage_signatures), list(B_cells_lineage_signatures), list(T_NK_cells_lineage_signatures)) lineage.names <- c('HSPCs_lineage_signatures', 'Myeloids_lineage_signatures', 'B_cells_lineage_signatures', 'T_NK_cells_lineage_signatures') } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } lineageScores(expression_matrix = expression_matrix, cellTypes = sc_object@meta.data$selectLabels, cellTypes_orders = cellTypes_orders, cellTypes_colors = ViolinPlot.cellTypeColors, groups = sc_object@meta.data$datasetID, groups_orders = unique(sc_object@meta.data$datasetID), groups_colors = groups_colors, lineage.genelist = lineage.genelist, lineage.names = lineage.names, Org = Org, output.dir = paste0(output.dir, '/Step10.Calculate_lineage_scores/'), databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.11 Step 11. GSVA Create a folder for saving the results of GSVA. print('Step11. GSVA.') if (!file.exists(paste0(output.dir, '/Step11.GSVA/'))) { dir.create(paste0(output.dir, '/Step11.GSVA/')) } Run GSVA. 
setwd(wdir) if(Org=='mmu'){ load(paste0(databasePath,"/mouse_c2_v5p2.rdata")) GSVA.genelist <- Mm.c2 assign('OrgDB', org.Mm.eg.db) }else if(Org=='hsa'){ load(paste0(databasePath,"/human_c2_v5p2.rdata")) GSVA.genelist <- Hs.c2 assign('OrgDB', org.Hs.eg.db) }else{ stop("Org should be 'mmu' or 'hsa'.") } if(is.null(ViolinPlot.cellTypeOrders)){ cellTypes_orders <- unique(sc_object@meta.data$selectLabels) }else{ cellTypes_orders <- ViolinPlot.cellTypeOrders } run_GSVA(sc_object = sc_object, GSVA.genelist = GSVA.genelist, GSVA.cellTypes = sc_object@meta.data$selectLabels, GSVA.cellTypes.orders = cellTypes_orders, GSVA.cellGroups = sc_object@meta.data$datasetID, GSVA.identify.cellType.features = Step11_GSVA.identify.cellType.features, GSVA.identify.diff.features = Step11_GSVA.identify.diff.features, GSVA.comparison.design = Step11_GSVA.comparison.design, OrgDB = OrgDB, output.dir = paste0(output.dir, '/Step11.GSVA/')) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.12 Step 12. Construct Trajectories Load gene symbols and Ensembl IDs. 
DefaultAssay(sc_object) <- 'RNA' countsSlot <- GetAssayData(object = sc_object, slot = "counts") gene_metadata <- as.data.frame(rownames(countsSlot)) rownames(gene_metadata) <- gene_metadata[,1] if(Org == 'mmu'){ load(paste0(databasePath,"/mouseGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = mouseGeneSymbolandEnsembleID$geneName, to = mouseGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) }else if(Org == 'hsa'){ load(paste0(databasePath,"/humanGeneSymbolandEnsembleID.rdata")) gene_metadata $ ensembleID <- mapvalues(x = gene_metadata[,1], from = humanGeneSymbolandEnsembleID$geneName, to = humanGeneSymbolandEnsembleID$ensemblIDNoDot, warn_missing = FALSE) } colnames(gene_metadata) <- c('gene_short_name','ensembleID') Create folders for saving the results of trajectory construction. print('Step12. Construct trajectories.') if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) } if (!file.exists(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/'))) { dir.create(paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) } Prepare the input data. if(is.null(Step12_Construct_Trajectories.clusters)){ sc_object.subset <- sc_object countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") }else{ sc_object.subset <- subset(sc_object, subset = selectLabels %in% Step12_Construct_Trajectories.clusters) countsSlot.subset <- GetAssayData(object = sc_object.subset, slot = "counts") } Run monocle2. 
# monocle2 phenoData <- sc_object.subset@meta.data featureData <- gene_metadata run_monocle(cellData = countsSlot.subset, phenoData = phenoData, featureData = featureData, lowerDetectionLimit = 0.5, expressionFamily = VGAM::negbinomial.size(), cellTypes='selectLabels', monocle.orders=Step12_Construct_Trajectories.clusters, monocle.colors = ViolinPlot.cellTypeColors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/monocle2/')) Run slingshot. # slingshot if( (length(input.data.dirs) > 1) & Step2_Quality_Control.RemoveBatches ){ DefaultAssay(sc_object.subset) <- 'integrated' }else{ DefaultAssay(sc_object.subset) <- 'RNA'} run_slingshot(slingshot.PCAembeddings = Embeddings(sc_object.subset, reduction = "pca")[, PCs], slingshot.cellTypes = sc_object.subset@meta.data$selectLabels, slingshot.start.clus = slingshot.start.clus, slingshot.end.clus = slingshot.end.clus, slingshot.colors = slingshot.colors, output.dir = paste0(output.dir, '/Step12.Construct_trajectories/slingshot/')) Run scVelo. # scVelo if((!is.null(loom.files.path))&(!is.null(pythonPath))){ prepareDataForScvelo(sc_object = sc_object.subset, loom.files.path = loom.files.path, scvelo.reduction = 'pca', scvelo.column = 'selectLabels', output.dir = paste0(output.dir, '/Step12.Construct_trajectories/scVelo/')) reticulate::py_run_string(paste0("import os\\noutputDir = '", output.dir, "'")) reticulate::py_run_file(file.path(system.file(package = "HemaScopeR"), "python/sc_run_scvelo.py"), convert = FALSE) } Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.13 Step 13. TF Analysis Create folders for saving the results of TF analysis. print('Step13. 
TF analysis.') if (!file.exists(paste0(output.dir, '/Step13.TF_analysis/'))) { dir.create(paste0(output.dir, '/Step13.TF_analysis/')) } Run SCENIC to perform TF analysis. run_SCENIC(countMatrix = countsSlot, cellTypes = sc_object@meta.data$selectLabels, datasetID = sc_object@meta.data$datasetID, cellTypes_colors = Step13_TF_Analysis.cellTypes_colors, cellTypes_orders = unique(sc_object@meta.data$selectLabels), groups_colors = Step13_TF_Analysis.groups_colors, groups_orders = unique(sc_object@meta.data$datasetID), Org = Org, output.dir = paste0(output.dir, '/Step13.TF_analysis/'), pythonPath = pythonPath, databasePath = databasePath) Save the variables. # Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } 4.14 Step 14. Cell-Cell Interaction Create folders for saving the results of cell-cell interaction analysis. print('Step14. Cell-cell interaction.') if (!file.exists(paste0(output.dir, '/Step14.Cell_cell_interection/'))) { dir.create(paste0(output.dir, '/Step14.Cell_cell_interection/')) } Run CellChat to perform cell-cell interaction analysis. tempwd <- getwd() run_CellChat(data.input=countsSlot, labels = sc_object@meta.data$selectLabels, cell.orders = ViolinPlot.cellTypeOrders, cell.colors = ViolinPlot.cellTypeColors, sample.names = rownames(sc_object@meta.data), Org = Org, sorting = sorting, output.dir = paste0(output.dir, '/Step14.Cell_cell_interection/')) setwd(tempwd) Save the variables. 
# Get the names of all variables in the current environment variable_names <- ls() # Loop through the variable names and save them as RDS files for (var_name in variable_names) { var <- get(var_name) # Get the variable by its name saveRDS(var, file = paste0(output.dir, '/RDSfiles/', var_name, ".rds")) # Save as RDS with the variable's name } "],["stey-by-step-st-seq-pipeline.html", "5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading 5.2 Step 2. QC 5.3 Step 3. Clustering 5.4 Step 4. DEGs 5.5 Step 5. Spatially variable features 5.6 Step 6. Spatial interaction 5.7 Step 7. CNV analysis 5.8 Step 8. Deconvolution 5.9 Step 9. Cell cycle 5.10 Step 10. Niche analysis", " 5 Step-by-step st-seq pipeline 5.1 Step 1. Data loading The st_Loading_Data function loads 10X Visium spatial transcriptomics data produced by Space Ranger. It reads the data from input.data.dir and returns it as a SeuratObject. 5.1.1 Function arguments: input.data.dir: The directory where the input data is stored. output.dir: The directory where the processed output will be saved. If not specified, the output is saved in the current working directory. Default is ‘.’. sampleName: A string naming the sample. Default is ‘Hema_ST’. rds.file: A boolean indicating whether the input data is an RDS file rather than standard Space Ranger output. Default is FALSE. filename: The name of the file to be loaded if the data is not in RDS format. Default is “filtered_feature_bc_matrix.h5”. assay: The specific assay to apply to the data. Default is ‘Spatial’. slice: The image slice identifier for the spatial data. Default is ‘slice1’. filter.matrix: A boolean indicating whether to load the filtered matrix. Default is TRUE. to.upper: A boolean indicating whether to convert feature names to uppercase. Default is FALSE. 5.1.2 Function behavior: Directory Creation: The function first checks if the output.dir exists; if not, it creates it. 
RDS File Handling: If rds.file is TRUE, it reads the RDS file, ensuring the specified assay and slice are present in the Seurat object. Non-RDS File Handling: If rds.file is FALSE, it loads the data using Load10X_Spatial from Seurat. Saving the Object: Uses SaveH5Seurat and Convert to save the Seurat object in rds and h5ad formats. File Copying: Copies any necessary files (filter matrix, spatial image) to the output.dir. Return Value: Returns the processed Seurat object. 5.1.3 An example: st_obj <- st_Loading_Data( input.data.dir = 'path/to/data', output.dir = '.', sampleName = 'Hema_ST', rds.file = FALSE, filename = 'filtered_feature_bc_matrix.h5', assay = 'Spatial', slice = 'slice1', filter.matrix = TRUE, to.upper = FALSE ) 5.1.4 Outputs: Spatial transcriptome data in rds and h5ad formats 5.2 Step 2. QC The QC_Spatial function performs basic quality control on a SeuratObject containing 10X Visium data and returns the filtered SeuratObject. It provides options to set thresholds for the number of genes, nUMI (unique molecular identifiers), and spots expressing each gene. It also allows for the removal of mitochondrial genes based on species. 5.2.1 Function arguments: st_obj: A SeuratObject of 10X Visium data. output.dir: A character string specifying the path to store the results and figures. Default is the current working directory. min.gene: An integer representing the minimum number of genes detected in a spot. Default is 200. max.gene: An integer representing the maximum number of genes detected in a spot. Default is Inf (no upper limit). min.nUMI: An integer representing the minimum number of nUMI detected in a spot. Default is 500. max.nUMI: An integer representing the maximum number of nUMI detected in a spot. Default is Inf (no upper limit). min.spot: An integer representing the minimum number of spots expressing each gene. Default is 3. species: A character string representing the species of the sample, either ‘human’ or ‘mouse’. 
bool.remove.mito: A boolean value indicating whether to remove mitochondrial genes. Default is TRUE. SpatialColors: A function that interpolates a set of given colors to create new color palettes and color ramps. Default is a color palette with reversed Spectral colors from RColorBrewer. 5.2.2 Function behavior: Plots and saves the spatial distribution of nUMI and nGene. Plots and saves violin plots for nUMI and nGene. Identifies and marks low-quality spots based on nUMI and nGene thresholds. Plots the spatial distribution of quality. Plots and saves a histogram for the number of spots expressing each gene. Plots the spatial distribution of mitochondrial genes. Saves the raw SeuratObject before filtering. Removes low-quality spots and genes expressed in fewer than min.spot spots. Optionally removes mitochondrial genes. Saves the filtered SeuratObject. Returns the filtered st_obj. 5.2.3 An example: st_obj <- QC_Spatial( st_obj = st_obj, output.dir = '.', min.gene = 200, max.gene = Inf, min.nUMI = 500, max.nUMI = Inf, min.spot = 3, species = 'human', bool.remove.mito = TRUE, SpatialColors = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = "Spectral"))) ) 5.2.4 Outputs: Figures showing the spatial distribution of nUMI and nGene. Violin plots of nUMI and nGene. Figures showing the quality. Histograms for the number of spots expressing each gene. Figures showing the spatial distribution of mitochondrial genes. Raw and filtered SeuratObject. 5.3 Step 3. Clustering The st_Clustering function is designed to perform clustering analysis on spatial transcriptomics data. It integrates several key steps including data normalization, dimensionality reduction, clustering, and visualization. The function saves the results and visualizations to output.dir. 5.3.1 Function arguments: st_obj: The input spatial transcriptomics Seurat object that contains the data to be clustered. output.dir: The directory where the output files will be saved. Default is the current directory (‘.’). 
normalization.method: The method used for data normalization. Default is ‘SCTransform’. npcs: The number of principal components to use in PCA. Default is 50. pcs.used: The principal components to use for clustering. Default is the first 10 PCs (1:10). resolution: The resolution parameter for the clustering algorithm. Default is 0.8. verbose: A logical flag to print progress messages. Default is FALSE. 5.3.2 Function behavior: Data Normalization and PCA: Depending on the normalization.method, the function either uses SCTransform or a standard normalization method followed by scaling and variable feature detection. Performs PCA on the normalized data. Clustering and Dimensionality Reduction: Finds nearest neighbors using the specified principal components (pcs.used). Identifies clusters using the specified resolution. Performs UMAP and t-SNE for visualization of the clusters. Visualization: Generates spatial, UMAP, and t-SNE plots of the clusters with customized color schemes. Saves these plots as images in the specified directory. Saving Results: Saves the updated st_obj as an RDS file. Exports the metadata of st_obj to a CSV file. Return Value: Returns the updated st_obj containing the clustering results. 5.3.3 An example: st_obj <- st_Clustering( st_obj = st_obj, output.dir = '.', normalization.method = 'SCTransform', npcs = 50, pcs.used = 1:10, resolution = 0.8, verbose = FALSE ) 5.3.4 Outputs: Figures showing the results of clustering. SeuratObject in rds format. 5.4 Step 4. DEGs The st_Find_DEGs function is designed to identify differentially expressed genes (DEGs) in spatial transcriptomics data. It performs differential expression analysis based on clustering results, visualizes the top markers, and saves the results to output.dir. 5.4.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for DEG analysis. output.dir: The directory where output files will be saved. Default is the current directory (‘.’). 
ident.label: The metadata label used for identifying clusters. Default is 'seurat_clusters'. only.pos: A logical flag to include only positive markers. Default is TRUE. min.pct: The minimum fraction of cells expressing the gene in either cluster. Default is 0.25. logfc.threshold: The log fold change threshold for considering a gene differentially expressed. Default is 0.25. test.use: The statistical test to use for differential expression analysis. Default is 'wilcox'. verbose: A logical flag to print progress messages. Default is FALSE. 5.4.2 Function behavior: Set Identifiers: Sets the cluster identifiers in the spatial transcriptomics object (st_obj) based on the specified ident.label. Find Differentially Expressed Genes (DEGs): Performs differential expression analysis using the specified parameters (only.pos, min.pct, logfc.threshold, test.use). Top Marker Genes: Selects the top 5 marker genes for each cluster based on the highest average log fold change. Visualization: Generates a dot plot for the top DEGs and saves the plot as an image in the specified directory. Saving Results: Saves the DEG results as a CSV file. Return Value: Returns the data frame containing the identified DEGs. 5.4.3 An example: st.markers <- st_Find_DEGs( st_obj = st_obj, output.dir = '.', ident.label = 'seurat_clusters', only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, test.use = 'wilcox', verbose = FALSE ) 5.4.4 Outputs: Dot plots showing markers. CSV file containing the information of markers. 5.5 Step 5. Spatially variable features The st_SpatiallyVariableFeatures function identifies and visualizes spatially variable features (SVFs) in spatial transcriptomics data. It integrates the identification of spatially variable features using a specified method, saves the results to a directory, and creates visualizations of the top spatially variable features. 5.5.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. 
output.dir: The directory where output files will be saved. Default is the current directory. assay: The assay to be used for finding spatially variable features. Default is 'SCT'. selection.method: The method used for selecting spatially variable features. Default is 'moransi'. n.top.show: The number of top spatially variable features to visualize. Default is 10. n.col: The number of columns for the visualization grid. Default is 5. verbose: A logical flag to print progress messages. Default is FALSE. 5.5.2 Function behavior: Identify Spatially Variable Features: Identifies spatially variable features using the specified method and assay. Suppresses warnings during the process. Save Metadata: Extracts metadata features and saves them as a CSV file in output.dir. Visualization: Selects the top n.top.show spatially variable features. Generates and saves a spatial feature plot of these features in the specified directory. Return Value: Returns the updated st_obj containing the identified spatially variable features. 5.5.3 An example: st_obj <- st_SpatiallyVariableFeatures( st_obj = st_obj, output.dir = '.', assay = st_obj@active.assay, selection.method = 'moransi', n.top.show = 10, n.col = 5, verbose = FALSE ) 5.5.4 Outputs: Figures showing SVFs. CSV file containing the information of SVFs. 5.6 Step 6. Spatial interaction The st_Interaction function is used to identify and visualize interactions between clusters based on spatial transcriptomics data. It utilizes Commot to analyze spatial interactions, identify pathway activities, and assess the strength and significance of interactions. 5.6.1 Function arguments: st_data_path: Path to the spatial transcriptomics data. metadata_path: Path to the metadata associated with the spatial transcriptomics data. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. label_key: Key in the metadata to identify cell clusters. Default is 'seurat_clusters'. 
save_path: The directory where output files will be saved. Default is the current directory. species: The species of the spatial transcriptomics data. Default is 'human'. signaling_type: Type of signaling interactions to consider. Default is 'Secreted Signaling'. database: Database to be used for the analysis. Default is 'CellChat'. min_cell_pct: Minimum percentage of cells to consider for interaction analysis. Default is 0.05. dis_thr: Distance threshold for defining interactions. Default is 500. n_permutations: Number of permutations for assessing significance. Default is 100. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.6.2 Function behavior: Commot Analysis: Uses Commot to perform interaction analysis, identifying interactions within and between clusters. Visualization: Generates visualizations of pathway interactions and interactions between ligand-receptors (LRs) within and between clusters, and saves them in save_path. 5.6.3 An example: st_Interaction( st_data_path = 'path/to/data', metadata_path = 'path/to/metadata', library_id = 'Hema_ST', label_key = 'seurat_clusters', save_path = '.', species = 'human', signaling_type = 'Secreted Signaling', database = 'CellChat', min_cell_pct = 0.05, dis_thr = 500, n_permutations = 100, pythonPath = 'path/to/python' ) 5.6.4 Outputs: Dot plot showing pathway interaction between and within clusters. Dot plot showing LRs interaction between and within clusters. The information of each LR and pathway. 5.7 Step 7. CNV analysis The st_CNV function identifies and visualizes copy number variations (CNVs) in spatial transcriptomics data. It uses CopyKAT to perform the CNV analysis, saves the results, and generates visual representations of CNV states. 5.7.1 Function arguments: st_obj: The input spatial transcriptomics object containing the data for analysis. save_path: The directory where output files will be saved. assay: The assay to be used for CNV analysis. 
Default is 'Spatial'. LOW.DR: The lower threshold for the dropout rate in CopyKAT. Default is 0.05. UP.DR: The upper threshold for the dropout rate in CopyKAT. Default is 0.1. win.size: The window size for the CNV analysis. Default is 25. distance: The distance metric to be used for the analysis. Default is "euclidean". genome: The genome version to be used, ‘hg20’ or ‘mm10’. Default is "hg20". n.cores: The number of cores to be used for parallel processing. Default is 1. species: The species of the spatial transcriptomics data. Default is 'human'. 5.7.2 Function behavior: CopyKAT Analysis: Runs the CopyKAT pipeline to perform CNV analysis using the provided parameters. Saving Results: Saves the CopyKAT results as an RDS file. Plotting: Generates plots of the CNV states and saves them in save_path. Updating Metadata: Updates the spatial transcriptomics object with CNV state metadata. Return Value: Returns the updated st_obj containing the CNV state information. 5.7.3 An example: st_obj <- st_CNV( st_obj = st_obj, save_path = '.', assay = 'Spatial', LOW.DR = 0.05, UP.DR = 0.1, win.size = 25, distance = "euclidean", genome = 'hg20', n.cores = 1, species = 'human' ) 5.7.4 Outputs: Figures showing the predicted CNV states. Figures showing the CNV heatmap. RDS files containing the CopyKAT results. 5.8 Step 8. Deconvolution The st_Deconvolution function aims to perform spatial deconvolution analysis on spatial transcriptomics data to estimate the cell-type composition and abundance in different regions. The function utilizes cell2location to infer cell-type abundance and spatial distributions, allowing for the visualization and interpretation of spatially resolved cell populations within the tissue. 5.8.1 Function arguments: st.data.dir: Path to the spatial transcriptomics data. sc.h5ad.dir: Path to the single-cell RNA-seq data in h5ad format. Default is NULL. library_id: Identifier for the spatial transcriptomics library. Default is 'Hema_ST'. 
st_obj: Spatial transcriptomics object containing the data for analysis. Default is NULL. save_path: The directory where output files will be saved. Default is NULL. sc.labels.key: Key in the single-cell metadata to identify cell clusters. Default is 'seurat_clusters'. species: The species of the spatial transcriptomics data. Default is 'mouse'. sc.max.epoch: Maximum number of epochs used for single-cell deconvolution. Default is 1000. st.max.epoch: Maximum number of epochs used for spatial deconvolution. Default is 10000. use.gpu: Logical value indicating whether to use GPU for computation. Default is FALSE. use.Dataset: The dataset to be used for analysis, such as 'HematoMap' or 'LymphNode'. pythonPath: The path to the Python environment containing cell2location to use for the analysis. Default is ‘.’. 5.8.2 Function behavior: Deconvolution Analysis: Performs the spatial deconvolution analysis using the provided spatial transcriptomics and single-cell RNA-seq data. Post-Analysis Processing: Processes the deconvolution results and visualizes the spatial distribution of inferred cell types within the tissue. Returning Results: If a Seurat object is provided, the updated Seurat object with cell type information is returned. 5.8.3 An example: st_obj <- st_Deconvolution( st.data.dir = 'path/to/data', library_id = 'Hema_ST', sc.h5ad.dir = NULL, st_obj = st_obj, save_path = '.', sc.labels.key = 'seurat_clusters', species = 'human', sc.max.epoch = 1000, st.max.epoch = 10000, use.gpu = FALSE, use.Dataset = 'LymphNode', pythonPath = 'path/to/python' ) 5.8.4 Outputs: Figures showing the predicted abundance of each cell-type. The parameters of trained cell2location model. 5.9 Step 9. Cell cycle The st_Cell_cycle function is used to assess the cell cycle phase scores in spatial transcriptomics data. It calculates S phase and G2M phase scores based on the expression of designated cell cycle-related genes and visualizes these scores in spatial and dimensionality-reduced plots. 
5.9.1 Function arguments: st_obj: The input Seurat object containing the data for analysis. save_path: The directory where the output images will be saved. Default is the current directory. s.features: A list of genes associated with the S phase. Default is NULL (using genes from Seurat). g2m.features: A list of genes associated with the G2M phase. Default is NULL (using genes from Seurat). species: The species of the spatial transcriptomics data. Default is 'human'. FeatureColors.bi: A color palette for visualization. Default is a two-color ramp palette. 5.9.2 Function behavior: Gene Feature Assignment: Assigns S phase and G2M phase gene lists based on the specified species or provided input. Cell Cycle Scoring: Calculates the S phase and G2M phase scores in the data. Spatial Visualization: Generates spatial feature plots to visualize the S phase and G2M phase scores using the specified color palette and saves the plots as images. Dimensionality-Reduced Plot Visualization: If UMAP or tSNE dimensionality reduction is available in the st_obj, feature plots of the S phase and G2M phase scores are generated in the reduced space and saved as images. Return Value: Returns the updated st_obj containing the cell cycle phase scores. 5.9.3 An example: st_obj <- st_Cell_cycle( st_obj = st_obj, save_path = '.', s.features = NULL, g2m.features = NULL, species = 'human', FeatureColors.bi = colorRampPalette(colors = rev(x = brewer.pal(n = 11, name = 'RdYlBu'))) ) 5.9.4 Outputs: Figures showing S scores. Figures showing G2M scores. 5.10 Step 10. Niche analysis The st_NicheAnalysis function is designed to perform niche analysis on spatial transcriptomics data, enabling the exploration of spatial niches or microenvironments within the tissue. The function encompasses co-occurrence analysis, niche clustering, and niche interaction analysis to uncover the spatial relationships and characteristics of different cell populations or features. 
5.10.1 Function arguments: st_obj: The input SeuratObject containing the spatial transcriptomics data for analysis. features: A vector of feature names (for example, cell types from deconvolution) to use for niche analysis. save_path: The directory where the analysis results and visualizations will be saved. Default is the current directory. coexistence.method: The method for co-occurrence analysis, accepting 'correlation' or 'Wasserstein'. Default is 'correlation'. kmeans.n: The number of clusters for niche clustering. Default is 4. st_data_path: A path containing the ‘spatial’ file and ‘filtered_feature_bc_matrix.h5’ file, required for niche interaction visualization. slice: The slice to be used for analysis. Default is 'slice1'. species: The species of the sample data. Default is 'mouse'. pythonPath: The path to the Python environment containing Commot to use for the analysis. Default is ‘.’. 5.10.2 Function behavior: Co-occurrence Score Calculation: Calculates the co-occurrence scores between the specified features using the chosen coexistence method (‘correlation’ or ‘Wasserstein’). Niche Clustering: Utilizes k-means clustering to identify distinct spatial niches based on the profiles of the selected features and visualizes the clustering results. Niche Interaction Visualization: If st_data_path is provided, runs Commot on the provided spatial transcriptomics data and generates visualizations of niche interactions within the tissue. Return Value: Returns the updated st_obj with niche analysis results and visualizations. 
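The co-occurrence and niche clustering steps described in 5.10.2 can be sketched in base R. This is a minimal illustration under stated assumptions, not the st_NicheAnalysis implementation: the abundance matrix is simulated, the cell-type column names are hypothetical, and plain Pearson correlation plus stats::kmeans stand in for whatever the package does internally.

```r
# Simulated spot-by-cell-type abundance matrix (hypothetical data and names),
# standing in for deconvolution results stored in the object's metadata.
set.seed(1)
n_spots <- 200
abund <- cbind(
  HSC   = rgamma(n_spots, shape = 2),
  Ery   = rgamma(n_spots, shape = 2),
  Mono  = rgamma(n_spots, shape = 2),
  Bcell = rgamma(n_spots, shape = 2)
)
rownames(abund) <- paste0("spot_", seq_len(n_spots))

# Co-occurrence score: pairwise Pearson correlation between features across spots.
cooc <- cor(abund, method = "pearson")

# Niche clustering: k-means on scaled abundances, with k = 4 to mirror kmeans.n.
km <- kmeans(scale(abund), centers = 4, nstart = 10)
niche <- factor(km$cluster)

# Mean composition of each niche (rows: niches, columns: features).
composition <- aggregate(as.data.frame(abund), by = list(niche = niche), FUN = mean)

print(round(cooc, 2))
print(composition)
```

Scaling before k-means keeps a single highly abundant feature from dominating the distance; whether the package scales its inputs is not stated in this tutorial.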
5.10.3 An example: tmp <- read.csv('path/to/cell2loc_res.csv', row.names = 1) features <- colnames(tmp) if(!all(features %in% names(st_obj@meta.data))){ common.barcodes <- intersect(colnames(st_obj), rownames(tmp)) tmp <- tmp[common.barcodes, ] st_obj <- st_obj[, common.barcodes] st_obj <- AddMetaData(st_obj, metadata = tmp) } st_obj <- st_NicheAnalysis( st_obj, features = features, save_path = '.', coexistence.method = 'correlation', kmeans.n = 4, st_data_path = 'path/to/data', slice = 'slice1', species = 'human', pythonPath = 'path/to/python' ) 5.10.4 Outputs: Figures showing the co-existence results. Figures showing the spatial distribution of each niche. Figures showing the composition of each niche. Figures showing the results of interactions using Commot. "]]