forked from lmweber/PrinciplesSTA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
dimensionality-reduction.qmd
78 lines (47 loc) · 2.37 KB
/
dimensionality-reduction.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# Dimensionality reduction {#sec-dimensionality-reduction}
## Overview
In this chapter, we apply dimensionality reduction methods to visualize the data and to generate inputs for further downstream analyses.
## Load data from previous steps
We start by loading the data object(s) saved after running the analysis steps from the previous chapters. Code to re-run the previous steps is shown in condensed form in @sec-analysis-steps.
```{r, message=FALSE, results='hide'}
library(SpatialExperiment)
library(here)
spe <- readRDS(here("outputs/spe_hvgs.rds"))
top_hvgs <- readRDS(here("outputs/top_hvgs.rds"))
```
## Principal component analysis (PCA)
Apply principal component analysis (PCA) to the set of top highly variable genes (HVGs) to reduce the dimensionality of the dataset, and retain the top 50 principal components (PCs) for further downstream analyses.
This is done for two reasons: (i) to reduce noise due to random variation in expression of biologically uninteresting genes, which are assumed to have expression patterns that are independent of each other, and (ii) to improve computational efficiency during downstream analyses.
We use the computationally efficient implementation of PCA provided in the `scater` package [@McCarthy2017]. This implementation uses randomization, and therefore requires setting a random seed for reproducibility.
```{r, message=FALSE}
library(scater)
```
```{r}
# compute PCA
set.seed(123)
spe <- runPCA(spe, subset_row = top_hvgs)
reducedDimNames(spe)
dim(reducedDim(spe, "PCA"))
```
## Uniform Manifold Approximation and Projection (UMAP)
We also run UMAP on the set of top 50 PCs and retain the top 2 UMAP components, which will be used for visualization purposes.
```{r}
# compute UMAP on top 50 PCs
set.seed(123)
spe <- runUMAP(spe, dimred = "PCA")
reducedDimNames(spe)
dim(reducedDim(spe, "UMAP"))
# update column names for easier plotting
colnames(reducedDim(spe, "UMAP")) <- paste0("UMAP", 1:2)
```
## Visualizations
Generate plots using plotting functions from the [ggspavis](https://bioconductor.org/packages/ggspavis) package. In the next chapter on clustering, we will add cluster labels to these reduced dimension plots.
```{r, message=FALSE}
library(ggspavis)
```
```{r, fig.width=4.25, fig.height=4.25}
# plot top 2 PCA dimensions
plotDimRed(spe, plot_type = "PCA")
# plot top 2 UMAP dimensions
plotDimRed(spe, plot_type = "UMAP")
```