-
Notifications
You must be signed in to change notification settings - Fork 2
/
Example_API_usage.Rmd
167 lines (140 loc) · 5.76 KB
/
Example_API_usage.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
title: "Example CREEDS API Usages Walk-Through"
author: "Zichen Wang"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE)
```
## Load required packages
```{r, message=FALSE}
library(httr)
library(jsonlite)
library(Biobase)
library(GEOquery)
library(GeoDE)
library(limma)
library(ggplot2)
```
```{r}
BASE_URL <- 'http://amp.pharm.mssm.edu/CREEDS/'
```
## 1. Search signatures using text
The `/search` endpoint handles the search of the metadata associated with gene expression signatures. You can use a query term of interest such as 'breast cancer' to perform a search using an HTTP GET request.
The `GET` function performs an HTTP GET request and returns a response object. The `httr::content` method retrieves the content within the response object. Unfortunately the `content` method is masked by `Biobase::content`; consequently, we will have to use add a namespace when using it:
```{r}
r <- GET(paste0(BASE_URL, 'search'), query=list(q ='breast cancer'))
response <- fromJSON(httr::content(r, 'text'))
print(r$status_code)
```
The `response` is an R `data.frame`; the search result is as follows:
```{r}
head(response)
```
## 2. Query the signature database using up/down gene sets to find similar/opposite signatures
Signatures can also be queried against using a pair of up/down gene lists from the `/search` endpoint using an HTTP POST request. Here we generate vectors of up and down genes to query against the CREEDS database:
```{r}
## Get some up and down gene lists
up_genes <- c('KIAA0907','KDM5A','CDC25A','EGR1','GADD45B','RELB','TERF2IP','SMNDC1','TICAM1','NFKB2','RGS2','NCOA3','ICAM1','TEX10','CNOT4','ARID4B','CLPX','CHIC2','CXCL2','FBXO11','MTF2','CDK2','DNTTIP2','GADD45A','GOLT1B','POLR2K','NFKBIE','GABPB1','ECD','PHKG2','RAD9A','NET1','KIAA0753','EZH2','NRAS','ATP6V0B','CDK7','CCNH','SENP6','TIPARP','FOS','ARPP19','TFAP2A','KDM5B','NPC1','TP53BP2','NUSAP1')
dn_genes <- c('SCCPDH','KIF20A','FZD7','USP22','PIP4K2B','CRYZ','GNB5','EIF4EBP1','PHGDH','RRAGA','SLC25A46','RPA1','HADH','DAG1','RPIA','P4HA2','MACF1','TMEM97','MPZL1','PSMG1','PLK1','SLC37A4','GLRX','CBR3','PRSS23','NUDCD3','CDC20','KIAA0528','NIPSNAP1','TRAM2','STUB1','DERA','MTHFD2','BLVRA','IARS2','LIPA','PGM1','CNDP2','BNIP3','CTSL1','CDC25B','HSPA8','EPRS','PAX8','SACM1L','HOXA5','TLE1','PYGL','TUBB6','LOXL1')
payload = list(
up_genes = up_genes,
dn_genes = dn_genes,
direction= 'similar'
)
```
POST the input gene sets to the API and load the response to R `data.frame`:
```{r}
r <- POST(paste0(BASE_URL, 'search'), body=payload, encode='json')
print(r$status_code)
response <- fromJSON(httr::content(r, 'text'))
```
Let us examine the search result:
```{r}
head(response)
```
## 3. Retrieve a signature using `id`
```{r}
r <- GET(paste0(BASE_URL, 'api'), query=list(id= response[1, 'id']))
sig <- fromJSON(httr::content(r, 'text'))
```
`sig` is a `list` with the following keys:
```{r}
names(sig)
```
## 4. Retrieve a gene expression dataset from GEO and perform selected analyses
### 4.0. Prepare data
Examine the metadata required to retrieve the dataset from GEO:
```{r}
cat('GEO id:', sig$geo_id)
cat('Control GSMs:', sig$ctrl_ids)
cat('Perturbation GSMs:', sig$pert_ids)
```
Retrieve the GEO series using `GEOquery::getGEO` function, which return a `list` of `ExpressionSet` instances. The `ExpressionSet` object contains a `matrix` with rows being probes and columns being samples. This `matrix` can be accessed by the `exprs` method.
```{r}
eset <- getGEO(sig$geo_id, GSEMatrix=TRUE, AnnotGPL=TRUE)
eset <- eset[[1]]
print(dim(eset))
```
Subset the columns of the eset with `ctrl_ids` and `pert_ids`.
```{r}
eset <- eset[, c(sig$ctrl_ids, sig$pert_ids)]
print(dim(eset))
```
### 4.1. Explanatory analyses
Visualize the gene expression values across samples using a density plot:
```{r}
expr_df <- as.data.frame(exprs(eset))
dfs <- stack(expr_df)
colnames(dfs) <- c('Gene_expression', 'Sample')
ggplot(dfs, aes(x=Gene_expression)) + geom_density(aes(group=Sample, colour=Sample))
```
Visualize using a PCA plot:
```{r}
# Make a list labeling the whether the samples are controls or perturbations
n_ctrls <- length(sig$ctrl_ids)
n_perts <- length(sig$pert_ids)
sample_classes <- factor(c(rep('CTRL', n_ctrls), rep('PERT', n_perts)))
# also make a integer version of sample_classes using 1 for 'CTRL' and 2 for 'PERT'
sample_classes_i <- factor(c(rep(1, n_ctrls), rep(2, n_perts)))
# Perform PCA on the expression transposed expression data.frame (samples x genes)
pca <- prcomp(t(exprs(eset)), scale=T)
# Get the coordinates
pca_coords <- data.frame(sample_classes, pca$x[,1:3])
qplot(x=PC1, y=PC2, data=pca_coords, colour=sample_classes)
```
### 4.2. Perform differential expression analysis
#### 4.2.1. Use the Characteristic Direcion (CD) using the `chdirAnalysis` in the [`GeoDE`](https://cran.r-project.org/web/packages/GeoDE/index.html) package
```{r}
expr_df <- as.data.frame(exprs(eset))
expr_df <- cbind(rownames(expr_df), expr_df)
chdir_result <- chdirAnalysis(expr_df, sample_classes_i, gamma=0.5)
```
#### 4.2.2. Use [`Limma`](https://bioconductor.org/packages/release/bioc/html/limma.html)
First construct design matrix:
```{r}
design <- model.matrix(~sample_classes + 0, eset)
colnames(design) <- levels(sample_classes)
print(design)
```
Fit limma model and create a volcano plot:
```{r}
fit <- lmFit(eset, design)
cont.matrix <- makeContrasts(PERT-CTRL, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
volcanoplot(fit2, highlight=5)
```
Examine the top DEGs:
```{r}
topDEGs <- topTable(fit2, adjust="fdr")
# Only display probe ID and Gene.symbol
topDEGs <- topDEGs[,c('ID','Gene.symbol', 'logFC', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'B')]
print(topDEGs)
```
**Built with**
```{r}
sessionInfo()
```