-
Notifications
You must be signed in to change notification settings - Fork 34
/
README.Rmd
107 lines (82 loc) · 2.93 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
---
title: "annotables"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
cache = FALSE,
echo = TRUE,
message = FALSE,
warning = FALSE)
library(annotables)
```
[![DOI](https://zenodo.org/badge/3882/stephenturner/annotables.svg)](https://zenodo.org/badge/latestdoi/3882/stephenturner/annotables)
Provides tables for converting and annotating Ensembl Gene IDs.
## Installation
```{r, eval=FALSE}
install.packages("devtools")
devtools::install_github("stephenturner/annotables")
```
## Rationale
Many bioinformatics tasks require converting gene identifiers from one convention to another, or annotating gene identifiers with gene symbol, description, position, etc. Sure, [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html) does this for you, but I got tired of remembering biomaRt syntax and hammering Ensembl's servers every time I needed to do this.
This package has basic annotation information from **`r ensembl_version`** for:
- Human build 38 (`grch38`)
- Human build 37 (`grch37`)
- Mouse (`grcm38`)
- Rat (`rnor6`)
- Chicken (`galgal5`)
- Worm (`wbcel235`)
- Fly (`bdgp6`)
- Macaque (`mmul801`)
Where each table contains:
- `ensgene`: Ensembl gene ID
- `entrez`: Entrez gene ID
- `symbol`: Gene symbol
- `chr`: Chromosome
- `start`: Start
- `end`: End
- `strand`: Strand
- `biotype`: Protein coding, pseudogene, mitochondrial tRNA, etc.
- `description`: Full gene name/description
Additionally, there are `tx2gene` tables that link Ensembl gene IDs to Ensembl transcript IDs.
## Usage
```{r, eval=FALSE}
library(annotables)
```
Look at the human genes table (note the description column gets cut off because the table becomes too wide to print nicely):
```{r}
grch38
```
Look at the human genes-to-transcripts table:
```{r}
grch38_tx2gene
```
Tables are saved in [tibble](http://tibble.tidyverse.org) format, pipe-able with [dplyr](http://dplyr.tidyverse.org):
```{r, results='asis'}
grch38 %>%
dplyr::filter(biotype == "protein_coding" & chr == "1") %>%
dplyr::select(ensgene, symbol, chr, start, end, description) %>%
head %>%
knitr::kable(.)
```
Example with [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) results from the [airway](https://bioconductor.org/packages/release/data/experiment/html/airway.html) package, made tidy with [biobroom](http://www.bioconductor.org/packages/devel/bioc/html/biobroom.html):
```{r}
library(DESeq2)
library(airway)
data(airway)
airway <- DESeqDataSet(airway, design = ~cell + dex)
airway <- DESeq(airway)
res <- results(airway)
# tidy results with biobroom
library(biobroom)
res_tidy <- tidy.DESeqResults(res)
head(res_tidy)
```
```{r, results='asis'}
res_tidy %>%
dplyr::arrange(p.adjusted) %>%
head(20) %>%
dplyr::inner_join(grch38, by = c("gene" = "ensgene")) %>%
dplyr::select(gene, estimate, p.adjusted, symbol, description) %>%
knitr::kable(.)
```