-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
96 lines (75 loc) · 3.67 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
---
title: "GxEScanR: Performs GWAS/GWEIS scans using BinaryDosage files"
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# GxEScanR
<!-- badges: start -->
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/USCbiostats/GxEScanR?branch=master&svg=true)](https://ci.appveyor.com/project/USCbiostats/GxEScanR)
[![Travis build status](https://travis-ci.com/USCbiostats/GxEScanR.svg?branch=master)](https://travis-ci.com/USCbiostats/GxEScanR)
[![Codecov test coverage](https://codecov.io/gh/USCbiostats/GxEScanR/branch/master/graph/badge.svg)](https://codecov.io/gh/USCbiostats/GxEScanR?branch=master)
<!-- badges: end -->
GxEScanR is designed to efficiently run genome-wide association study (GWAS) and genome-wide by environmental interaction study (GWEIS) scans using imputed genotypes
stored in the BinaryDosage format. The phenotype to be analyzed can either be a
continuous or binary trait. The GWEIS scan performs multiple tests that can be
used in two-step methods.
## Installation
You can install the released version of GxEScanR from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("GxEScanR")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("USCbiostats/GxEScanR")
```
## Example
The following is a step by step example on how to run a GWAS.
The first step is to load the subject phenotype and covariate data. In this
example the data is included with the package. The data is stored in a RDS
formatted file. The first five lines of the data frame are shown. The first
column is the subject ID, sid. The second column is the phenotype, y. The last
column is a covariate,e.
```{r example1}
library(GxEScanR)
covdatafile <- system.file("extdata", "covdata.rds", package = "GxEScanR")
covdata <- readRDS(covdatafile)
covdata[1:5,]
```
The second step is to load the information about the binary dosage file. This
is obtained by running BinaryDosage::getbdinfo(<binary dosage file name>).
More information about this can be found in the BinaryDosage package.
A binary dosage file and information file about it are included with this
package. The getbdinfo routine stores the complete file path to the binary
dosage file. The installation routine moved the binary dosage from its
original location. The third line of code corrects this. The user will
not need to run the third line in normal usage.
```{r example2}
bdinfofile <- system.file("extdata", "pdata_4_1.bdinfo", package = "GxEScanR")
bdinfo <- readRDS(bdinfofile)
# Not normally run - This is needed only for the example data file
bdinfo$filename <- system.file("extdata", "pdata_4_1.bdose", package = "GxEScanR")
```
Everything is now ready to run a GWAS. The number of subjects used in the
analysis is displayed at the start of the run. There are a lot of options
not being shown in this README file. Information on these options can be found
in the documentation and vignettes.
```{r example3}
results <- gwas(data = covdata, bdinfo = bdinfo)
results
```
In this example the output was stored in a data frame and displayed. An option
exists to output the results to a text file that can easily be read into R
using read.table.
The columns in the output are the SNP ID, the coefficient estimate for the
genetic effect (betag), and the likelihood ratio test for the estimate (lrtg).
The first SNP was simulated with a log odds ratio of 0.75, and the others
were simulated to have no effect. The results are consistent with the modelling.