---
title: "Summarizer README"
output:
  html_document:
    theme: spacelab
    toc: yes
    toc_float: yes
    number_sections: yes
    df_print: kable
    code_folding: hide
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Summarise variables/factors by a categorical variable
`summary.factorlist()` is a simple wrapper used to summarise any number of variables by a single categorical variable.
This is usually "Table 1" of a study report.
```{r, warning=FALSE, message=FALSE}
library(summarizer)
library(dplyr)
library(stringr)
# Load example dataset, modified version of survival::colon
data(colon_s)
# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary.factorlist(dependent, explanatory, p=TRUE)
```
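For readers who want to see the kind of calculation behind a "Table 1" row, here is a minimal base R sketch. It is not the package's implementation. It assumes the unmodified `survival::colon` data (death records only) as a stand-in for `colon_s`, and the factor labels are chosen here for illustration.

```r
# Assumption: survival::colon (ships with R) stands in for colon_s.
# Keep one row per patient by selecting the death records (etype == 2).
d <- subset(survival::colon, etype == 2)

# Illustrative factor versions of two 0/1 variables (labels are assumptions)
d$sex.factor    <- factor(d$sex,    levels = c(0, 1), labels = c("Female", "Male"))
d$perfor.factor <- factor(d$perfor, levels = c(0, 1), labels = c("No", "Yes"))

# Counts of the explanatory factor within each level of the grouping variable
counts <- table(d$sex.factor, d$perfor.factor, dnn = c("sex", "perfor"))

# Column percentages: share of each sex within each perforation group
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Chi-squared test of association between the two factors
p_value <- chisq.test(counts)$p.value

counts
col_pct
p_value
```

A full table repeats this for every explanatory variable and, for continuous variables such as `age`, reports a summary statistic such as the mean and standard deviation instead of counts.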
`summary.factorlist()` is also commonly used to summarise any number of variables by an *outcome variable* (for example, death: yes/no).
```{r}
# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summary.factorlist(dependent, explanatory)
```
# Summarise regression model results in final table format
The second main feature is the ability to create final tables for logistic `glm()`, hierarchical logistic `lme4::glmer()` and
Cox proportional hazards `survival::coxph()` regression models.
The `summarizer()` "all-in-one" function takes a single dependent variable and a vector of explanatory variable names
(continuous or categorical variables) and produces a final table for publication, including summary statistics and
univariable and multivariable regression analyses. The first columns are those produced by
`summary.factorlist()`.
## glm
`glm(dependent ~ explanatory, family="binomial")`
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory)
```
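The odds ratios in such a table come from an ordinary logistic regression. The sketch below shows one way to extract them in base R. It is not the package's code. It assumes `survival::colon` in place of `colon_s`, uses `status` (death by end of follow-up) as a stand-in for `mort_5yr`, and uses Wald confidence intervals for simplicity.

```r
# Assumption: survival::colon death records stand in for colon_s
d <- subset(survival::colon, etype == 2)
d$sex.factor      <- factor(d$sex,      levels = c(0, 1), labels = c("Female", "Male"))
d$obstruct.factor <- factor(d$obstruct, levels = c(0, 1), labels = c("No", "Yes"))

# Multivariable logistic regression; `status` is a stand-in outcome, not mort_5yr
fit <- glm(status ~ age + sex.factor + obstruct.factor,
           data = d, family = binomial)

# Exponentiate coefficients and Wald limits to get odds ratios with 95% CIs
ci <- exp(confint.default(fit))
or_table <- data.frame(
  term  = names(coef(fit)),
  OR    = exp(coef(fit)),
  lower = ci[, 1],
  upper = ci[, 2],
  p     = summary(fit)$coefficients[, "Pr(>|z|)"],
  row.names = NULL
)

# Drop the intercept row, which is not reported in a results table
or_table[-1, ]
```

Univariable estimates are obtained the same way by fitting one model per explanatory variable.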
## multi-level
Where the multivariable model should contain only a subset of the variables in the full univariable set, pass that subset as a second vector of variable names.
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi)
```
## Random effects
`lme4::glmer(dependent ~ explanatory + (1 | random_effect), family="binomial")`
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi, random.effect)
```
## with metrics
`metrics=TRUE` provides common model metrics.
*Note: the output falls back to the default data frame print-out, because `kable` doesn't handle a list automatically.*
```{r, message=FALSE}
colon_s %>%
  summarizer(dependent, explanatory, explanatory.multi, metrics=TRUE)
```
## Cox proportional hazards
`survival::coxph(dependent ~ explanatory)`
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  summarizer(dependent, explanatory)
```
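The hazard ratios reported here correspond to a standard `survival::coxph()` fit. A minimal sketch of extracting them directly follows. It assumes `survival::colon` in place of `colon_s` and is not the package's own extraction code.

```r
library(survival)  # recommended package that ships with R

# Assumption: survival::colon death records stand in for colon_s
d <- subset(colon, etype == 2)
d$sex.factor      <- factor(d$sex,      levels = c(0, 1), labels = c("Female", "Male"))
d$obstruct.factor <- factor(d$obstruct, levels = c(0, 1), labels = c("No", "Yes"))

# Cox proportional hazards model on time to death
fit <- coxph(Surv(time, status) ~ age + sex.factor + obstruct.factor, data = d)

# summary() already holds exponentiated coefficients and 95% limits
hr <- summary(fit)$conf.int
hr_table <- data.frame(
  term  = rownames(hr),
  HR    = hr[, "exp(coef)"],
  lower = hr[, "lower .95"],
  upper = hr[, "upper .95"],
  row.names = NULL
)
hr_table
```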
# Subsets
Rather than going all-in-one, any number of subset models can be manually added to a `summary.factorlist()` table using `summarizer.merge()`. This is particularly useful when models take a long time to run or are complicated.
## glm
Note that `glm.id=TRUE` is required when creating the `summary.factorlist()` table that the models are merged onto. `fit2df()` is a helper function that extracts the results of most common model types to a data frame.
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
random.effect = "hospital"
dependent = 'mort_5yr'
# Separate tables
colon_s %>%
  summary.factorlist(dependent, explanatory, glm.id=TRUE) -> example.summary
colon_s %>%
  glmuni(dependent, explanatory) %>%
  fit2df(estimate.suffix=" (univariable)") -> example.univariable
colon_s %>%
  glmmulti(dependent, explanatory) %>%
  fit2df(estimate.suffix=" (multivariable)") -> example.multivariable
colon_s %>%
  glmmixed(dependent, explanatory, random.effect) %>%
  fit2df(estimate.suffix=" (multilevel)") -> example.multilevel
# Pipe together
example.summary %>%
  summarizer.merge(example.univariable) %>%
  summarizer.merge(example.multivariable) %>%
  summarizer.merge(example.multilevel) %>%
  select(-c(glm.id, index)) -> example.final
example.final
```
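The merge step can be pictured as joining per-model estimate tables on the coefficient name. The base R sketch below illustrates that idea only. `tidy_or()` is a hypothetical helper written for this example, not a package function, and `survival::colon` with `status` as the outcome stands in for `colon_s` and `mort_5yr`.

```r
# Assumption: survival::colon death records stand in for colon_s
d <- subset(survival::colon, etype == 2)
d$sex.factor      <- factor(d$sex,      levels = c(0, 1), labels = c("Female", "Male"))
d$obstruct.factor <- factor(d$obstruct, levels = c(0, 1), labels = c("No", "Yes"))
explanatory <- c("age", "sex.factor", "obstruct.factor")

# Hypothetical helper: one row per coefficient with a formatted "OR (CI)" string
tidy_or <- function(fit, suffix) {
  est <- exp(coef(fit))[-1]                               # drop the intercept
  ci  <- exp(confint.default(fit))[-1, , drop = FALSE]    # Wald 95% limits
  out <- data.frame(
    term  = names(est),
    value = sprintf("%.2f (%.2f-%.2f)", est, ci[, 1], ci[, 2]),
    stringsAsFactors = FALSE
  )
  names(out)[2] <- paste0("OR", suffix)
  out
}

# One logistic model per explanatory variable, stacked into a single table
uni <- do.call(rbind, lapply(explanatory, function(v) {
  fit <- glm(reformulate(v, response = "status"), data = d, family = binomial)
  tidy_or(fit, " (univariable)")
}))

# One model containing all explanatory variables
multi_fit <- glm(reformulate(explanatory, response = "status"),
                 data = d, family = binomial)
multi <- tidy_or(multi_fit, " (multivariable)")

# Join on the coefficient name, keeping every univariable row
merged <- merge(uni, multi, by = "term", all.x = TRUE, sort = FALSE)
merged
```

Because `all.x = TRUE` keeps every univariable row, a multivariable model fitted on a subset of variables would simply leave `NA` in the rows it does not cover.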
## Cox proportional hazards example with separate tables merged together
```{r, message=FALSE}
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
explanatory.multi = c("age.factor", "obstruct.factor")
dependent = "Surv(time, status)"
# Separate tables
colon_s %>%
  summary.factorlist(dependent, explanatory, glm.id=TRUE) -> example2.summary
colon_s %>%
  coxphuni(dependent, explanatory) %>%
  fit2df(estimate.suffix=" (univariable)") -> example2.univariable
colon_s %>%
  coxphmulti(dependent, explanatory.multi) %>%
  fit2df(estimate.suffix=" (multivariable)") -> example2.multivariable
# Pipe together
example2.summary %>%
  summarizer.merge(example2.univariable) %>%
  summarizer.merge(example2.multivariable) %>%
  select(-c(glm.id, index)) -> example2.final
example2.final
```
# Summarise regression model results in a plot
Models can be summarised with odds ratio or hazard ratio plots using `or.plot()` or `hr.plot()` (`hr.plot()` is not fully tested).
```{r, fig.width=12, message=FALSE}
# OR plot
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  or.plot(dependent, explanatory)
# Previously fitted models (`glmmulti`) can be provided directly to `glmfit`
# HR plot (not fully tested)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "Surv(time, status)"
colon_s %>%
  hr.plot(dependent, explanatory, dependent_label = "Survival")
# Previously fitted models (`coxphmulti`) can be provided directly using `coxfit`
```
Our own `Rstan` models are supported and will be documented in the future. Broadly, if you are running (hierarchical) logistic regression models in [Stan](http://mc-stan.org/users/interfaces/rstan) with the coefficients specified as a vector labelled `beta`, then `fit2df()` will work directly on the `stanfit` object, much as it does on a `glm` or `glmerMod` object.
# Notes
Use `Hmisc::label()` to assign labels to variables for tables and plots.
```{r}
# Namespace the call so the chunk works without attaching Hmisc
Hmisc::label(colon_s$age.factor) = "Age (years)"
```
Export data frame tables directly or to [R Markdown](http://rmarkdown.rstudio.com) using [`knitr::kable()`](https://yihui.name/knitr/).
The wrapper `summary.missing()`, which wraps `mice::md.pattern()`, can also be useful.
```{r}
colon_s %>%
  summary.missing(dependent, explanatory)
```
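If `mice` is not available, a rough per-variable count of missing values can be produced in base R. This sketch is only an approximation of what a missing-data summary reports. It assumes `survival::colon` in place of `colon_s`, and the variable list is chosen here for illustration.

```r
# Assumption: survival::colon death records stand in for colon_s
d <- subset(survival::colon, etype == 2)
vars <- c("age", "sex", "obstruct", "nodes", "differ")  # illustrative selection

# Number and percentage of missing values per variable
n_missing <- colSums(is.na(d[, vars]))
missing_summary <- data.frame(
  variable    = names(n_missing),
  n_missing   = as.integer(n_missing),
  pct_missing = round(100 * n_missing / nrow(d), 1),
  row.names   = NULL
)
missing_summary

# Frequency of each missingness pattern across rows (1 = observed, 0 = missing),
# with the digits in the same order as `vars`
pattern <- apply(!is.na(d[, vars]), 1, function(r) paste(as.integer(r), collapse = ""))
table(pattern)
```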