forked from UBC-STAT/stat-545-guidebook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcm105.Rmd
391 lines (254 loc) · 14.6 KB
/
cm105.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
# (5) R Packages, Part I
--- LAST YEAR'S CONTENT BELOW ---
## Learning Objectives
This tutorial aims to get you started with package development in R. By the end of this tutorial, you'll have the beginnings of an R package called `powers` ([complete version](https://github.com/vincenzocoia/powers)). You'll learn about key components of an R package, and how to modify them.
We'll be going over the following topics:
* set up the directory structure for a package and put it under version control with `File` -> `New Project`
* define functions in R scripts located in the `R` directory of the package
* use `load_all` and `Build & Reload` to simulate loading the package
* use `Check` to check the package for coherence
* use `Build & Reload` to properly build and install the package
* edit the `DESCRIPTION` file of package metadata
* specify a LICENSE
* document and export the functions via `roxygen2` comments
* document the package itself via `use_package_doc()`
* create documentation and manage the `NAMESPACE` file via `document()`
* use `testthat` to implement unit testing
* use a function from another package via `use_package()` and syntax like `otherpkg::foofunction()`
* connect your local Git repo to a new remote on GitHub via `use_github()`
* create a `README.md` that comes from rendering `README.Rmd` containing actual usage, via `use_readme_rmd()`
* create a vignette via `use_vignette()` and build it via `build_vignettes()`
## Participation
We'll be developing the `powers` R package in class. Please follow along with this, developing in your participation repo.
At least, _some_ of the development. Sometimes it might be better to just sit back and watch. I'll try to inform you when to do what.
## Resources
This tutorial is adapted from [Jenny Bryan's STAT 547 tutorial](http://stat545.com/packages06_foofactors-package.html), where she develops the [`foofactors`](https://github.com/jennybc/foofactors) package.
Other resources you might find useful:
- Hadley's ["R Packages"](http://r-pkgs.had.co.nz/) book.
- Concise. Works with `devtools` and friends.
- [Package development cheatsheet](https://rawgit.com/rstudio/cheatsheets/master/package-development.pdf)
- ["Writing R Extensions"](http://cran.r-project.org/doc/manuals/r-release/R-exts.html), the official guide to writing R packages.
- Comprehensive. Doesn't refer to `devtools` and friends.
Others on specific topics:
- [Karl Broman on choosing a license](http://kbroman.org/pkg_primer/pages/licenses.html)
During exercise periods, in case you're ahead of the class and have time, you should work on [Homework 7](http://stat545.com/Classroom/assignments/hw07/hw07.html).
## Motivation
Why make a package in R? Here are just a few big reasons:
- Built-in checks that your functions are working and are sensible.
- Easy way to store and load your data -- [data packages](http://www.davekleinschmidt.com/r-packages/) like `gapminder` are awesome!
- Allows for documentation of functions that you've written.
- Companion for a journal article you're writing.
Think _aid for a type of analysis_, not an analysis itself.
And an R package _does not_ need to be big!
## Getting Started
Install/update the `devtools` package, used as an aid in package development:
```
install.packages("devtools")
```
This will do for now -- for development beyond the basics, you might need to [further configure your computer](http://stat545.com/packages01_system-prep.html).
## Let's start with a single function
### Function creation
Follow along as we make an R package called `powers` that contains a function `square` that squares its input. Let's initiate it:
- RStudio —> New project —> R Package
- Initiate git (optional, but recommended).
- Under the "Build" menu, click "Install and Restart"
- Check out the files that have been created
- Rd
- NAMESPACE
- DESCRIPTION
Now, start a new R script in the `R` directory, called `square.R`. Write a function called `square` that squares its input.
Build the package:
* `Build and Reload`, or in newer versions of RStudio, `Install and Restart`.
* This compiles the package, and loads it.
* Try leaving the project, do `library(powers)`, and use the function! Pretty cool, eh?
### Documentation
The `roxygen2` package makes documentation easy(er). Comment package functions with `#'` above the function, and use tags starting with `@`. Let's document the `square` function.
Key tags:
- `@param` -- what's the input?
- `@return` -- what's the output?
- `@export` -- make the function available upon loading the package.
Type `document()` into the console (a function from the `devtools` package). Then `Install and Restart` the package.
Your function is now documented. Check it out with `?square`! This happens due to the creation of an `Rd` file in the `man` folder.
### Taking control of your NAMESPACE
Let's start being intentional as to what appears in our NAMESPACE.
1. Delete your NAMESPACE file.
2. Add the `@export` tag to your `square` function to write it to the NAMESPACE.
Things that do not get `@export`ed can still be referred to "internally" by functions in your NAMESPACE, as we'll see soon.
### Checking
It's a good idea to `check` your package early and often to see that everything is working.
Click `Check` under the `Build` menu. It checks lots of things for you! We'll see more examples of this.
### Function Dependencies
Make another, more general function to compute any power:
```{r}
pow <- function(x, a=2) x^a
```
It can go in the same R script as `square`, or a different one -- your choice.
We'll make `square` depend on `pow`.
Aftering `Install and Restart`ing, you'll notice that you can't use `pow` because it's not `export`ed. But, `square` still works! We call `pow` an _internal_ function.
__Note__: you should still document your internal function! But mention that the function is internal. Users will be able to access the documentation like normal, but still won't be able to (easily) use the function.
If you want to be able to use internal functions as a developer, but don't want users to have (easy) access to the functions, then run `load_all` instead of `Install and Restart`.
### Your Turn
Make and document another function, say `cube`, that raises a vector to the power of 3. Be sure to `@export` it to the NAMESPACE. Use our internal `pow` function to make `cube`, if you have it.
Finished early? Do more -- work on Assignment 7, and/or try out more documentation features that comes with `roxygen2` (the `@` tags).
## Documentation and Testing
### More Roxygen2 Documentation
- `\code{}` for code font
- `\link{}` to link to other function docs
- Combine: `\code{\link{function_name}}`
Enumeration:
```
#' \enumerate{
#' \item first item
#' \item second item
#' }
```
Itemization:
```
#' \itemize{
#' \item first item
#' \item second item
#' }
```
Manually labelled list:
```
#' \describe{
#' \item{bullet label 1}{first item}
#' \item{bullet label 2}{second item}
#' }
```
### DESCRIPTION file
Every R package has this. It contains the package's metadata. Let's edit it:
- Add a title and brief description.
- R is picky about these! Check out [the rules](http://r-pkgs.had.co.nz/description.html#pkg-description).
- Add your name.
- Use the [`Authors@R` field](http://r-pkgs.had.co.nz/description.html#author) instead of the default `Author` and `Maintainer` fields.
- Pick a license: next!
### Pick a license
[Karl Broman's post](http://kbroman.org/pkg_primer/pages/licenses.html) is brief and informative.
Let's add an MIT licence.
### Testing with `testthat`
We've already seen package `Check`s -- this checks that the pieces of your R package are in place, and that even your examples don't throw errors. We should not only check that our functions are _working_, but that they give us results that we'd _expect_.
The `testthat` package is useful for this. Initialize it in your R package by running `use_testthat()`.
As a template, save and edit the following script in a file called `test_square` in the `tests/testthat` folder, filling in the blanks with an `expect` statement:
```
context("Squaring non-numerics")
test_that("At least numeric values work.", {
num_vec <- c(0, -4.6, 3.4)
expect_identical(square(numeric(0)), numeric(0))
FILL_THIS_IN
})
test_that("Logicals automatically convert to numeric.", {
logic_vec <- c(TRUE, TRUE, FALSE)
FILL_THIS_IN
})
```
Then, you can execute those tests by running `devtools::test()`, or clicking `Build` -> `Test package`.
These sanity checks are very important as your R package becomes more complex!
## Higher-level User Documentation
### Package Documentation
Just like we do for functions, we can make a manual (`.Rd`) page for our entire R package, too. For example, check out the documentation for `ggplot2`:
```
?ggplot2 # Can execute only if `ggplot2` is loaded.
package?ggplot2 # Always works.
```
To do so, just execute `use_package_doc()`. You'll see a new R script come up with `roxygen2`-style documentation to `NULL`. Document as you'd do functions, and run `document()` to generate the `.Rd` file.
Here's sample documentation:
```
#' Convenient Computation of Powers
#'
#' Are you tired of using the power operator, \code{^} or \code{**} in R?
#' Use this package to call functions that apply common powers
#' to your vectors.
#'
#' @name powers
#' @author Me
#' @note This package isn't actually meant to be serious. It's just for
#' teaching purposes.
#' @docType package
```
### Vignettes
It's a good idea to write a vignette (or several) for your R package to show how the package is useful as a whole. Documentation for individual functions don't suffice for this purpose!
To write a vignette called `"my_vignette"`, just run
```
use_vignette("my_vignette")
```
Some things happen automatically, but it's up to you to modify the `.Rmd` document to provide adequate instruction. Change the template to suit your package. The only real "catch" to doing this is _making sure the title is replaced in both instances_.
Then just `Knit`, and then run `build_vignettes()` to build the vignettes.
__Vignette woes__: There seems to be resistance against building vignettes when installing. Try running `install(build_vignettes=TRUE)` to get it working.
### README
Just as most projects should have a `README` file in the main directory, so should an R package.
__Purposes__:
- Inform someone stumbling across your project what they've stumbled across.
- At a high level (like "This is an R package"), but also
- somewhat at a lower level too, like your description file. This becomes a little redundant.
- I like to use the README to inform _developers_ the main workflow and spirit behind _developing_ the package.
- There are some things that you'd want other potential developers to know about the package as a whole, yet are irrelevant to users!
__How to do it__:
You could just make and edit a `README.md` file like normal. But you'll probably want to briefly demonstrate some code, so you'll need an `.Rmd`. Let `devtools` set that up for you:
```
use_readme_rmd()
```
`knit` and you're done!
### Exercises
Create the above three types of documentation, without looking at [my version](https://github.com/vincenzocoia/powers). Then compare.
Ideally, you'll have more to document because you've been working on expanding this (or another) R package for Homework 07 already.
## Adding data to your R package
You can store _and document_ datasets within R packages. Here's one useful way.
__Note__: This currently doesn't seem to be present in the companion tutorial from Jenny. Check out [the R Packages "data chapter"](http://r-pkgs.had.co.nz/data.html) for a resource.
__Example__:
Let's add `tenvec` and `tendf` to the package:
```
tenvec <- 1:10
tendf <- data.frame(vec=1:10)
```
In the console:
1. Store your data as R objects, as we've done above with `tenvec` and `tendf`.
2. Execute `use_data(tenvec, tendf)` (one argument per object).
`tenvec` and `tendf` will be saved as `.Rdata` files in the new `/data` directory. These are available upon loading the package.
To document the data, for each object (i.e., for each of `tenvec` and `tendf`), put `roxygen2`-style documentation above _the character_ `"tenvec"` and `"tendf"` in an R script in the `/R` folder.
Example for `tenvec`:
```
#' Integer vector from 1 to 10
#'
#' Self-explanatory!
#'
#' @format What format does you data take? Integer vector.
#' @source Where did the data come from?
"tenvec"
```
The `@format` and `@source` tags are unique to data documentation. Note that you shouldn't use the `@export` tag when documenting data!
## Dependencies
We can use functions from other R packages within our homemade R package, too. We need to do two things:
- Use the syntax `package_name::function_name()` whenever you want to use `function_name` from `package_name`.
- Indicate that your R package depends on `package_name` in the DESCRIPTION file by executing the command `use_package("package_name")`.
There are other methods, but this is the easiest.
__Example__: Add `ggplot2` dependency to plot the resulting computations. Do so by adding a plot to `pow` -- change `pow`'s guts to the following:
```
res <- x^p
if (showplot) {
p <- ggplot2::qplot(x, res)
print(p)
}
res
```
__Note 1__: Here's an example of the benefits of not having your functions do too much -- I only needed to change `pow` alone to get the changes to work for `square` and `cube`.
__Note 2__: It's probably better to use Base R's plotting here, so that your package is as stand-alone as possible. We use `ggplot2` for expository purposes.
## Launching your Package to GitHub
If I want to put an R package on GitHub, I typically just:
1. Click "New" in GitHub to make a new repo. Don't initialize with README.
2. Follow the instructions github provides, which involves two lines to execute in the terminal.
- Those two lines can be found [here](http://happygitwithr.com/existing-github-last.html#connect-to-github) in Jenny's Happy git book.
There is also the [`use_github()` way](http://stat545.com/packages06_foofactors-package.html#use_github()) -- although, to me, it seems overly complicated (perhaps there's an advantage I don't know about). It's just a matter of following the instructions, which are not worth demonstrating here.
## Time remaining?
If there's time remaining, we'll check out [S3 OO programming](http://adv-r.had.co.nz/OO-essentials.html#s3) in R.
1. Add a "class" to the output of `pow`.
2. Add some methods:
```
print.pow <- function(x) {
cat(paste("Object of class 'pow',", head(x)))
invisible()
}
#' @export
bind.pow <- function(x) paste(x, collapse=".")
bind <- function(x) UseMethod("bind")
```