# Version-stable R development with Docker
```{r setup-version-stable-r-development, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, eval = FALSE)
```
For production solutions, it is essential to have full control
over the codebase and environment to ensure reproducibility and stability of
the setup. For R-based projects, this implies fixing and aligning the
version of R as well as package and system dependencies. A key aspect of
well-managed release pipelines is to guarantee full equivalence of
the development setup with the (possibly multiple) target production stages.
This guide illustrates an approach to manage a version-stable R development
environment based on containerized solutions leveraging the
[Rocker project](https://www.rocker-project.org/), allowing the coexistence
of multiple dockerized development flavors, to match various target production
environments or projects.
> _The instructions in this chapter are for R >= 4.0.0.
Images for R <= 3.6.3 are defined in
[rocker-org/rocker-versioned](https://github.com/rocker-org/rocker-versioned),
but are no longer actively maintained._
## Version-stable deployments
When deploying R applications (e.g. a Shiny app) using Docker containers, it is
important to control versioning of R and packages for the sake of reproducibility
and stability of the deployments. For this reason,
[version-stable](https://github.com/rocker-org/rocker-versioned2) images are
provided as part of the [Rocker project](https://www.rocker-project.org/) and
used as a basis for deploying production applications.
Each version-stable Rocker image is published under a version-specific _tag_
(e.g. `rocker/r-ver:4.4.1`). For all non-latest R versions, the tag fixes not
only the version of R but also the versions of contributed packages, by using
as package repository the CRAN snapshot of the last day on which that R version
was the latest release. See
[wiki/Versions](https://github.com/rocker-org/rocker-versioned2/wiki/Versions).
If that R version is the latest, the CRAN date is not set and the latest packages are always installed.
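To check which package repository a given tag is pinned to, one option is to print the `repos` option from within the image. The following is a minimal sketch, assuming Docker is available and that the image configures the repository through R's `repos` option; `show_cran_repo` is a hypothetical helper name, not part of Rocker.

```bash
# Hypothetical helper (not part of Rocker): print the package repositories
# configured in a version-stable image. The image is pulled if it is not
# available locally, and the temporary container is removed afterwards.
show_cran_repo() {
  local R_VER="${1:?you must supply the R version as first argument}"
  docker run --rm "rocker/r-ver:$R_VER" \
    Rscript -e 'print(getOption("repos"))'
}
```

For example, `show_cran_repo 4.4.1` prints the repositories in use for that tag, which can be compared with the snapshot date listed in the wiki.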
The `Dockerfile` of a deployed application then defines a given version-stable
image tag to start `FROM`, e.g.
```dockerfile
FROM rocker/r-ver:4.4.1
```
See
[SmaRP/Dockerfile](https://github.com/miraisolutions/SmaRP/blob/feature/upgrade-latest-r/Dockerfile)
for an example.
<!-- NOTE: Not yet merged into master, to be replaced with https://github.com/miraisolutions/SmaRP/blob/master/Dockerfile -->
## Align local development and deployment environments
When developing and testing an app locally, it is important to ensure the
environment is aligned with the target deployment environment. This might imply
using e.g. multiple R and package versions for the local development of
different applications. This is not possible with the typical setup (especially on
Linux systems), where only one R version (usually the latest release) is installed.
The idea is then to rely on the same version-stable Rocker containers used for
the deployments, using a containerized versioned RStudio instance for the local
development. This is available through Rocker's [versioned
stack](https://www.rocker-project.org/images/#the-versioned-stack), so we could
use e.g. `rocker/rstudio:4.4.1`.
Note that the same version-stable instance of RStudio can be used across all
projects for which that version is relevant. For this reason, a
sensible choice is to rely on `rocker/verse` images, which add tidyverse and
devtools to the stack. They also include the R Markdown system
dependencies TinyTeX and pandoc, sparing the tedious extra
installation.
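To keep the development image aligned with a deployment, the R version can be read from the `FROM` line of the application's `Dockerfile` instead of being typed by hand. This is a minimal sketch, assuming the `Dockerfile` starts from a `rocker/<image>:<version>` tag as in the example above; `rocker_r_version` is a hypothetical helper name.

```bash
# Hypothetical helper: extract the R version from the first FROM line of a
# Dockerfile that is based on a versioned Rocker image,
# e.g. "FROM rocker/r-ver:4.4.1" gives "4.4.1".
rocker_r_version() {
  local DOCKERFILE="${1:-Dockerfile}"
  sed -n 's|^FROM[[:space:]][[:space:]]*rocker/[^:]*:\([0-9][0-9.]*\).*$|\1|p' \
    "$DOCKERFILE" | head -n 1
}
```

The result can then be used to select the matching development image, e.g. `docker pull "rocker/verse:$(rocker_r_version Dockerfile)"`.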
### Running versioned RStudio instances
Assume we want to run a containerized versioned instance of RStudio for R 4.4.1,
possibly alongside instances for other versions of R.
First, we pull the image from Docker Hub:
```{bash pull}
docker pull rocker/verse:4.4.1
```
We then want to have a running instance on `localhost` (`127.0.0.1`), with the
following setup:
- No authentication required (local setup).
- Enable root by setting the environment variable `ROOT` to `TRUE`, so that e.g.
`sudo apt-get` can be used in RStudio.
- Use a version-specific port, e.g. `4000` for R 4.0.0, `4410` for R 4.4.1 and
so on, so that we can use `localhost` for concurrent R version instances.
We bind the port to localhost (`127.0.0.1:4410`), so it is only accessible locally
(see [the Rocker reference](https://rocker-project.org/images/versioned/rstudio.html#disable_auth)).
- The development code of all relevant projects should live outside the
container and be shared with it (and possibly multiple other containers), e.g. under
`~/workspace` on the host machine and `/home/rstudio/workspace` in
the container.
- For this to work without [permission
issues](https://rocker-project.org/images/versioned/rstudio.html#userid-and-groupid),
the container user (`rstudio`) must match the UID of the host user (`$UID`).
This has the effect of setting the ownership of `~/workspace` on the host machine to `$UID` if it is not already owned by that user.
- In order for the RStudio settings to persist if the container is recreated
(e.g. after pulling a new `rocker` image), we also use a shared volume (like
`~/.rstudio-config/4.4.1`) for the `/home/rstudio/.config/rstudio` directory, which is
version-specific in case of multiple R versions.
- If we want to use Meld via the [compareWith](https://github.com/miraisolutions/compareWith/) addins, we need to
    - map the `DISPLAY` environment variable and volume `/tmp/.X11-unix`,
    - add `DISPLAY` to `Renviron`,
    - install Meld,
    - install `dbus-x11`.
- Use a version-specific name for the container running the RStudio instance,
e.g. `rstudio_4.4.1`.
```{bash run}
R_VER=4.4.1
SHARED_DIR=workspace
mkdir -p $HOME/.rstudio-config/$R_VER
# Note: bash sets $UID but not $GID, so the group ID is taken from `id -g`
docker run -d --restart=always \
  -p 127.0.0.1:$(echo $R_VER | sed 's/[.]//g')0:8787 \
  -e DISABLE_AUTH=true \
  -e ROOT=true \
  -e USERID=$UID \
  -e GROUPID=$(id -g) \
  -v $HOME/$SHARED_DIR:/home/rstudio/$SHARED_DIR \
  -v $HOME/.rstudio-config/$R_VER:/home/rstudio/.config/rstudio \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
  --name rstudio_$R_VER \
  rocker/verse:$R_VER
# R and RStudio are not getting the DISPLAY environment variable
docker exec rstudio_$R_VER bash -c \
  'echo "DISPLAY=${DISPLAY}" >> /usr/local/lib/R/etc/Renviron'
# Install Meld
docker exec rstudio_$R_VER bash -c \
  'apt-get update && apt-get install -y --no-install-recommends meld dbus-x11'
```
If you are using `R_VER=4.4.1`, the running RStudio can then be accessed by visiting `http://localhost:4410/`.
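The port convention can be checked in isolation before starting a container. A small sketch, where `rstudio_port` is a hypothetical helper that mirrors the `sed` expression used in the `-p` option above:

```bash
# Hypothetical helper: derive the host port from an R version by dropping the
# dots and appending a 0, as done for the -p option above.
rstudio_port() {
  local R_VER="${1:?you must supply the R version as first argument}"
  echo "$(echo "$R_VER" | sed 's/[.]//g')0"
}

rstudio_port 4.0.0  # prints 4000
rstudio_port 4.4.1  # prints 4410
```

The resulting ports are above 1023, so they fall outside the privileged port range.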
You may find it convenient to define a shell function for these steps:
```{bash run_rstudio_ver-def}
run_rstudio_ver() {
  local R_VER=${1:?"you must supply the R version as first argument"}
  local SHARED_DIR=${2:?"you must supply the shared directory as second argument"}
  local RVER_IMAGE=${3:-"verse"}
  local BASE_IMAGE=rocker/$RVER_IMAGE:$R_VER
  local PORT=$(echo $R_VER | sed 's/[.]//g')0
  local CONTAINER_NAME=rstudio_$R_VER
  echo "Containerized version-stable RStudio for R $R_VER" \
    "based on image $BASE_IMAGE" \
    "with shared volume $SHARED_DIR"
  # Note: bash sets $UID but not $GID, so the group ID is taken from `id -g`
  docker pull $BASE_IMAGE &&
    mkdir -p $HOME/.rstudio-config/$R_VER &&
    docker run -d --restart=always \
      -p 127.0.0.1:$PORT:8787 \
      -e DISABLE_AUTH=true \
      -e ROOT=true \
      -e USERID=$UID \
      -e GROUPID=$(id -g) \
      -v $HOME/$SHARED_DIR:/home/rstudio/$SHARED_DIR \
      -v $HOME/.rstudio-config/$R_VER:/home/rstudio/.config/rstudio \
      -e DISPLAY=$DISPLAY \
      -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
      --name $CONTAINER_NAME \
      $BASE_IMAGE &&
    # R and RStudio are not getting the DISPLAY environment variable
    docker exec $CONTAINER_NAME bash -c \
      'echo "DISPLAY=${DISPLAY}" >> /usr/local/lib/R/etc/Renviron' &&
    # Install Meld
    docker exec $CONTAINER_NAME bash -c \
      'apt-get update && apt-get install -y --no-install-recommends meld dbus-x11' &&
    echo "RStudio running in container $CONTAINER_NAME on port $PORT" &&
    echo "visit http://localhost:$PORT"
}
```
which you can reuse as a compact command for any R version:
```{bash run_rstudio_ver-use}
run_rstudio_ver 4.4.1 workspace
```
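With several versions running side by side, it helps to see which containers exist and which ports they publish. A minimal sketch, assuming the `rstudio_<version>` naming convention used above; `list_rstudio_containers` is a hypothetical helper name.

```bash
# Hypothetical helper: list all containers (running or stopped) whose name
# contains "rstudio_", together with their status and published ports.
list_rstudio_containers() {
  docker ps --all \
    --filter "name=rstudio_" \
    --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
}
```

Calling `list_rstudio_containers` shows one row per versioned instance, which makes it easy to spot the port to visit for each R version.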
Note that `--restart=always` specifies that the container should stay up and restart
itself after stopping, e.g. upon machine reboot or Docker upgrade, so that it is
always available. Still, you can explicitly stop the running container with
```{bash stop}
docker stop rstudio_4.4.1
```
Alternatively, you can omit `--restart=always` and explicitly start the
container whenever needed with
```{bash start}
docker start rstudio_4.4.1
```
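For a container that already exists, the restart policy can also be inspected and changed in place, without recreating the container. A sketch under the same naming convention; the two function names are hypothetical.

```bash
# Hypothetical helpers built on the rstudio_<version> naming convention.

# Print the restart policy of the container for a given R version.
rstudio_restart_policy() {
  local R_VER="${1:?you must supply the R version as first argument}"
  docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "rstudio_$R_VER"
}

# Change the restart policy (e.g. "no", "always", "unless-stopped") in place.
set_rstudio_restart_policy() {
  local R_VER="${1:?you must supply the R version as first argument}"
  local POLICY="${2:?you must supply the restart policy as second argument}"
  docker update --restart="$POLICY" "rstudio_$R_VER"
}
```

For example, `set_rstudio_restart_policy 4.4.1 unless-stopped` keeps the automatic restart but, unlike `always`, leaves an explicitly stopped container stopped when the Docker daemon restarts.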
Note that `start`/`stop` operations do not affect the persistence of any files
created in RStudio while the container is running.
However, if the container is _removed_ (`docker rm`, see below), files created
outside of mounted volumes do **not** persist.
This is why we use a mounted volume for the `~/.config/rstudio` directory.
### Using `podman` instead of `docker`
`podman` can be used instead of `docker` to run the above commands. In that case, the active user in the container will be _root_,
which is then mapped to the user that invoked the podman command when writing files to the shared volume.
Because of this, RStudio does not set the desired home directory as the initial working directory. To correct this, add
```
echo '{ "initial_working_directory": "/home/rstudio" }' > $HOME/.rstudio-config/$R_VER/rstudio-prefs.json &&
```
after `mkdir -p $HOME/.rstudio-config/$R_VER` in the `run_rstudio_ver` function.
If you run into issues, the discussion in this rocker [pull request](https://github.com/rocker-org/rocker-versioned2/pull/636)
about rootless container support may help.
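Note that the `>` redirection in the `echo` command above replaces any existing `rstudio-prefs.json`, including settings persisted from earlier sessions. A possible variant, sketched here with the hypothetical helper `init_rstudio_prefs`, writes the file only when it does not exist yet:

```bash
# Hypothetical helper: create rstudio-prefs.json with the initial working
# directory only if the file is missing, so that persisted settings are kept.
init_rstudio_prefs() {
  local R_VER="${1:?you must supply the R version as first argument}"
  local PREFS_DIR="$HOME/.rstudio-config/$R_VER"
  local PREFS_FILE="$PREFS_DIR/rstudio-prefs.json"
  mkdir -p "$PREFS_DIR" || return
  [ -e "$PREFS_FILE" ] ||
    echo '{ "initial_working_directory": "/home/rstudio" }' > "$PREFS_FILE"
}
```

Calling `init_rstudio_prefs "$R_VER"` could then take the place of both the `mkdir -p` and the `echo` lines; this is a design choice for illustration, not something prescribed by Rocker or podman.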
### Best-supported R versions
This tutorial uses images based on the [rocker-versioned2](https://github.com/rocker-org/rocker-versioned2) repository.
We recommend using the latest patch version for each minor version (e.g. 4.0.5 for 4.0.x),
as these seem to be the most regularly updated images
(see e.g. [rocker/verse on Docker Hub](https://hub.docker.com/r/rocker/verse/tags?page=&page_size=&ordering=&name=4.0.)).
### Cleanup
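To stop and remove the container for a given R version (files in the mounted volumes on the host are kept):
```{bash cleanup}
docker rm $(docker stop rstudio_4.4.1)
```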
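The same cleanup can be wrapped in a function that takes the R version as argument. A minimal sketch, with `remove_rstudio_ver` as a hypothetical counterpart to `run_rstudio_ver`:

```bash
# Hypothetical helper: stop and remove the container for a given R version.
# The shared workspace and the settings under ~/.rstudio-config live on the
# host and are not touched.
remove_rstudio_ver() {
  local R_VER="${1:?you must supply the R version as first argument}"
  docker stop "rstudio_$R_VER" && docker rm "rstudio_$R_VER"
}
```

After `remove_rstudio_ver 4.4.1`, the image itself is still available locally; it can be deleted separately with `docker rmi rocker/verse:4.4.1` if it is no longer needed.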
## References
- [The Rocker Project](https://www.rocker-project.org/)
- [Shared Volumes](https://www.rocker-project.org/use/shared_volumes/)
- [Rocker Wiki](https://github.com/rocker-org/rocker/wiki)
- [Sharing files with host machine](https://github.com/rocker-org/rocker/wiki/Sharing-files-with-host-machine)
- [Rocker reference for verse and other images](https://rocker-project.org/images/versioned/rstudio.html) (in particular [how to use](https://rocker-project.org/images/versioned/rstudio.html#how-to-use))