I’ve been working on a project recently where we’ve been building a data
analysis pipeline that involves bits of R code and bits of Python. Since
the whole thing runs on Docker, on a secured server with no internet
access, it’s been illuminating seeing the different ways that Python and
R deal with packages/libraries and their dependencies. For manual
documentation, I found pip show
really handy, and so I wrote a little
R package that does the same thing (you can see it here:
www.github.com/RobertMyles/showpackage).
As luck would have it, I noticed that RStudio is just about to release
renv onto CRAN. While I’ve used some
of the package managers available for R, I never really got into the
habit of using them – I suppose I found them underwhelming or too much
hassle. On the other hand, renv reminds me of Node’s package management
in JavaScript projects – to the user, just a lockfile detailing what
packages are used and what versions, and very easy to maintain. renv is
designed this way and I can really see myself using it regularly.
Basically, you can use the R-projects style that is already easy to do
from RStudio. Either create a new project or associate a new project
with an existing folder. Imagine you have a project that will read data
from a database, calculate some summaries and produce plots, all for an
automated report. To do something like this, you’ll probably use some
tidyverse packages like dplyr and ggplot2, some document-making packages
like knitr and rmarkdown, and maybe something like implyr and DBI/odbc
for getting data. Maybe this will be a package for private use, and you
may have devtools, tinytest/testthat and covr installed.
Well, first you’ll obviously need to install renv. This is easy to do
with remotes::install_github("rstudio/renv")
(since it’s not on CRAN
yet). Once you’ve made an R-project from RStudio (File > New Project >
etc.), you can simply type renv::init()
. This will set up the
infrastructure that renv will use to keep track of the packages you use,
and create a private library for these. Depending on what you’re already
using in this project (maybe nothing apart from base R), renv will
install the packages it sees in your .R files into this private library.
Running renv::init()
will show you something like this in your R
console:
> renv::init()
* Discovering package dependencies ... Done!
* Copying packages into the cache ... * Querying repositories for available source packages ... Done!
The following package(s) will be added to the lockfile:
_
askpass [1.1]
assertthat [0.2.1]
backports [1.1.4]
BH [1.69.0-1]
brew [1.0-6]
callr [3.3.1]
checkmate [1.9.4]
cli [1.1.0]
...
* Lockfile written to '~/renv_proj/renv.lock'.
* Project '~/renv_proj' loaded. [renv 0.6.0-113]
Restarting R session...
* Project '~/renv_proj' loaded. [renv 0.6.0-113]
The packages that have been installed are recorded in a lockfile, which is a JSON file that looks like this (depending on what you’ve installed):
{
"renv": {
"Version": "0.6.0-113"
},
"R": {
"Version": "3.6.0",
"Repositories": [
{
"Name": "CRAN",
"URL": "https://cran.rstudio.com"
}
]
},
"Packages": {
"BH": {
"Package": "BH",
"Version": "1.69.0-1",
"Source": "CRAN",
"Hash": "0fde015f5153e51df44981da0767f522"
},
"R6": {
"Package": "R6",
"Version": "2.4.0",
"Source": "CRAN",
"Hash": "92b50d943a7c76c67918c1e1beb68627"
}
...
Any time you update a package or install a new one, you can call
renv::snapshot()
to record the current state of packages to the
lockfile (this can be done automatically with
options(renv.config.auto.snapshot = TRUE)
). If you need a previous
version of the lockfile, use renv::restore()
, which by default will
choose the previous snapshot.
Here’s an example of the console printout from installing knitr with renv:
> install.packages("knitr")
* Querying repositories for available source packages ... Done!
* Querying repositories for available binary packages ... Done!
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/knitr_1.24.tgz' ...
OK [downloaded 1.3 Mb in 2.3 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/highr_0.8.tgz' ...
OK [downloaded 40.2 Kb in 1.3 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/markdown_1.1.tgz' ...
OK [downloaded 195.2 Kb in 1.6 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/xfun_0.8.tgz' ...
OK [downloaded 165 Kb in 2 secs]
Installing highr [0.8] from CRAN ...
OK (installed binary)
Installing xfun [0.8] from CRAN ...
OK (installed binary)
Installing markdown [1.1] from CRAN ...
OK (installed binary)
Installing knitr [1.24] from CRAN ...
OK (installed binary)
* Lockfile written to '~/Documents/renv_proj/renv.lock'.
It’s pretty easy to see package dependencies with renv (not as easy as showpackage 😜). We can see what dependencies come from our .R files, for example, which is a nice feature:
library(dplyr)
library(stringr)
renv::dependencies(".") %>% filter(str_detect(Source, "\\.R")) %>% pull(Package)
# Finding R package dependencies ... Done!
# [1] "data.table" "dplyr" "utils" "datavalidator" "dplyr" "lubridate" "readr"
# [8] "strex" "datavalidator" "testthat"
All in all, renv is a nice, well-designed package that is easy to use. I’ve already used starting using it, and once it gets to CRAN, I’ll be using it ‘in production’. Nice job, RStudio 👍