Taking RStudio's renv for a spin

2019-08-17

I’ve been working on a project recently where we’ve been building a data analysis pipeline that involves bits of R code and bits of Python. Since the whole thing runs on Docker, on a secured server with no internet access, it’s been illuminating seeing the different ways that Python and R deal with packages/libraries and their dependencies. For manual documentation, I found pip show really handy, and so I wrote a little R package that does the same thing (you can see it here: www.github.com/RobertMyles/showpackage).
As luck would have it, I noticed that RStudio is just about to release renv onto CRAN. While I’ve used some of the package managers available for R, I never really got into the habit of using them – I suppose I found them underwhelming or too much hassle. On the other hand, renv reminds me of Node’s package management in JavaScript projects – to the user, just a lockfile detailing what packages are used and what versions, and very easy to maintain. renv is designed this way and I can really see myself using it regularly.

So how does renv work?

Basically, you can use the R-projects style that is already easy to do from RStudio. Either create a new project or associate a new project with an existing folder. Imagine you have a project that will read data from a database, calculate some summaries and produce plots, all for an automated report. To do something like this, you’ll probably use some tidyverse packages like dplyr and ggplot2, some document-making packages like knitr and rmarkdown, and maybe something like implyr and DBI/odbc for getting data. Maybe this will be a package for private use, and you may have devtools, tinytest/testthat and covr installed.
Well, first you’ll obviously need to install renv. This is easy to do with remotes::install_github("rstudio/renv") (since it’s not on CRAN yet). Once you’ve made an R-project from RStudio (File > New Project > etc.), you can simply type renv::init(). This will set up the infrastructure that renv will use to keep track of the packages you use, and create a private library for these. Depending on what you’re already using in this project (maybe nothing apart from base R), renv will install the packages it sees in your .R files into this private library.

Running renv::init() will show you something like this in your R console:

> renv::init()
* Discovering package dependencies ... Done!
* Copying packages into the cache ... * Querying repositories for available source packages ... Done!

The following package(s) will be added to the lockfile:
                _
  askpass         [1.1]
  assertthat      [0.2.1]
  backports       [1.1.4]
  BH              [1.69.0-1]
  brew            [1.0-6]
  callr           [3.3.1]
  checkmate       [1.9.4]
  cli             [1.1.0]

  ...

* Lockfile written to '~/renv_proj/renv.lock'.
* Project '~/renv_proj' loaded. [renv 0.6.0-113]

Restarting R session...

* Project '~/renv_proj' loaded. [renv 0.6.0-113]

The packages that have been installed are recorded in a lockfile, which is a JSON file that looks like this (depending on what you’ve installed):

{
  "renv": {
    "Version": "0.6.0-113"
  },
  "R": {
    "Version": "3.6.0",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "BH": {
      "Package": "BH",
      "Version": "1.69.0-1",
      "Source": "CRAN",
      "Hash": "0fde015f5153e51df44981da0767f522"
    },
    "R6": {
      "Package": "R6",
      "Version": "2.4.0",
      "Source": "CRAN",
      "Hash": "92b50d943a7c76c67918c1e1beb68627"
    }
    ...

Any time you update a package or install a new one, you can call renv::snapshot() to record the current state of packages to the lockfile (this can be done automatically with options(renv.config.auto.snapshot = TRUE)). If you need a previous version of the lockfile, use renv::restore(), which by default will choose the previous snapshot.

Here’s an example of the console printout from installing knitr with renv:

> install.packages("knitr")
* Querying repositories for available source packages ... Done!
* Querying repositories for available binary packages ... Done!
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/knitr_1.24.tgz' ...
    OK [downloaded 1.3 Mb in 2.3 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/highr_0.8.tgz' ...
    OK [downloaded 40.2 Kb in 1.3 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/markdown_1.1.tgz' ...
    OK [downloaded 195.2 Kb in 1.6 secs]
Retrieving 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/xfun_0.8.tgz' ...
    OK [downloaded 165 Kb in 2 secs]
Installing highr [0.8] from CRAN ...
    OK (installed binary)
Installing xfun [0.8] from CRAN ...
    OK (installed binary)
Installing markdown [1.1] from CRAN ...
    OK (installed binary)
Installing knitr [1.24] from CRAN ...
    OK (installed binary)
* Lockfile written to '~/Documents/renv_proj/renv.lock'.

It’s pretty easy to see package dependencies with renv (not as easy as showpackage 😜). We can see what dependencies come from our .R files, for example, which is a nice feature:

library(dplyr)
library(stringr)

renv::dependencies(".") %>% filter(str_detect(Source, "\\.R")) %>% pull(Package)
# Finding R package dependencies ... Done!
#  [1] "data.table"    "dplyr"         "utils"         "datavalidator" "dplyr"         "lubridate"     "readr"
#  [8] "strex"         "datavalidator" "testthat"

All in all, renv is a nice, well-designed package that is easy to use. I’ve already used starting using it, and once it gets to CRAN, I’ll be using it ‘in production’. Nice job, RStudio 👍