rlang 0.3.0

Photo by Brandon Siu

Introduction

We’re happy to announce that rlang 0.3.0 is now on CRAN! rlang is of most interest to package developers and R programmers, as it is intended for people developing data science tools rather than data scientists. rlang implements a consistent API for working with base types, hosts the tidy evaluation framework, and offers tools for error reporting. This release provides major improvements for each of those themes.

Consult the changelog for the full list of changes, including many bug fixes. The rlang API is still maturing and a number of functions and arguments were deprecated or renamed. Check the lifecycle section for a summary of the API changes.

Tidy evaluation and tidy dots

Tidy evaluation is the framework that powers data-masking APIs like dplyr, tidyr, or ggplot2. Tidy dots is a related feature that allows you to use !!! in functions taking dots, among other things.

Referring to columns with .data

The main user-facing change is that subsetting the .data pronoun with [[ now behaves as if the index were implicitly unquoted. Concretely, this means that the index can no longer be confused with a data frame column. Subsetting .data is now always safe, even in functions:

suppressPackageStartupMessages(
  library("dplyr")
)

df <- tibble(var = 1:4, g = c(1, 1, 2, 2))
var <- "g"

# `df` contains `var` but the column doesn't count!
df %>% group_by(.data[[var]])
#> # A tibble: 4 x 2
#> # Groups:   g [2]
#>     var     g
#>   <int> <dbl>
#> 1     1     1
#> 2     2     1
#> 3     3     2
#> 4     4     2
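To see why implicit unquoting matters, here is an illustration of the ambiguity that data masking creates. It uses only base R's `eval()` with a data frame as the evaluation environment, so it is an analogy rather than rlang's implementation. A bare symbol is looked up in the data first, so the column `var` shadows the variable `var`. Indexing with `[[`, by contrast, evaluates `var` in the calling environment, which is how `.data[[var]]` now behaves:

```r
# Same data as above, built with base R so the sketch has no dependencies
df <- data.frame(var = 1:4, g = c(1, 1, 2, 2))
var <- "g"

# Data masking: the symbol `var` resolves to the column, not to the string "g"
masked <- eval(quote(var), df)

# Subsetting: `var` is evaluated normally to "g", so the `g` column is returned
subset_by_name <- df[[var]]

masked          # the `var` column
subset_by_name  # the `g` column
```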

New tidy dots options

Tidy dots refers to a set of features enabled in functions that collect dots. To enable tidy dots, use list2() instead of list():

fn <- function(...) list2(...)

With tidy dots, users can splice in lists of arguments:

x <- list(arg1 = "A", arg2 = "B")

fn(1, 2, !!!x, 3)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> $arg1
#> [1] "A"
#> 
#> $arg2
#> [1] "B"
#> 
#> [[5]]
#> [1] 3

They can unquote names:

nm <- "b"

fn(a = 1, !!nm := 2)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2

And trailing empty arguments are always ignored to make copy-pasting easier:

fn(
  foo = "foo",
  foo = "bar",
)
#> $foo
#> [1] "foo"
#> 
#> $foo
#> [1] "bar"
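For comparison with the splicing example, base R has no splicing operator: the usual workaround is to build the complete argument list yourself and pass it to `do.call()`. A minimal sketch, assuming rlang is installed, that checks both approaches produce the same list:

```r
library(rlang)

fn <- function(...) list2(...)
x <- list(arg1 = "A", arg2 = "B")

# Tidy dots: splice `x` in place with `!!!`
spliced <- fn(1, 2, !!!x, 3)

# Base R: concatenate all arguments into one list, then call list() on it
manual <- do.call(list, c(list(1, 2), x, list(3)))

identical(spliced, manual)
```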

While list2() hard-codes these features, dots_list() gains several options to control how to collect dots:

  • .preserve_empty preserves empty arguments:

      list3 <- function(...) dots_list(..., .preserve_empty = TRUE)

      list3(a = 1, b = , c = 2)
      #> $a
      #> [1] 1
      #> 
      #> $b
      #> 
      #> 
      #> $c
      #> [1] 2

    We are using this option in env_bind() and call_modify() to allow assigning explicit missing values (see ?missing_arg):

      call <- quote(mean())
      call_modify(call, ... = , trim = )
      #> mean(..., trim = )

  • .homonyms controls whether to keep all arguments that have the same name (the default), only the first or last of these, or throw an error:

      list3 <- function(...) dots_list(..., .homonyms = "last")

      list3(foo = 1, bar = 2, foo = 3, bar = 4, bar = 5)
      #> $foo
      #> [1] 3
      #> 
      #> $bar
      #> [1] 5

These options can be set in enquos() as well.
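The remaining `.homonyms` values work the same way. A short sketch, assuming a current rlang is installed, with two hypothetical wrappers: `first_only()` keeps the first of each duplicated name and `strict()` turns duplicated names into an error:

```r
library(rlang)

# Hypothetical wrappers around dots_list(), one per `.homonyms` policy
first_only <- function(...) dots_list(..., .homonyms = "first")
strict <- function(...) dots_list(..., .homonyms = "error")

kept <- first_only(foo = 1, bar = 2, foo = 3)
str(kept)

# With "error", duplicated names are rejected; catch the error to inspect it
err <- tryCatch(strict(foo = 1, foo = 2), error = function(e) e)
conditionMessage(err)
```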

Error reporting

abort() extends base::stop() to make it easy to create error objects with custom class and metadata. With rlang 0.3.0, abort() automatically stores a backtrace in the error object and supports chaining errors.
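For context, this is roughly what abort() saves you from writing by hand. In base R, an error with a custom class and extra metadata is a classed list passed to stop(). The class `download_error`, the `url` field and `fail_download()` are hypothetical names used only for illustration:

```r
# Build a condition object by hand: a list with `message` and `call`,
# plus any metadata, classed so that it inherits from "error"
fail_download <- function(url) {
  cnd <- structure(
    class = c("download_error", "error", "condition"),
    list(message = "Can't download file!", call = NULL, url = url)
  )
  stop(cnd)
}

# Handlers can dispatch on the custom class and read the metadata
cnd <- tryCatch(
  fail_download("https://example.com/data.csv"),
  download_error = function(e) e
)
cnd$url
```

abort() builds a comparable object from the message, a subclass and extra fields passed through `...`, and in rlang 0.3.0 it also records the backtrace.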

Backtraces

Storing a backtrace in rlang errors makes it possible to post-process the call tree that led to an error and simplify it substantially. Let’s define three functions calling each other, with tryCatch() and evalq() interspersed in order to create a complicated call tree:

f <- function() tryCatch(g(), warning = identity) # Try g()
g <- function() evalq(h())                        # Eval h()
h <- function() abort("Oh no!")                   # And fail!

When a function signals an error with abort(), the user is invited to call last_error():

f()
#> Error: Oh no!
#> Call `rlang::last_error()` to see a backtrace

Calling last_error() returns the last error object. The error prints with its backtrace:

last_error()
#> <error>
#> message: Oh no!
#> class:   `rlang_error`
#> backtrace:
#>  ─global::f()
#>  ─global::g()
#>  ─global::h()
#> Call `summary(rlang::last_error())` to see the full backtrace

The backtrace is simple and to the point because it is printed in a simplified form by default. If you’d like to see the full story (or include the full backtrace in a bug report), call summary() on the error object:

summary(last_error())
#> <error>
#> message: Oh no!
#> class:   `rlang_error`
#> fields:  `message`, `trace` and `parent`
#> backtrace:
#> 
#> └─global::f()
#>   ├─base::tryCatch(g(), warning = identity)
#>   │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
#>   │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>   │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
#>   └─global::g()
#>     ├─base::evalq(h())
#>     │ └─base::evalq(h())
#>     └─global::h()

Each call is prepended with a namespace prefix[1] to reveal the flow of control across package contexts.
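Conceptually, storing a backtrace means capturing the call stack at the moment the error object is created. The following base R sketch shows that idea with sys.calls(). The helper `abort_with_calls()`, the `traced_error` class and the `trace` field are hypothetical, and rlang's real backtraces record much more structure (the call tree and namespaces shown above):

```r
# Hypothetical helper: create an error that carries the calls leading to it
abort_with_calls <- function(message) {
  calls <- as.list(sys.calls())
  # Drop the helper's own frame so the trace ends at its caller
  calls <- calls[-length(calls)]
  cnd <- structure(
    class = c("traced_error", "error", "condition"),
    list(message = message, call = NULL, trace = calls)
  )
  stop(cnd)
}

# Same shape as f(), g() and h() above, renamed to avoid clashes
f2 <- function() tryCatch(g2(), warning = identity)
g2 <- function() evalq(h2())
h2 <- function() abort_with_calls("Oh no!")

err <- tryCatch(f2(), error = identity)

# Deparse each stored call to get a readable, flat (unsimplified) backtrace
trace_text <- vapply(err$trace, function(cl) deparse(cl)[1], character(1))
trace_text
```

Because the stack is captured unfiltered, the flat trace includes the tryCatch() internals, which is exactly the noise that rlang's simplified printing removes.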

Chained errors

Chaining errors is relevant when you’re calling low-level APIs, for instance for web scraping or JSON parsing. When these APIs encounter an error, they often fail with technical error messages. It is often a good idea to transform these developer-friendly messages into something more meaningful and actionable for end users.

Several programming languages provide the ability to chain errors for these situations. With chained errors, the low level and high level contexts are clearly separated in the error report. This makes the error more legible for the end user, without hiding the low level information that might be crucial for figuring out the problem.

Say we’re writing a function make_report() that creates an automated report and, as part of the process, downloads a file with fetch_csv(), which might be implemented in a package:

fetch_csv <- function(url) {
  suppressWarnings(
    read.csv(url(url))
  )
}

prepare_data <- function(url) {
  data <- fetch_csv(url)
  tibble::as_tibble(data)
}

make_report <- function(url) {
  data <- prepare_data(url)

  # We're not going to get there because all our attempts to download
  # a file are going to fail!
  ...
}

This function might fail in fetch_csv() because of connection issues:

make_report("https://rstats.edu/awesome-data.csv")
#> Error in open.connection(file, "rt"): cannot open the connection to 'https://rstats.edu/awesome-data.csv'

Chaining errors makes it possible to transform this low-level API error into a high level error, without losing any debugging information. There are two steps involved in error chaining: catch low level errors, and rethrow them with a high level message. Catching can be done with base::tryCatch() or rlang::with_handlers(). Both these functions take an error handler: a function of one argument which is passed an error object when an error occurs.

To chain an error, simply call abort() in the error handler, with a high level error message and the original error passed as the parent argument. Here we’re going to use with_handlers() because it supports the rlang syntax for lambda functions (also used in purrr), which makes it easy to write simple handlers:

prepare_data <- function(url) {
  data <- with_handlers(
    error = ~ abort("Can't download file!", parent = .),
    fetch_csv(url)
  )
  tibble::as_tibble(data)
}

make_report("https://rstats.edu/awesome-data.csv")
#> Error: Can't download file!
#> Parents:
#>  ─cannot open the connection to 'https://rstats.edu/awesome-data.csv'

The main error message is now the high level one. The low level message is still included in the output to avoid hiding precious debugging information. Errors can be chained multiple times, and all parent messages are included in the output. Note, however, that only errors thrown with abort() contain a backtrace:

last_error()
#> <error>
#> message: Can't download file!
#> class:   `rlang_error`
#> backtrace:
#>  ─global::make_report("https://rstats.edu/awesome-data.csv")
#>  ─global::prepare_data(url)
#> <error: parent>
#> message: cannot open the connection to 'https://rstats.edu/awesome-data.csv'
#> class:   `simpleError`

For this reason, chaining errors is more effective with rlang errors than with errors thrown with stop(), and the error report would be more informative if fetch_csv() used abort() instead of stop(). Fortunately, it is easy to transform any error into an rlang error without changing any code!

Promoting base errors to rlang errors

rlang provides with_abort() to run code with base errors automatically promoted to rlang errors. Let’s wrap fetch_csv() so that it runs in a with_abort() context:

my_fetch_csv <- function(url) {
  with_abort(fetch_csv(url))
}

prepare_data <- function(url) {
  data <- with_handlers(
    error = ~ abort("Can't download file!", parent = .),
    my_fetch_csv(url)
  )
  tibble::as_tibble(data)
}

Our own function calls abort() and the foreign functions are called within with_abort(). Let’s see what chained errors look like now:

make_report("https://rstats.edu/awesome-data.csv")
#> Error: Can't download file!
#> Parents:
#>  ─cannot open the connection to 'https://rstats.edu/awesome-data.csv'
#> Call `rlang::last_error()` to see a backtrace

The backtraces are automatically segmented between low level and high level contexts:

last_error()
#> <error>
#> message: Can't download file!
#> class:   `rlang_error`
#> backtrace:
#>  ─global::make_report("https://rstats.edu/awesome-data.csv")
#>  ─global::prepare_data(url)
#> <error: parent>
#> message: cannot open the connection to 'https://rstats.edu/awesome-data.csv'
#> class:   `rlang_error`
#> backtrace:
#>  ─global::my_fetch_csv(url)
#>  ─global::fetch_csv(url)
#>  ─utils::read.csv(url(url))
#>  ─utils::read.table(...)
#>  ─base::open.connection(file, "rt")
#> Call `summary(rlang::last_error())` to see the full backtrace
summary(last_error())
#> <error>
#> message: Can't download file!
#> class:   `rlang_error`
#> fields:  `message`, `trace` and `parent`
#> backtrace:
#> 
#> └─global::make_report("https://rstats.edu/awesome-data.csv")
#>   └─global::prepare_data(url)
#> <error: parent>
#> message: cannot open the connection to 'https://rstats.edu/awesome-data.csv'
#> class:   `rlang_error`
#> fields:  `message`, `trace`, `parent` and `error`
#> backtrace:
#> 
#> └─global::my_fetch_csv(url)
#>   ├─rlang::with_abort(fetch_csv(url))
#>   │ └─base::withCallingHandlers(...)
#>   └─global::fetch_csv(url)
#>     ├─base::suppressWarnings(read.csv(url(url)))
#>     │ └─base::withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
#>     └─utils::read.csv(url(url))
#>       └─utils::read.table(...)
#>         ├─base::open(file, "rt")
#>         └─base::open.connection(file, "rt")
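The key ingredient behind this kind of promotion is a calling handler. Unlike tryCatch(), withCallingHandlers() runs its handler before the stack unwinds, so the frames that led to the original error are still available. The full backtrace above shows that with_abort() also goes through base::withCallingHandlers(), but the following base R sketch is not its actual implementation; `with_promotion()`, `low_level()`, the `promoted_error` class and the `trace` field are hypothetical:

```r
# Hypothetical helper: rethrow any error as a classed error that records
# the call stack and keeps the original error as its parent
with_promotion <- function(expr) {
  withCallingHandlers(
    expr,
    error = function(e) {
      # Calling handlers run before the stack unwinds, so sys.calls()
      # still includes the frames that led to the original error
      promoted <- structure(
        class = c("promoted_error", "error", "condition"),
        list(
          message = conditionMessage(e),
          call = conditionCall(e),
          trace = as.list(sys.calls()),
          parent = e
        )
      )
      stop(promoted)
    }
  )
}

# Hypothetical low level function that fails with a base error
low_level <- function() stop("cannot open the connection")

err <- tryCatch(with_promotion(low_level()), error = identity)

# The promoted error still knows where the base error was signalled
trace_text <- vapply(err$trace, function(cl) deparse(cl)[1], character(1))
"low_level()" %in% trace_text
```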

If you’d like to promote all errors to rlang errors at all times, you can try out this experimental option by adding the following to your .Rprofile:

if (requireNamespace("rlang", quietly = TRUE)) {
  options(error = quote(rlang:::add_backtrace()))
}

Environments

The environment API gains two specialised print methods. env_print() prints information about the contents and the properties of environments. If you don’t specify an environment, it prints the current environment by default, here the global environment:

env_print()
#> <environment: global>
#> parent: <environment: package:bindrcpp>
#> bindings:
#>  * fn: <fn>
#>  * var: <chr>
#>  * x: <list>
#>  * fetch_csv: <fn>
#>  * my_fetch_csv: <fn>
#>  * list3: <fn>
#>  * nm: <chr>
#>  * .Random.seed: <int>
#>  * call: <language>
#>  * f: <fn>
#>  * g: <fn>
#>  * h: <fn>
#>  * make_report: <fn>
#>  * df: <tibble>
#>  * colourise_chunk: <fn>
#>  * prepare_data: <fn>

The global environment doesn’t have any fancy features. Let’s look at a package environment:

env_print(pkg_env("rlang"))
#> <environment: package:rlang> [L]
#> parent: <environment: package:stats>
#> bindings:
#>  * is_dbl_na: <lazy> [L]
#>  * coerce_class: <lazy> [L]
#>  * as_quosure: <lazy> [L]
#>  * as_quosures: <lazy> [L]
#>  * quo_get_env: <lazy> [L]
#>  * return_to: <lazy> [L]
#>  * env_binding_are_lazy: <fn> [L]
#>  * quo_is_call: <lazy> [L]
#>  * new_call: <fn> [L]
#>  * is_scoped: <lazy> [L]
#>  * set_names: <fn> [L]
#>  * expr_deparse: <lazy> [L]
#>  * `f_env<-`: <lazy> [L]
#>  * as_box_if: <lazy> [L]
#>  * `%|%`: <fn> [L]
#>  * lang_head: <lazy> [L]
#>  * ns_env: <fn> [L]
#>  * list_along: <lazy> [L]
#>  * parse_quo: <lazy> [L]
#>  * lang_tail: <lazy> [L]
#>    * ... with 438 more bindings

This environment contains all functions exported by rlang. Its header includes the [L] tag to indicate that the environment is locked: you can’t add or remove bindings from it. The same tag appears next to each binding to indicate that the bindings are locked and can’t be changed to point to a different object. Finally, note how the type of many bindings is <lazy>. That’s because packages are lazily loaded for performance reasons: technically, each such binding points to a promise that is evaluated to the actual object the first time it is accessed.
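The locked and lazy properties that env_print() reports can also be queried with base R. A small sketch with a throwaway environment; the names `e`, `counter`, `eager` and `lazy` are only for illustration:

```r
e <- new.env()
e$eager <- "bar"

# Count how many times the lazy binding's expression is evaluated
counter <- new.env()
counter$n <- 0

# A promise-backed binding, like the <lazy> bindings of a package environment
delayedAssign("lazy", {
  counter$n <- counter$n + 1
  42
}, assign.env = e)

# Lock the environment and all of its bindings, like the [L] tags
lockEnvironment(e, bindings = TRUE)

environmentIsLocked(e)
bindingIsLocked("eager", e)

# Adding a new binding to a locked environment is an error
added <- tryCatch({
  assign("new_binding", 1, envir = e)
  TRUE
}, error = function(err) FALSE)

before <- counter$n  # the promise has not been forced yet
value <- e$lazy      # first access forces the promise
value <- e$lazy      # later accesses reuse the stored value
after <- counter$n
```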

The second print method concerns lists of environments as returned by search_envs() and env_parents(). While base::search() returns the names of environments on the search path, search_envs() returns the corresponding list of environments:

search_envs()
#>  [[1]] $ <env: global>
#>  [[2]] $ <env: package:bindrcpp>
#>  [[3]] $ <env: package:dplyr>
#>  [[4]] $ <env: package:rlang>
#>  [[5]] $ <env: package:stats>
#>  [[6]] $ <env: package:graphics>
#>  [[7]] $ <env: package:grDevices>
#>  [[8]] $ <env: package:utils>
#>  [[9]] $ <env: package:datasets>
#> [[10]] $ <env: package:methods>
#> [[11]] $ <env: Autoloads>
#> [[12]] $ <env: package:base>

env_parents() returns all parents of a given environment. For search environments, the last parent of the list is the empty environment:

envs <- env_parents(pkg_env("utils"))
envs
#> [[1]] $ <env: package:datasets>
#> [[2]] $ <env: package:methods>
#> [[3]] $ <env: Autoloads>
#> [[4]] $ <env: package:base>
#> [[5]] $ <env: empty>

For all other environments, the last parent is either the empty environment or the global environment. Most of the time the global env is part of the ancestry because package namespaces inherit from the search path:

env_parents(ns_env("rlang"))
#> [[1]] $ <env: imports:rlang>
#> [[2]] $ <env: namespace:base>
#> [[3]] $ <env: global>
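Walking the ancestry is a simple loop over parent.env(). The following base R sketch mimics the behaviour described above, stopping at the empty environment or at the global environment. `parents_of()` is a hypothetical helper, not rlang’s implementation:

```r
parents_of <- function(env, last = globalenv()) {
  out <- list()
  # parent.env(emptyenv()) is an error, so stop once the empty env is reached
  while (!identical(env, emptyenv())) {
    env <- parent.env(env)
    out[[length(out) + 1]] <- env
    # Stop early at `last` (the global env by default), as in the output above
    if (identical(env, last)) break
  }
  out
}

# An environment insulated from the search path ends with the empty env
p_base <- parents_of(new.env(parent = baseenv()))

# A namespace ends with the global env: imports, then base namespace, then global
p_ns <- parents_of(asNamespace("stats"))
```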

It is possible to construct environments insulated from the search path. We’ll use env() to create such an environment. Starting with rlang 0.3.0, you can pass a single unnamed environment to env() to specify a parent. The following creates a child of the base package environment:

e <- env(base_env(), foo = "bar")
env_parents(e)
#> [[1]] $ <env: package:base>
#> [[2]] $ <env: empty>

Here is how to create a grandchild of the empty environment:

e <- env(env(empty_env()))
env_parents(e)
#> [[1]]   <env: 0x7fb6ec084238>
#> [[2]] $ <env: empty>
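For comparison, base R builds the same structures with list2env() and new.env(), which take the parent as an explicit `parent` argument:

```r
# Child of the base package environment, with one binding
e1 <- list2env(list(foo = "bar"), parent = baseenv())

# Grandchild of the empty environment
e2 <- new.env(parent = new.env(parent = emptyenv()))

identical(parent.env(e1), baseenv())
identical(parent.env(parent.env(e2)), emptyenv())
```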

We hope that these print methods make it easier to explore the structure and contents of R environments.

Acknowledgements

Thanks to all contributors!

@akbertram, @AndreMikulec, @andresimi, @billdenney, @BillDunlap, @cfhammill, @egnha, @grayskripko, @hadley, @IndrajeetPatil, @jimhester, @krlmlr, @marinsokol5, @md0u80c9, @mikmart, @move[bot], @NikNakk, @privefl, @romainfrancois, @wibeasley, @yutannihilation, and @zslajchrt


  1. Or global:: if the function is defined in the global workspace.