tidyr 0.7.0

We are happy to announce that tidyr 0.7.0 is now available on CRAN. There are two big changes:

  • tidyr now supports tidy evaluation (or tidy eval for short). You can find an introduction to tidy eval in the programming with dplyr vignette.

  • tidyr uses the new tidyselect package as selection backend.

These changes will probably affect your code only in minor ways, but they help improve consistency across the tidyverse. You can read about the complete set of changes at https://github.com/tidyverse/tidyr/releases/tag/v0.7.0.

Install the latest version of tidyr with:

install.packages("tidyr")

New selection rules

Erratum: The change in selection rules described in this article was reverted as it proved too disruptive. Please see the erratum article for more information.

Following the switch to tidyselect, selecting functions are now stricter in their arguments to avoid ambiguous cases. For example, take gather() and its ... argument. Consider the following code:

x <- 3
df <- tibble(w = 1, x = 2, y = 3)
gather(df, "variable", "value", 1:x)
#> # A tibble: 2 x 3
#>       y variable value
#>   <dbl>    <chr> <dbl>
#> 1     3        w     1
#> 2     3        x     2

Should it select the first three columns (using the x defined in the global environment), or should it select the first two columns (using the column named x)?

To solve this ambiguity, we now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression of the form x:y or c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined by assigning with <-.

In practice this means that you can no longer refer to contextual objects like this:

mtcars %>% gather(var, value, 1:ncol(mtcars))

x <- 3
mtcars %>% gather(var, value, 1:x)
mtcars %>% gather(var, value, -(1:x))

You now have to be explicit about where to find objects. One way of being explicit is to use the quasiquotation operator !! which will evaluate its argument early and inline the result:

mtcars %>% gather(var, value, !! 1:ncol(mtcars))
mtcars %>% gather(var, value, !! 1:x)
mtcars %>% gather(var, value, !! -(1:x))
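To check what `!!` inlines before the call runs, one option is to inspect the unquoted expression with rlang::expr(). The sketch below is an illustration, not part of the release notes. It wraps the range in parentheses so that the whole of `1:x` is unquoted, however `!!` binds relative to `:`.

```r
library(tidyr)

x <- 3

# rlang::expr() quotes its argument but still processes `!!`,
# so it returns the value that would be inlined into a call.
inlined <- rlang::expr(!!(1:x))
inlined

# The same unquoting inside gather(): the first three columns of
# mtcars (mpg, cyl, disp) are gathered and the other eight are kept.
long <- gather(mtcars, var, value, !!(1:x))
dim(long)
unique(long$var)
```

Because the range is evaluated before gather() sees it, the selection no longer depends on whether a column named x exists in the data.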

Read more about quasiquotation in the Tidy evaluation section below.

Tidy evaluation

Tidy evaluation is a set of tools for programming with quoting functions (also called NSE functions) in a principled way. It was first introduced in dplyr 0.7.0 and you can learn more about it in the programming with dplyr vignette. At its core, tidy evaluation is the combination of two features: quasiquotation and quosures.

The tidy eval tools live in rlang and many of them are reexported in dplyr. This includes quo(), enquo() and quos(). In addition, rlang::expr(), rlang::sym() and rlang::syms() may be useful as well and will be exported in the next version of dplyr.

# Let's import some tidy eval tools that we'll use in examples below
library("dplyr")
# tidyr provides gather() and expand(), used throughout the examples
library("tidyr")
sym <- rlang::sym

Quasiquotation is essential for programming with quoting functions. It refers to the ability to unquote part of a quoted expression, and it makes it possible to program with the quoting grammars of dplyr and tidyr. With quasiquotation, you can change what a function “sees”. You’ll typically want to unquote a symbol representing a data frame column with the !! operator.

Here, expand() sees vs and cyl:

expand(mtcars, vs, cyl)
#> # A tibble: 6 x 2
#>      vs   cyl
#>   <dbl> <dbl>
#> 1     0     4
#> 2     0     6
#> 3     0     8
#> 4     1     4
#> 5     1     6
#> 6     1     8

Thanks to quasiquotation, we can change what expand() sees by unquoting the am symbol:

x <- sym("am")
expand(mtcars, vs, !! x)
#> # A tibble: 4 x 2
#>      vs    am
#>   <dbl> <dbl>
#> 1     0     0
#> 2     0     1
#> 3     1     0
#> 4     1     1
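The same pattern is handy when the column name arrives as a string, for instance from a function argument. Here is a minimal sketch under that assumption; the helper name expand_vs_with() is hypothetical and is not part of tidyr.

```r
library(tidyr)

# Hypothetical helper: cross `vs` with a column chosen at run time.
expand_vs_with <- function(df, col_name) {
  col <- rlang::sym(col_name)  # turn the string into a symbol
  expand(df, vs, !!col)        # expand() sees `vs` and the unquoted symbol
}

by_gear <- expand_vs_with(mtcars, "gear")
by_gear
```

Converting the string with sym() first means expand() receives a column reference rather than a character constant.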

The second tidy eval feature is the quosure. A quosure is a special kind of expression that evaluates in both the data context (so you can refer to data frame columns) and the original context of the expression (e.g. a function context, so you can refer to local variables created there).

# For nicer printing
iris <- tibble::as_tibble(iris)

# Let's create a quosure within a local context
quo <- local({
  prefix <- "Sepal"
  quo(starts_with(prefix))
})

prefix exists only in the local context, but the quosure can safely refer to it:

quo
#> <quosure: local>
#> ~starts_with(prefix)

In tidyr 0.7.0, all functions now support quosures:

gather(iris, key, value, !! quo)
#> # A tibble: 300 x 5
#>    Petal.Length Petal.Width Species          key value
#>           <dbl>       <dbl>  <fctr>        <chr> <dbl>
#>  1          1.4         0.2  setosa Sepal.Length   5.1
#>  2          1.4         0.2  setosa Sepal.Length   4.9
#>  3          1.3         0.2  setosa Sepal.Length   4.7
#>  4          1.5         0.2  setosa Sepal.Length   4.6
#>  5          1.4         0.2  setosa Sepal.Length   5.0
#>  6          1.7         0.4  setosa Sepal.Length   5.4
#>  7          1.4         0.3  setosa Sepal.Length   4.6
#>  8          1.5         0.2  setosa Sepal.Length   5.0
#>  9          1.4         0.2  setosa Sepal.Length   4.4
#> 10          1.5         0.1  setosa Sepal.Length   4.9
#> # ... with 290 more rows
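To make the two contexts of a quosure more concrete, the following sketch evaluates one by hand with rlang::eval_tidy(). It is only an illustration: the make_filter() factory and the threshold variable are made up for the example, and tidyr does this work internally when you pass it a quosure.

```r
library(rlang)

# Hypothetical factory: the quosure is created where `threshold` lives.
make_filter <- function() {
  threshold <- 20
  quo(mpg > threshold)
}

q <- make_filter()

# A quosure bundles an expression with the environment it came from.
quo_get_expr(q)
env_has(quo_get_env(q), "threshold")

# eval_tidy() finds `mpg` in the data and `threshold` in the
# quosure's own environment.
is_efficient <- eval_tidy(q, data = mtcars)
sum(is_efficient)
```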

Typically you’ll use quosures to create wrappers around tidyr functions. To this end you’ll need enquo(), which does two things: it transforms your function into a dplyr-like quoting function, and it returns the quoted expression as a quosure. Creating a wrapper function is often a simple matter of enquosing and unquoting:

my_gather <- function(df, expr) {
  quo <- enquo(expr)
  tidyr::gather(df, key, value, !! quo)
}

Thanks to the enquosing, you can safely call your wrapper in local contexts (e.g. within a function) and refer to variables defined there:

local({
  prefix <- "Sepal"
  my_gather(iris, starts_with(prefix))
})
#> # A tibble: 300 x 5
#>    Petal.Length Petal.Width Species          key value
#>           <dbl>       <dbl>  <fctr>        <chr> <dbl>
#>  1          1.4         0.2  setosa Sepal.Length   5.1
#>  2          1.4         0.2  setosa Sepal.Length   4.9
#>  3          1.3         0.2  setosa Sepal.Length   4.7
#>  4          1.5         0.2  setosa Sepal.Length   4.6
#>  5          1.4         0.2  setosa Sepal.Length   5.0
#>  6          1.7         0.4  setosa Sepal.Length   5.4
#>  7          1.4         0.3  setosa Sepal.Length   4.6
#>  8          1.5         0.2  setosa Sepal.Length   5.0
#>  9          1.4         0.2  setosa Sepal.Length   4.4
#> 10          1.5         0.1  setosa Sepal.Length   4.9
#> # ... with 290 more rows
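A wrapper that forwards several selections at once can follow the same enquose-and-unquote pattern with quos(). The sketch below assumes rlang's unquote-splice operator `!!!`, which this article does not otherwise cover; the wrapper name my_gather_all() is hypothetical.

```r
library(tidyr)

# Hypothetical wrapper accepting any number of column selections.
my_gather_all <- function(df, ...) {
  cols <- rlang::quos(...)         # one quosure per argument
  gather(df, key, value, !!!cols)  # splice them into gather()'s `...`
}

long <- my_gather_all(iris, Sepal.Length, Sepal.Width)
names(long)
nrow(long)
```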

Special tidy evaluation rules

The tidy evaluation implementation of tidyr is a bit special. While the philosophy of tidy evaluation is that R code should refer to real objects (from the data frame or from the context), we had to make some exceptions to this rule for tidyr. The reason is that several functions accept bare symbols to specify the names of new columns to create (gather() being a prime example). This is not tidy because the symbol does not represent any actual object.

Our workaround is to capture these arguments using rlang::quo_name() (so they still support quasiquotation and you can unquote symbols or strings). This workaround is only provided for backward compatibility, as this UI is not consistent with other tidyr functions that require strings (e.g. the into argument of separate()). More generally, this type of NSE is now discouraged in the tidyverse. We now prefer strings to refer to variables that don’t yet exist.
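As a rough illustration of unquoting in these naming arguments, the sketch below passes the key name as a string held in a variable and the value name as a symbol. The names key_name, value_name, measurement and reading are made up for the example.

```r
library(tidyr)

key_name <- "measurement"            # a string held in a variable
value_name <- rlang::sym("reading")  # a symbol works as well

# Both naming arguments are unquoted; the selection itself is unchanged.
long <- gather(iris, !!key_name, !!value_name, starts_with("Sepal"))
names(long)
```

Plain strings, as in gather(iris, "measurement", "reading", ...), give the same result and match the preference for strings stated above.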

Breaking changes

  • The underscored SE variants are softly deprecated.

  • Selecting functions now make a distinction between data expressions and context expressions. The latter can refer only to contextual objects while the former can refer only to data variables. See above for more information.
