Tidy eval now supports glue strings

  r-lib, tidyverse, package

  Lionel Henry

rlang 0.4.0 introduced the curly-curly {{ operator to simplify writing functions around tidyverse pipelines. The minor update 0.4.3 of rlang makes it possible to use { and {{ to create result names in tidyverse verbs that take pairs of names and expressions.

Install the latest version of rlang to make the new feature globally available throughout the tidyverse:

install.packages("rlang")

Tunnelling data-variables with curly-curly

With the {{ operator you can tunnel data-variables (i.e. columns of a data frame) through arg-variables (function arguments):

library(tidyverse)

mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise(avg = mean({{ var }}, na.rm = TRUE))
}

The tunnel makes it possible to supply variables from the data frame to your wrapper function:

iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#>   Species      avg
#>   <fct>      <dbl>
#> 1 setosa      3.43
#> 2 versicolor  2.77
#> 3 virginica   2.97

Without a tunnel, the function fails because dplyr looks for a column literally named by:

mean_by_no_tunnel <- function(data, by, var) {
  data %>%
    group_by(by) %>%
    summarise(avg = mean(var, na.rm = TRUE))
}

iris %>% mean_by_no_tunnel(Species, Sepal.Width)
#> Error: Must group by variables found in `.data`
#> * Column `by` is not found

That’s because of the ambiguity between the function argument by and the data-variable Species. R has no way of knowing that you meant the variable from the data frame.
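To make the tunnel less mysterious: as rlang's documentation describes it, {{ var }} is shorthand for the older two-step pattern of capturing an argument with enquo() and injecting it with !!. Below is a minimal sketch of the same wrapper written that way. The name mean_by_enquo is hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical helper, for illustration only: the two-step equivalent
# of the curly-curly version of mean_by() shown earlier.
mean_by_enquo <- function(data, by, var) {
  # Capture the expressions supplied by the caller instead of evaluating them
  by <- rlang::enquo(by)
  var <- rlang::enquo(var)

  data %>%
    group_by(!!by) %>%                            # inject the grouping variable
    summarise(avg = mean(!!var, na.rm = TRUE))    # inject the summarised variable
}

out <- iris %>% mean_by_enquo(Species, Sepal.Width)
names(out)
```

Both versions should return one row per species with an avg column. The curly-curly form simply fuses the capture and inject steps into one.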

Custom result names

In the example above, the result name is hard-coded to avg. This is an informative generic name, but returning a more specific name that reflects the context might make the function more helpful. For this reason, tidy eval functions taking dots (like dplyr::mutate(), dplyr::group_by(), or dplyr::summarise()) now support glue strings as result names.

Glue strings are implemented in the glue package. They are a flexible way of composing a string from components, interpolating R code within the string:

library(glue)
#> 
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse

name <- "Bianca"
glue("The result of `1 + 2` is {1 + 2}, so says {name}.")
#> The result of `1 + 2` is 3, so says Bianca.

You can now use glue strings in result names. Note that for technical reasons you need the walrus operator := instead of the usual =.

suffix <- "foo"
iris %>% summarise("prefix_{suffix}" := mean(Sepal.Width))
#>   prefix_foo
#> 1   3.057333
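The same syntax should carry over to the other verbs mentioned above. As a small sketch, assuming rlang 0.4.3 or later is installed, here is a glue string naming a new column in mutate(). The variable unit and the column name Sepal.Area_cm2 are made up for this example.

```r
library(dplyr)

# Illustrative value interpolated into the result name
unit <- "cm"

out <- iris %>%
  # `{unit}` is replaced by the value of `unit` when the name is built
  mutate("Sepal.Area_{unit}2" := Sepal.Length * Sepal.Width)

names(out)
```

The interpolated text can sit anywhere in the string, so prefixes, suffixes, and infixes all work the same way.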

In addition to normal glue interpolation with {, you can also tunnel data-variables through function arguments with {{ inside the string:

mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#>   Species    Sepal.Width
#>   <fct>            <dbl>
#> 1 setosa            3.43
#> 2 versicolor        2.77
#> 3 virginica         2.97

And you can combine both forms of interpolation in the same glue string:

mean_by <- function(data, by, var, prefix = "avg") {
  data %>%
    group_by({{ by }}) %>%
    summarise("{prefix}_{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

iris %>% mean_by(Species, Sepal.Width)
#> # A tibble: 3 x 2
#>   Species    avg_Sepal.Width
#>   <fct>                <dbl>
#> 1 setosa                3.43
#> 2 versicolor            2.77
#> 3 virginica             2.97
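Because prefix is an ordinary argument, callers can override the default. The sketch below repeats the definition of mean_by() from above so that it runs on its own, then requests a different prefix. The value "mean" is just an example.

```r
library(dplyr)

# Same definition as above, repeated so the example is self-contained
mean_by <- function(data, by, var, prefix = "avg") {
  data %>%
    group_by({{ by }}) %>%
    # `{prefix}` interpolates a value; `{{ var }}` tunnels the column name
    summarise("{prefix}_{{ var }}" := mean({{ var }}, na.rm = TRUE))
}

out <- iris %>% mean_by(Species, Petal.Length, prefix = "mean")
names(out)
```

Here the result column should be named mean_Petal.Length, combining the supplied prefix with the name of the tunnelled column.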

You can learn more about tunnelling variables in this rstudio::conf 2020 talk.

Acknowledgements

Read about other bug fixes and features from the 0.4.3 release in the changelog. Many thanks to all the contributors for this release!

@chendaniely, @clauswilke, @DavisVaughan, @enoshliang, @hadley, @ianmcook, @jennybc, @krlmlr, @lionel-, @moodymudskipper, @neelan29, @nick-youngblut, @nteetor, @romainfrancois, @TylerGrantSmith, @vspinu, and @yutannihilation