magrittr 2.0 is coming soon

It is with unclouded composure that we announce the upcoming release of magrittr 2.0. magrittr is the package home to the %>% pipe operator written by Stefan Milton Bache and used throughout the tidyverse.

This last and likely final version of magrittr resolves the longstanding issues of overhead and backtrace footprint. It also makes the magrittr pipe more compatible with a native pipe that will probably be included in the next version of R.

This version of magrittr has been completely rewritten in C to give better backtraces and much improved performance. It also uses a different approach in order to support laziness. This enables new uses of the pipe, and ensures magrittr is as similar as possible to the future base pipe. Our analysis and testing suggests that the new version should be a drop-in replacement, but we’d really like you to try it out and give us some feedback before we submit to CRAN. You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("tidyverse/magrittr")

If you discover any issues, please let us know by posting issues on the Github repository of magrittr.

This blog post covers the three main changes in this new version of the magrittr pipe.

library(magrittr)

Update 2020-09-22: Some minor issues have arised from the first round of beta testing (see note on visibility and laziness below). We postponed the release to late October to give more time for potential further issues to surface before publication of magrittr 2.0 to CRAN.

Backtraces

The R implementation of the magrittr pipe was rather costly in terms of backtrace clutter. This made it difficult to debug errors with functions using the pipe:

foo <- function() bar()
bar <- function() 1 %>% identity() %>% baz()
baz <- function(x) rlang::abort("oh no")

foo()
#> Error: oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>      █
#>   1. └─global::foo()
#>   2.   └─global::bar()
#>   3.     └─1 %>% identity() %>% baz()
#>   4.       ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#>   5.       └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
#>   6.         └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
#>   7.           └─`_fseq`(`_lhs`)
#>   8.             └─magrittr::freduce(value, `_function_list`)
#>   9.               ├─base::withVisible(function_list[[k]](value))
#>  10.               └─function_list[[k]](value)
#>  11.                 └─global::baz(.)

This clutter is now completely resolved:

foo()
#> Error: oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>     █
#>  1. ├─global::foo()
#>  2. │ └─global::bar()
#>  3. │   └─1 %>% identity() %>% baz()
#>  4. └─global::baz(.)

Speed

The pipe is now written in C to improve the performance. Here is a benchmark for the old R implementation:

f1 <- function(x) x
f2 <- function(x) x
f3 <- function(x) x
f4 <- function(x) x

bench::mark(
  `1` = NULL %>% f1(),
  `2` = NULL %>% f1() %>% f2(),
  `3` = NULL %>% f1() %>% f2() %>% f3(),
  `4` = NULL %>% f1() %>% f2() %>% f3() %>% f4(),
)
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 1           59.4µs  68.9µs    13648.      280B     59.1  6004    26
#> 2 2           82.6µs 101.6µs     9252.      280B     42.8  3894    18
#> 3 3          106.4µs 124.7µs     7693.      280B     18.8  3690     9
#> 4 4          130.9µs 156.1µs     6173.      280B     18.8  2956     9

The new implementation is less costly, especially with many pipe expressions:

bench::mark(
  `1` = NULL %>% f1(),
  `2` = NULL %>% f1() %>% f2(),
  `3` = NULL %>% f1() %>% f2() %>% f3(),
  `4` = NULL %>% f1() %>% f2() %>% f3() %>% f4(),
)
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 1            2.16µs   3.11µs   306145.        0B     61.2  9998     2
#> 2 2            2.68µs   3.85µs   246869.        0B     74.1  9997     3
#> 3 3            3.22µs   4.55µs   207548.        0B     83.1  9996     4
#> 4 4            3.88µs   5.25µs   180807.        0B     72.4  9996     4

We don’t generally except this to have much impact on typical data analysis code, but it might yield meaningful speed ups if you are using the pipe inside very tight loops.

Laziness

R core has expressed their interest in adding a native pipe in the next version of R and are working on an implementation¹. The main user-visible change in this release makes magrittr more compatible with the behaviour of the base pipe by evaluating the expressions lazily, only when needed.

ignore_arguments <- function(...) "value"

stop("foo") %>% ignore_arguments()

#> [1] "value"

This has subtle implications but should be backward compatible with existing pipelines that run without error. The main source of behaviour change is that some code that previously failed may stop failing if the latter part of the pipeline specifically handled the error.

Similarly, warnings that were previously issued might now be suppressed by a function you’re piping into. That’s because the following expressions are now almost completely equivalent:

# Piped
warning("foo") %>% suppressWarnings()

# Nested
suppressWarnings(warning("foo"))

Thanks to this change, you will now be able to pipe into testthat error expectations, for instance:

library(testthat) %>%
  suppressMessages()

{ 1 + "a" } %>%
  expect_error("non-numeric argument")

Note that one consequence of having a lazy pipe is that the whole pipeline will be shown on the call stack before any errors are thrown:

f1 <- function(x) x
f2 <- function(x) x
f3 <- function(x) x
f4 <- function(x) x

stop("oh no") %>% f1() %>% f2() %>% f3() %>% f4()
#> Error in f1(.) : oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>     █
#>  1. ├─stop("oh no") %>% f1() %>% f2() %>% f3() %>% f4()
#>  2. ├─global::f4(.)
#>  3. ├─global::f3(.)
#>  4. ├─global::f2(.)
#>  5. └─global::f1(.)

The last function of the pipeline is f4(), so that’s the first one to be run. It evaluates its argument which is provided by f3(), so that’s the second function pushed on the stack. And so on until f1() needs the result of stop("oh no") which causes an error.

Update 2020-09-22: Another issue caused by laziness is that if any function in a pipeline returns invisibly, than the whole pipeline returns invisibly as well.

1 %>% identity() %>% invisible()
1 %>% invisible() %>% identity()
1 %>% identity() %>% invisible() %>% identity()

This is consistent with the equivalent nested code. This behaviour can be worked around in two ways. You can force visibility by wrapping the pipeline in parentheses:

my_function <- function(x) {
  (x %>% invisible() %>% identity())
}

Or by assigning the result to a variable and return it:

my_function <- function(x) {
  out <- x %>% invisible() %>% identity()
  out
}

Towards a release

Though we have changed the behaviour of the pipe, there should be no impact on your user code. The laziness makes it possible to use the pipe in more situations but is not any stricter. It should only cause problems in very rare corner cases and these should be minor. To confirm our analysis, we ran reverse dependency checks for magrittr, purrr, tidyr, dplyr, and tidymodels. Only a dozen out of the 2800 packages were broken by the new implementation, and fixing them is generally easy (see the breaking changes section of the NEWS file).

Update 2020-09-22: The issue you’re most likely to encounter is that using return() inside { inside %>% is no longer supported. If you do this, you will see this error:

1 %>% {
  if (. > 0) {
    return(.)
  }
  . + 1
}

#> Error in 1 %>% {: no function to return from, jumping to top level

Despite these few corner cases, we are confident that this release should be seamless for the vast majority of users. But, to be extra sure, we’d be grateful for any additional testing on real-life scripts with this development version. Please let us know of any issues you find with this new version of the pipe, if any.

Finally, if you’re interested in the design tradeoffs involved in the creation of a pipe operator in R, see the tradeoffs vignette. Any comments about the choices we have made are welcome.

See Luke Tierney’s keynote at the useR! 2020 conference ↩︎