magrittr 2.0 is here!

It is with fiery joyousness that we announce the release of magrittr 2.0. magrittr is the package home to the %>% pipe operator written by Stefan Milton Bache and used throughout the tidyverse. This latest and likely final version of magrittr has been completely rewritten in C to resolve the longstanding issues of overhead and backtrace footprint. It also uses a different approach to support laziness and make the magrittr pipe more compatible with the base pipe |> to be included in the next version of R.

This blog post covers the three main changes in this new version of the magrittr pipe and how to solve compatibility issues, should they arise. Our analysis and testing suggest that the new version should be a drop-in replacement in most cases. It is however possible that the lazy implementation causes issues with specific functions. You will find below some tips to fix these, which will also make your code compatible with |> in R 4.1.

Install the latest version of magrittr with:

install.packages("magrittr")

Attach magrittr to follow the examples:

library(magrittr)

Backtraces

The R implementation of the magrittr pipe was rather costly in terms of backtrace clutter. This made it difficult to debug errors with functions using the pipe:

foo <- function() bar()
bar <- function() 1 %>% identity() %>% baz()
baz <- function(x) rlang::abort("oh no")

foo()
#> Error: oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>      █
#>   1. └─global::foo()
#>   2.   └─global::bar()
#>   3.     └─1 %>% identity() %>% baz()
#>   4.       ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
#>   5.       └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
#>   6.         └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
#>   7.           └─`_fseq`(`_lhs`)
#>   8.             └─magrittr::freduce(value, `_function_list`)
#>   9.               ├─base::withVisible(function_list[[k]](value))
#>  10.               └─function_list[[k]](value)
#>  11.                 └─global::baz(.)

This clutter is now completely resolved:

foo()
#> Error: oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>     █
#>  1. ├─global::foo()
#>  2. │ └─global::bar()
#>  3. │   └─1 %>% identity() %>% baz()
#>  4. └─global::baz(.)

Speed

The pipe is now written in C to improve the performance. Here is a benchmark for the old R implementation:

f1 <- function(x) x
f2 <- function(x) x
f3 <- function(x) x
f4 <- function(x) x

bench::mark(
  `1` = NULL %>% f1(),
  `2` = NULL %>% f1() %>% f2(),
  `3` = NULL %>% f1() %>% f2() %>% f3(),
  `4` = NULL %>% f1() %>% f2() %>% f3() %>% f4(),
)
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 1           59.4µs  68.9µs    13648.      280B     59.1  6004    26
#> 2 2           82.6µs 101.6µs     9252.      280B     42.8  3894    18
#> 3 3          106.4µs 124.7µs     7693.      280B     18.8  3690     9
#> 4 4          130.9µs 156.1µs     6173.      280B     18.8  2956     9

The new implementation is less costly, especially with many pipe expressions:

bench::mark(
  `1` = NULL %>% f1(),
  `2` = NULL %>% f1() %>% f2(),
  `3` = NULL %>% f1() %>% f2() %>% f3(),
  `4` = NULL %>% f1() %>% f2() %>% f3() %>% f4(),
)
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 1            1.83µs   2.42µs   379343.        0B     75.9  9998     2
#> 2 2             2.3µs   2.79µs   255363.        0B      0   10000     0
#> 3 3            2.82µs   3.74µs   244980.        0B     24.5  9999     1
#> 4 4            3.32µs   4.37µs   217986.        0B     21.8  9999     1

We don’t generally expect this to have much impact on typical data analysis code, but it might yield meaningful speed-ups if you are using the pipe inside very tight loops.

Laziness

R core has expressed their interest in adding a native pipe in the next version of R and are working on an implementation¹. The main user-visible change in this release makes magrittr more compatible with the behaviour of the base pipe by evaluating the expressions lazily, only when needed.

ignore_arguments <- function(...) "value"

stop("foo") %>% ignore_arguments()
#> [1] "value"

This has subtle implications but should be backward compatible with existing pipelines that run without error. The main source of behaviour change is that some code that previously failed may stop failing if the latter part of the pipeline specifically handled the error.
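
As a minimal sketch of this behaviour change, assuming magrittr 2.0 or later is attached, the pipeline below raises an error in its first step and handles it in its last one. The handler and its message are made up for illustration:

```r
library(magrittr)

# With the lazy pipe, `stop("foo")` is only evaluated once tryCatch()
# touches its first argument, so the error is raised while the handler
# is already in place.
result <- stop("foo") %>%
  tryCatch(error = function(e) paste("recovered:", conditionMessage(e)))

result
#> [1] "recovered: foo"
```

With an eager pipe, the error would be thrown before tryCatch() is ever called, so the handler would never get a chance to run.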

Similarly, warnings that were previously issued might now be suppressed by a function you’re piping into. That’s because the following expressions are now almost completely equivalent:

# Piped
warning("foo") %>% suppressWarnings()

# Nested
suppressWarnings(warning("foo"))

Thanks to this change, you will now be able to pipe into testthat error expectations, for instance:

library(testthat) %>%
  suppressMessages()

{ 1 + "a" } %>%
  expect_error("non-numeric argument")

Note that one consequence of having a lazy pipe is that the whole pipeline will be shown on the call stack before any errors are thrown:

f1 <- function(x) x
f2 <- function(x) x
f3 <- function(x) x
f4 <- function(x) x

stop("oh no") %>% f1() %>% f2() %>% f3() %>% f4()
#> Error in f1(.) : oh no

rlang::last_trace()
#> <error/rlang_error>
#> oh no
#> Backtrace:
#>     █
#>  1. ├─stop("oh no") %>% f1() %>% f2() %>% f3() %>% f4()
#>  2. ├─global::f4(.)
#>  3. ├─global::f3(.)
#>  4. ├─global::f2(.)
#>  5. └─global::f1(.)

The last function of the pipeline is f4(), so that’s the first one to be run. It evaluates its argument which is provided by f3(), so that’s the second function pushed on the stack. And so on until f1() needs the result of stop("oh no") which causes an error.

Compatibility with magrittr 2.0

Though we have changed the behaviour of the pipe, there should be no impact on your user code. The laziness makes it possible to use the pipe in more situations but is not any stricter. It should only cause problems in very rare corner cases and these should be minor. To confirm our analysis, we ran reverse dependency checks for magrittr, purrr, tidyr, dplyr, and tidymodels. Only a dozen out of the 2800 packages were broken by the new implementation, and fixing them has generally been easy (see the breaking changes section of the NEWS file). In this section you will find a summary of the most common problems and how to fix them.

Using `return()` inside `{` blocks

The issue you’re most likely to encounter is that using return() inside { inside %>% is no longer supported. If you do this, you will see this error:

1 %>% {
  if (. >= 0) {
    return(.)
  }
  . + 1
}
#> Error in 1 %>% {: no function to return from, jumping to top level

In general, the behaviour of return() inside a pipeline was not clearly defined. Should it return from the enclosing function, from the current pipe expression, or from the whole pipeline? We believe returning from the current function would be the ideal behaviour but for technical reasons we can’t implement it this way.

The solution to these errors is to rewrite your pipeline:

1 %>% {
  if (. >= 0) {
    .
  } else {
    . + 1
  }
}
#> [1] 1

In this case, creating a named function will probably produce clearer code:

increment_negative <- function(x) {
  if (x >= 0) {
    x
  } else {
    x + 1
  }
}

1 %>% increment_negative()
#> [1] 1
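
If you would rather keep the early return(), one option is to pipe into an anonymous function wrapped in parentheses, which magrittr calls with the left-hand side as its argument. In this sketch return() has a function to return from; the argument name x and the variable names are arbitrary:

```r
library(magrittr)

# return() exits the anonymous function, not the pipeline
kept <- 1 %>% (function(x) {
  if (x >= 0) {
    return(x)
  }
  x + 1
})
kept
#> [1] 1

# A negative input falls through to the increment
incremented <- (-1) %>% (function(x) {
  if (x >= 0) {
    return(x)
  }
  x + 1
})
incremented
#> [1] 0
```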

Sequential evaluation

A pipeline is laid out as a series of sequential steps:

1 %>% add(1) %>% multiply_by(2)
#> [1] 4

The sequentiality may break down with a lazy implementation. The laziness of R means that function arguments are only evaluated when they are needed. If the function returns without touching the argument, it is never evaluated. In the example below, the user passes stop() to an ignored argument:

ignore <- function(x) NULL

ignore(stop("No error is thrown because `x` is not needed"))
#> NULL

Here is a pipeline where the arguments are not evaluated until the end:

f1 <- function(x) {
  cat("f1\n")
  x
}
f2 <- function(x) {
  cat("f2\n")
  x
}
f3 <- function(x) {
  cat("f3\n")
  x
}

1 %>% f1() %>% f2() %>% f3()
#> f3
#> f2
#> f1
#> [1] 1

Let’s rewrite the pipeline to its nested form to understand what is happening:

f3(f2(f1(1)))
#> f3
#> f2
#> f1
#> [1] 1

f3() runs first. Because it calls cat() before touching its argument, "f3" is printed first. It then returns its argument, which triggers evaluation of f2(), and so on.

In general, out-of-order evaluation only matters when your function produces side effects, such as printing output. It is easy to ensure sequential evaluation by forcing evaluation of arguments early in your function:

f1 <- function(x) {
  force(x)
  cat("f1\n")
  x
}
f2 <- function(x) {
  force(x)
  cat("f2\n")
  x
}
f3 <- function(x) {
  force(x)
  cat("f3\n")
  x
}

This forces arguments to be evaluated in order:

1 %>% f1() %>% f2() %>% f3()
#> f1
#> f2
#> f3
#> [1] 1


f3(f2(f1(1)))
#> f1
#> f2
#> f3
#> [1] 1
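
When you cannot modify the functions in a pipeline, magrittr 2.0 also provides an eager pipe, %!>%, which evaluates the piped input at each step before moving on to the next. The sketch below redefines f1(), f2(), and f3() without force() so that it is self-contained:

```r
library(magrittr)

# Same functions as before, without force()
f1 <- function(x) {
  cat("f1\n")
  x
}
f2 <- function(x) {
  cat("f2\n")
  x
}
f3 <- function(x) {
  cat("f3\n")
  x
}

# The eager pipe evaluates each step in order
1 %!>% f1() %!>% f2() %!>% f3()
#> f1
#> f2
#> f3
#> [1] 1
```

Using force() inside your own functions remains preferable when you can change them, since it also fixes the order of evaluation for the nested form.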

Visibility

Another issue caused by laziness is that if any function in a pipeline returns invisibly, then the whole pipeline returns invisibly as well. All these calls return invisibly:

1 %>% identity() %>% invisible()

1 %>% invisible() %>% identity()

1 %>% identity() %>% invisible() %>% identity()

This is consistent with the equivalent nested code:

invisible(identity(1))

identity(invisible(1))

identity(invisible(identity(1)))

This behaviour can be worked around in two ways. You can force visibility by wrapping the pipeline in parentheses:

my_function <- function(x) {
  (x %>% invisible() %>% identity())
}

Or by assigning the result to a variable and returning it:

my_function <- function(x) {
  out <- x %>% invisible() %>% identity()
  out
}
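
To check whether a given expression returns visibly, base R's withVisible() reports the visibility flag alongside the value. The sketch below uses the nested calls rather than the pipe, so it runs without magrittr; per the text above, the piped equivalents should report the same flags under magrittr 2.0:

```r
# identity() passes along the invisibility set by invisible()
withVisible(identity(invisible(1)))$visible
#> [1] FALSE

# Wrapping the call in parentheses makes the result visible again
withVisible((identity(invisible(1))))$visible
#> [1] TRUE
```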

Conclusion

Despite these few corner cases, we are confident that this release should be seamless for the vast majority of users. It fixes longstanding issues of overhead and makes the behaviour of %>% interchangeable with the future |> pipe of base R. We will maintain magrittr on CRAN for the foreseeable future, making it possible to write pipelined code that is compatible with older versions of R. The long-term compatibility and the resolved overhead should make magrittr a good choice for writing pipelines in R packages. We also hope it will improve the experience of users until they switch to the base pipe. For all these reasons, we are very happy to bring this ultimate version of magrittr to CRAN.

Many thanks to all contributors over the years:

@adamroyjones, @ajschumacher, @allswellthatsmaxwell, @annytr, @aouazad, @ateucher, @bakaburg1, @balwierz, @batpigandme, @bdhumb, @behrica, @bfgray3, @bkmontgom, @bramtayl, @burchill, @burgerga, @casallas, @cathblatter, @cfhammill, @choisy, @ClaytonJY, @cstepper, @ctbrown, @danklotz, @DarwinAwardWinner, @davharris, @Deleetdk, @dirkschumacher, @DroiPlatform, @dustinvtran, @eddelbuettel, @egnha, @emankhalaf, @Enchufa2, @englianhu, @epipping, @fabiangehring, @franknarf1, @gaborcsardi, @gdkrmr, @gforge, @ghost, @gwerbin, @hackereye, @hadley, @hh1985, @HughParsonage, @HuwCampbell, @iago-pssjd, @imanuelcostigan, @jaredlander, @jarodmeng, @jcpetkovich, @jdnewmil, @jennybc, @jepusto, @jeremyhoughton, @jeroenjanssens, @jerryzhujian9, @jimhester, @JoshOBrien, @jread-usgs, @jroberayalas, @jzadra, @kbodwin, @kendonB, @kevinykuo, @klmr, @krlmlr, @leerssej, @lionel-, @lorenzwalthert, @MajoroMask, @Make42, @mhpedersen, @MichaelChirico, @MilesMcBain, @mitchelloharawild, @mmuurr, @moodymudskipper, @move[bot], @Mullefa, @nteetor, @odeleongt, @peterdesmet, @philchalmers, @pkq, @prosoitos, @r2evans, @restonslacker, @richierocks, @robertzk, @romainfrancois, @rossholmberg, @rozsoma, @rpruim, @rsaporta, @salim-b, @sbgraves237, @SimonHeuberger, @smbache, @stemangiola, @tonytonov, @trevorld, @triposorbust, @Vlek, @vnijs, @vsalmendra, @vspinu, @wabarr, @wch, @westonplatter, @wibeasley, @wlandau, @yeedle, @yutannihilation, @zeehio, and @zerweck.

¹ See Luke Tierney’s keynote at the useR! 2020 conference.