# dplyr 1.0.0 for package developers

As you’re hopefully aware, dplyr 1.0.0 is coming soon, and we’ve been writing a series of blog posts about the user-facing changes that you, as a data scientist, have to look forward to. Today, I want to change tack a little and talk about the changes from the perspective of the package developer.

Update: as of June 1, dplyr 1.0.0 is now available on CRAN! Read all about it or install it now with install.packages("dplyr").

But first, an update on the release process: while preparing for this release, we discovered some subtle problems that arise when combining different types of data frames (including data.tables and tibbles). It took us a little while to figure out what we (and package developers) need to do, so we’ve decided to push back the dplyr release: we’re now planning on releasing dplyr 1.0.0 to CRAN on May 15. We’re sorry that it’s going to take longer than expected, but this gives package authors who use dplyr more time to handle changes.

In this post, I want to address how dplyr changes might break package code, then discuss some of the major pain points a package developer might experience, and how to get help if you need it.

library(dplyr, warn.conflicts = FALSE)


## Breaking changes

There are three main ways an update to a package might break your existing code:

• We’ve introduced a bug. Obviously, we do our best to make sure this doesn’t happen (by using software development best practices like unit testing and code review) but it’s impossible to eliminate all bugs.

• We’ve fixed a bug or otherwise made a change we think is harmless. Sometimes your code accidentally depends on a behaviour that we think is incorrect and we change it. The change will be an improvement for most people, but unfortunately it breaks your code.

• We’ve deliberately made a backward incompatible interface change. We try to make these as rarely as possible, and only to significantly improve usability or consistency. Unless the package or function is experimental, we do our best to make such changes gradually, so that there’s a deprecation period before the behaviour goes away altogether.

dplyr 1.0.0 contains very few backward incompatible changes, but it does make a large number of changes that we believe are mostly harmless or minor improvements. The vast majority of these will not affect data analysis code, but some can affect packages, particularly through their unit tests. To give you a flavour for what I mean here, dplyr now preserves the names of atomic vectors:

df <- tibble(x = c(a = 1, b = 2))
#> expected$x: 2 1

# Column order is different
expect_equal(tibble(x = 1, y = 2), tibble(y = 2, x = 1))
#> Error: actual (tibble(x = 1, y = 2)) not equal to expected (tibble(y = 2, x = 1)).
#>
#> names(actual): "x" "y"
#> names(expected): "y" "x"

Fixing these failures will typically involve updating the expected value. (The problem of uninformative failures prompted me to start work on the waldo package, which attempts to do better. You can try it out by installing the dev version of testthat, devtools::install_github("r-lib/testthat"), but note that it’s still experimental so it’s only recommended for the adventurous.)

## Increased strictness from vctrs

As we discussed recently, dplyr now uses the vctrs package under the hood, and vctrs is stricter about how vectors of different types are combined. This increased strictness affects a few edge cases. For example, in dplyr 0.8.5, the following code returned tibble(x = character()) (what we’d now consider to be a bug):

df1 <- tibble(x = integer())
df2 <- tibble(x = character())
bind_rows(df1, df2)
#> Error: Can't combine ..1$x <integer> and ..2$x <character>.


If this affects your package, you’ll typically need to think about what the type of each column should be, and then ensure that’s the case everywhere in your code.
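
As a minimal sketch of that advice, suppose you have decided that x should be a character column everywhere. The helper as_character_x() below is hypothetical (it is not part of dplyr), and the example data is made up for illustration; the idea is just to coerce each input to the agreed type before combining:

```r
library(dplyr, warn.conflicts = FALSE)

# Two inputs whose `x` columns have incompatible types
df1 <- tibble(x = 1:2)
df2 <- tibble(x = c("a", "b"))

# Hypothetical helper: enforce the type we decided on for `x`
as_character_x <- function(df) {
  mutate(df, x = as.character(x))
}

# Both inputs now agree on the type of `x`, so they combine cleanly
out <- bind_rows(as_character_x(df1), as_character_x(df2))
```

Doing the coercion in one place, rather than at each call to bind_rows(), makes it easier to keep the column type consistent across the rest of the package.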

## Need help?

If you just can’t figure out how to fix your package, please let us know! The fastest way to get help is to file an issue containing a reprex that illustrates the precise problem. But if you’re struggling to make a reprex, you can give us a link to your repo, and we’ll take a look.
