dplyr 1.0.0 available now!

Hadley Wickham

I’m very excited to announce the ninth and final blog post in the dplyr 1.0.0 series: dplyr 1.0.0 is now available from CRAN! Install it by running:

install.packages("dplyr")

Then load it with:

library(dplyr)

New features

dplyr 1.0.0 is chock-a-block with new features; so many, in fact, that we can’t fit them all into one post. So if you want to learn more about what’s new, we recommend reading our existing series of posts:

  • Major lifecycle changes. This post focusses on the idea of the “function lifecycle” which helps you understand where functions in dplyr are going. Particularly important is the idea of a “superseded” function. A superseded function is not going away, but we no longer recommend using it in new code.

  • New summarise() features. In summarise(), a single summary expression can now create both multiple rows and multiple columns. This significantly increases its power and flexibility.

  • select(), rename(), and (new) relocate(). select() and rename() can now select by position, name, function of name, type, and any combination thereof. A new relocate() function makes it easy to change the position of columns.

  • Working across() columns. A new across() function makes it much easier to apply the same operation to multiple columns. It supersedes the _if(), _at(), and _all() function variants.

  • Working within rows. rowwise() has been renewed and revamped to make it easier to perform operations row-by-row. This makes it much easier to solve problems that previously required base::lapply(), purrr::map(), or friends.

  • The role of the vctrs package. dplyr now makes heavy use of vctrs behind the scenes. This brings with it greater consistency and (hopefully!) more useful error messages.

  • Last minute additions. summarise() now allows you to control how its results are grouped, and there’s a new family of functions designed for modifying rows.
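To give a flavour of a few of the features listed above, here is a minimal sketch using the built-in iris data. The object names (means, moved, ranges) are made up for illustration, and the sketch only touches across(), relocate(), multi-column summaries, and the .groups argument; the linked posts remain the authoritative descriptions.

```r
library(dplyr, warn.conflicts = FALSE)

# across() applies one function to many columns; here, every numeric column.
# `.groups = "drop"` returns an ungrouped result.
means <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

# relocate() changes column positions; here Species moves to the end.
moved <- means %>%
  relocate(Species, .after = last_col())

# A single unnamed summary expression that returns a data frame
# creates several columns at once (min and max).
ranges <- iris %>%
  group_by(Species) %>%
  summarise(
    data.frame(min = min(Sepal.Length), max = max(Sepal.Length)),
    .groups = "drop"
  )
```

Each result has one row per species; ranges gains two columns (min and max) from one expression.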

You can see the full list of changes in the release notes.

dplyr has a new logo thanks to the talented Allison Horst!

New dplyr logo 

(Stay tuned for details about how to get this sticker on to your laptop. We have some exciting news coming up!)

A small teaser

The best way to find out about all the cool new features dplyr has to offer is to read through the blog posts linked above. But, thanks to inspiration from Daniel Anderson, here’s one example of fitting two different models by subgroup that shows off a bunch of cool features:

library(dplyr, warn.conflicts = FALSE)

models <- tibble::tribble(
  ~model_name,    ~formula,
  "length-width", Sepal.Length ~ Petal.Width + Petal.Length,
  "interaction",  Sepal.Length ~ Petal.Width * Petal.Length
)

iris %>% 
  nest_by(Species) %>% 
  left_join(models, by = character()) %>% 
  rowwise(Species, model_name) %>% 
  mutate(model = list(lm(formula, data = data))) %>% 
  summarise(broom::glance(model))
#> `summarise()` regrouping output by 'Species', 'model_name' (override with `.groups` argument)
#> # A tibble: 6 x 13
#> # Groups:   Species, model_name [6]
#>   Species model_name r.squared adj.r.squared sigma statistic  p.value    df
#>   <fct>   <chr>          <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>
#> 1 setosa  length-wi…     0.112        0.0739 0.339      2.96 6.18e- 2     3
#> 2 setosa  interacti…     0.133        0.0760 0.339      2.34 8.54e- 2     4
#> 3 versic… length-wi…     0.574        0.556  0.344     31.7  1.92e- 9     3
#> 4 versic… interacti…     0.577        0.549  0.347     20.9  1.11e- 8     4
#> 5 virgin… length-wi…     0.747        0.736  0.327     69.3  9.50e-15     3
#> 6 virgin… interacti…     0.757        0.741  0.323     47.8  3.54e-14     4
#> # … with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
#> #   df.residual <int>

Note the use of:

  • The new nest_by(), which generates a nested data frame where each row represents one subgroup.

  • by = character() in left_join(), which now performs a Cartesian product, generating every combination of subgroup and model.

  • rowwise() and mutate(), which fit a model to each row.

  • The newly powerful summarise(), which summarises each model with the model fit statistics computed by broom::glance().
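To make the individual steps easier to see, the following sketch runs a cut-down version of the pipeline with a single model and without broom. The formula Sepal.Length ~ Petal.Length and the names nested, fits, and slopes are chosen purely for illustration and are not part of the example above.

```r
library(dplyr, warn.conflicts = FALSE)

# nest_by() returns a row-wise data frame: one row per species, with the
# remaining columns packed into a list-column called `data`.
nested <- iris %>%
  nest_by(Species)

# Because `nested` is row-wise, mutate() evaluates lm() once per row.
# The model is wrapped in list() so it can be stored in a list-column.
fits <- nested %>%
  mutate(model = list(lm(Sepal.Length ~ Petal.Length, data = data)))

# summarise() also works row by row, so `model` is a single lm object here.
slopes <- fits %>%
  summarise(slope = coef(model)[["Petal.Length"]], .groups = "drop")
```

The result, slopes, has one row per species with the fitted slope for that subgroup.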

Acknowledgements

dplyr 1.0.0 has been one of the biggest projects that we, as a team, have ever tackled. Almost everyone in the tidyverse team has been involved in some capacity. Special thanks go to Romain François, who in his role as primary developer has been working on this release for over six months, and to Lionel Henry and Davis Vaughan for all their work on the vctrs package. Jim Hester’s work on running revdep checks in the cloud also made a big impact on our ability to understand failure modes.

A big thanks to all 137 members of the dplyr community who helped make this release possible by finding bugs, discussing issues, and writing code: @AdaemmerP, @adelarue, @ahernnelson, @alaataleb111, @antoine-sachet, @atusy, @Auld-Greg, @b-rodrigues, @batpigandme, @bedantaguru, @benjaminschlegel, @benjbuch, @bergsmat, @billdenney, @brianmsm, @bwiernik, @caldwellst, @cat-zeppelin, @chillywings, @clauswilke, @colearendt, @DanChaltiel, @danoreper, @danzafar, @davidbaniadam, @DavisVaughan, @dblodgett-usgs, @ddsjoberg, @deschen1, @dfrankow, @DiegoKoz, @dkahle, @DzimitryM, @earowang, @echasnovski, @edwindj, @elbersb, @elcega, @ericemc3, @espinielli, @FedericoConcas, @FlukeAndFeather, @GegznaV, @gergness, @ggrothendieck, @glennmschultz, @gowerc, @greg-minshall, @gregorp, @ha0ye, @hadley, @Harrison4192, @henry090, @hughjonesd, @ianmcook, @ismailmuller, @isteves, @its-gazza, @j450h1, @Jagadeeshkb, @jarauh, @jason-liu-cs, @jayqi, @JBGruber, @jemus42, @jennybc, @jflournoy, @jhuntergit, @JohannesNE, @jzadra, @karldw, @kassambara, @klin333, @knausb, @kriemo, @krispiepage, @krlmlr, @kvasilopoulos, @larry77, @leonawicz, @lionel-, @lorenzwalthert, @LudvigOlsen, @madlogos, @markdly, @markfairbanks, @meghapsimatrix, @meixiaba, @melissagwolf, @mgirlich, @Michael-Sheppard, @mikmart, @mine-cetinkaya-rundel, @mir-cat, @mjsmith037, @mlane3, @msberends, @msgoussi, @nefissakhd, @nick-youngblut, @nzbart, @pavel-shliaha, @pdbailey0, @pnacht, @ponnet, @r2evans, @ramnathv, @randy3k, @richardjtelford, @romainfrancois, @rorynolan, @ryanvoyack, @selesnow, @selin1st, @sewouter, @sfirke, @SimonDedman, @sjmgarnier, @smingerson, @stefanocoretta, @strengejacke, @tfkillian, @tilltnet, @tonyvibe, @topepo, @torockel, @trinker, @tungmilan, @tzakharko, @uasolo, @werkstattcodes, @wlandau, @xiaoa6435, @yiluheihei, @yutannihilation, @zenggyu, and @zkamvar.