dbplyr 2.3.0

  dbplyr, dplyr

  Hadley Wickham

We’re chuffed to announce the release of dbplyr 2.3.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.

You can install it from CRAN with:

install.packages("{package}")

This post will highlight some of the most important new features in 2.3.0: eliminating subqueries for many verb combinations, better errors, and a handful of new translations. As usual, this release comes with a large number of improvements to translations for individual backends and you can see the full list in the release notes

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

SQL optimisation

dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combinations of verbs no longer require subqueries:

Here are a couple of examples of queries that are now much more compact:

lf1 <- lazy_frame(x = 1, a = "a", .name = "lf1")
lf2 <- lazy_frame(x = 1, b = "b", .name = "lf2")
lf3 <- lazy_frame(x = 1, c = "c", .name = "lf3")

lf1 |> 
  left_join(lf2, by = "x") |> 
  left_join(lf3, by = "x") |> 
  select(b, c)
#> <SQL>
#> SELECT `b`, `c`
#> FROM `lf1`
#> LEFT JOIN `lf2`
#>   ON (`lf1`.`x` = `lf2`.`x`)
#> LEFT JOIN `lf3`
#>   ON (`lf1`.`x` = `lf3`.`x`)

lf1 |> 
  group_by(x) |> 
  summarise(a = mean(a, na.rm = TRUE), n = n()) |> 
  filter(n > 5)
#> <SQL>
#> SELECT `x`, AVG(`a`) AS `a`, COUNT(*) AS `n`
#> FROM `lf1`
#> GROUP BY `x`
#> HAVING (COUNT(*) > 5.0)

(As ususal in these blog posts, I’m using lazy_frame() to focus on the SQL generation, without having to set up a dummy database.)

Additionally, where possible, dbplyr now uses SELECT * after a join instead of explicitly selecting every column.

Improved errors

Variables that aren’t found in either the data or in the environment now produce an error:

lf <- lazy_frame(x = 1,y = 2)

lf |> mutate(x = z + 1)
#> Error in `mutate()`:
#> ! Problem while computing `x = z + 1`
#> Caused by error:
#> ! Object `z` not found.

(Previously they were silently translated to SQL variables.)

We’ve also generally reviewed the error messages to ensure they show more clearly where the error happened:

lf |> mutate(x = y %/% 1)
#> Error in `purrr::pmap()` at dbplyr/R/lazy-select-query.R:282:2:
#>  In index: 1.
#>  With name: x.
#> Caused by error in `y %/% 1`:
#> ! %/% is not available in this SQL variant

lf |> mutate(across(x:y, "a"))
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(x:y, "a")`
#> Caused by error in `across()`:
#> ! `.fns` must be a NULL, a function, formula, or list

New translations

stringr::str_like() (new in stringr 1.5.0) is translated to LIKE:

lf1 |> 
  filter(stringr::str_like(a, "abc"))
#> <SQL>
#> SELECT *
#> FROM `lf1`
#> WHERE (`a` LIKE 'abc')

dbplyr 2.3.0 is also supports features coming in dplyr 1.1.0:

  • The .by argument is supported as alternative to group_by().
  • Passing ... to across() is deprecated because the evaluation timing of ... is ambiguous.
  • New pick() and case_match() functions are translated.
  • case_when() now supports the .default argument.

This version does not support the new join_by() syntax, but we’re working on it, and we’ll release an update after dplyr 1.1.0 is out.

Acknowledgements

The vast majority of this release (particularly the SQL optimisations) are from Maximilian Girlich; thanks so much for your continued work on this package.

We’d also like to thank all 74 contributors who help in someway, whether it was filing issues or contributing code and documentation: @a4sberg, @ablack3, @akgold, @aleighbrown, @andreassoteriadesmoj, @apalacio9502, @baileych, @barnesparker, @bhuvanesh1707, @bkraft4257, @bobbymc0, @brian-law-rstudio, @bthe, @But2ene, @capitantyler, @carlganz, @cboettig, @chwpearse, @copernican, @DSLituiev, @ehudtr7, @eitsupi, @ejneer, @eutwt, @ewright-vcan, @fabkury, @fh-afrachioni, @fh-mthomson, @filipemsc, @gadenbuie, @gbouzill, @giocomai, @hadley, @hershelm, @iangow, @iMissile, @IndrajeetPatil, @j-wester, @Janlow, @jasonmhoule, @jensmassberg, @jmbarbone, @joe-rodd, @kongdd, @krlmlr, @lschneiderbauer, @machow, @mgarbuzov, @mgirlich, @MichaelChirico, @moodymudskipper, @multimeric, @namarkus, @noamross, @NZambranoc, @oriolarques, @overmar, @owenjonesuob, @p-schaefer, @rohitg33, @rowrowrowyourboat, @rsund, @samssann, @samterfa, @schradj, @scvail195, @slhck, @splaisan, @stephenashton-dhsc, @ThomasMorland, @thothal, @viswaduttp, @XoliloX, and @yuhenghuang.