dbplyr 2.5.0

  dbplyr

  Hadley Wickham

We’re most pleased to announce the release of dbplyr 2.5.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.

You can install it from CRAN with:

install.packages("dbplyr")

This post focuses on the biggest change in dbplyr 2.5.0: improved syntax for tables nested inside schema and catalogs. As usual, this release also contains a ton of minor improvements to SQL generation, and I’d highly recommend skimming the release notes to learn the details.

Referring to tables in a schema

Historically, dbplyr has provided a bewildering array of options to specify a table inside a schema inside a catalog:

con |> tbl(ident_q("catalog_name.schema_name.table_name"))
con |> tbl(sql("SELECT * FROM catalog_name.schema_name.table_name"))
con |> tbl(in_catalog("catalog_name", "schema_name", "table_name"))
con |> tbl(ident_q("catalog_name.schema_name"), "table_name")
con |> tbl(sql("catalog_name.schema_name"), "table_name")

You can also use DBI::Id(), whose syntax has also evolved over time:

con |> tbl(DBI::Id(database = "catalog_name", schema = "schema_name", table = "table_name"))
con |> tbl(DBI::Id(catalog = "catalog_name", schema = "schema_name", table = "table_name"))
con |> tbl(DBI::Id("catalog_name", "schema_name", "table_name"))

Many of these options were poorly supported (i.e. we would accidentally break them from time-to-time) and suffered from the lack of a holistic vision. This release aims to bring order to the chaos by providing a succinct new syntax for literal table identifiers: I(). This allows you to succinctly identify a table nested inside a schema or catalog:

con |> tbl(I("catalog_name.schema_name.table_name"))
con |> tbl(I("schema_name.table_name"))

I() is a base function, and you may be familiar with it from modelling, e.g. lm(y ~ x + I(y * z)). It performs a similar role for both dbplyr and modelling function: it tells the function to treat the argument as is, rather than quoting it in the case of dbplyr, or interpreting as an interaction in the case of lm().

I() is dbplyr’s preferred way of specifying nested table identifiers and we will eventually formally supersede and then one day deprecate the other options. However, because their usage is widespread, this process will be slow and gradual, and play out over multiple years; there’s no need to make changes now.

(If you’re the author of a dbplyr backend, you’ll can take advantage of this new syntax by using the dbplyr_table_path class. dbplyr now provides a few helper functions to make this easier.)

Acknowledgements

A big thanks to all 46 folks who helped to make this release possible with their thoughtful comments and code contributions! @aarizvi, @abalter, @andreassoteriadesmoj, @andrew-schulman, @apalacio9502, @carlinstarrs, @catalamarti, @chicotobi, @DavisVaughan, @dmenne, @edgararuiz, @edonnachie, @eitsupi, @ejneer, @erydit, @espinielli, @fh-afrachioni, @ghost, @godislobster, @gorcha, @hadley, @hild0146, @JakeHurlbut, @jarodmeng, @Jiefei-Wang, @joshbal, @kelseyroberts, @kmishra9, @krlmlr, @m-muecke, @maciekbanas, @marcusmunch, @mgarbuzov, @mgirlich, @misea, @MKatz-DHSC, @Mkranj, @multimeric, @nathanhaigh, @nilescbn, @talegari, @Tazinho, @thomashulst, @Thranholm, @tomshafer, and @wstvcg.