dbplyr 1.2.0

We are very excited to announce that dbplyr 1.2.0 is now available on CRAN! dbplyr is the database backend for dplyr. It interacts with databases directly by translating dplyr code into SQL queries. dbplyr is part of an expanding ecosystem of packages meant to help analysts access and analyze data in databases. To learn more, visit our website dedicated to all things R and databases: http://db.rstudio.com/

Install the latest version of dbplyr with:

install.packages("dbplyr")

What’s new?

Here are some highlights of the new features and fixes in this release. For full details, please refer to the release notes.

  • New custom translations for Microsoft Access and Teradata.

  • Amazon Redshift connections now point to the PostgreSQL translation.

  • Adds support for two new database R packages. These packages are fully DBI-compliant and tested with DBItest. We recommend using them instead of the older packages:
    1. RMariaDB, in place of RMySQL
    2. RPostgres, in place of RPostgreSQL
  • ROracle connections now point to the Oracle translation.
  • Cache query results using copy_to(). This feature is very useful because, once cached, the results of a long-running query can be analyzed iteratively without having to wait for the same base query to run over and over.

  • stringr functions: str_length(), str_to_upper(), str_to_lower(), str_replace_all(), str_detect(), and str_trim() are now supported.

  • in_schema() should now work in more places, particularly in copy_to().

  • For those who need to extract the resulting SQL query from dplyr code, there is a new function called remote_query(). Unlike show_query(), remote_query() returns a character object, and unlike sql_render(), it automatically detects the type of database attached to the dplyr code and translates accordingly. Four additional functions provide a standard API for getting metadata about a remote tbl: remote_name(), remote_con(), remote_src(), and remote_query_plan().

  • Adds support for temporary tables in Microsoft SQL Server. Additionally, the issue of certain operators working only in mutate() or only in filter() has been resolved.

  • Two new functions for developers: sql_aggregate() and win_aggregate(). They are used to generate SQL for aggregate functions and their windowed variants.
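
To illustrate two of the items above, here is a minimal sketch of the new remote_*() metadata functions and of in_schema(), using an in-memory RSQLite database (the table, column, and filter values are just for illustration; SQLite's default schema is named "main"):

```r
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
db  <- copy_to(con, mtcars, "mtcars", temporary = FALSE)

q <- db %>%
  filter(cyl == 4) %>%
  select(mpg, cyl)

dbplyr::remote_query(q)  # the generated SQL, as a character object
dbplyr::remote_name(db)  # name of the table backing a simple tbl
dbplyr::remote_con(q)    # the underlying DBI connection

# in_schema() lets you refer to a table by a schema-qualified name
tbl(con, dbplyr::in_schema("main", "mtcars")) %>% head()
```

Note that remote_name() returns the table name only for a tbl that directly references a database table; for a tbl that carries additional computations, there is no single backing table.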

Caching results

The copy_to() function can now cache the results of a set of dplyr database transformations. An additional advantage is that copy_to() uses the database itself to run and cache the query, so no data is transmitted back to R. For this feature to work, the user needs sufficient privileges to create temporary tables in the database.

library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, tibble::rownames_to_column(mtcars), "mtcars")

only_auto <- mtcars_db %>%
  filter(am == 1) %>%
  select(mpg, cyl, am)

auto <- copy_to(con, only_auto, "auto") # New in dbplyr 1.2.0
auto
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.19.3 [:memory:]
#>     mpg   cyl    am
#>   <dbl> <dbl> <dbl>
#> 1  21.0  6.00  1.00
#> 2  21.0  6.00  1.00
#> 3  22.8  4.00  1.00
#> 4  32.4  4.00  1.00
#> 5  30.4  4.00  1.00
#> 6  33.9  4.00  1.00

If the results need to be cached to a different database, copy_to() is smart enough to download the data into R and then upload it to the target database.

second_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
second_auto <- copy_to(second_con, only_auto, "auto") # New in dbplyr 1.2.0
second_auto
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.19.3 [:memory:]
#>     mpg   cyl    am
#>   <dbl> <dbl> <dbl>
#> 1  21.0  6.00  1.00
#> 2  21.0  6.00  1.00
#> 3  22.8  4.00  1.00
#> 4  32.4  4.00  1.00
#> 5  30.4  4.00  1.00
#> 6  33.9  4.00  1.00

stringr functions

Six stringr functions are now supported. Regular expression support varies from database to database, but most simple regular expressions should work. The functions are:

  1. str_length()
  2. str_to_upper()
  3. str_to_lower()
  4. str_replace_all()
  5. str_detect()
  6. str_trim()

tbl(con, "mtcars") %>%
  head() %>%
  select(rowname) %>%
  mutate(
    s1 = str_length(rowname),                 # New in dbplyr 1.2.0
    s2 = str_to_upper(rowname),               # New in dbplyr 1.2.0
    s3 = str_to_lower(rowname),               # New in dbplyr 1.2.0
    s4 = str_replace_all(rowname, "M", "X"),  # New in dbplyr 1.2.0
    s5 = str_detect(rowname, "M"),            # New in dbplyr 1.2.0
    s6 = str_trim(rowname)                    # New in dbplyr 1.2.0
  )
#> # Source: lazy query [?? x 7]
#> # Database: sqlite 3.19.3 [:memory:]
#>   rowname              s1 s2                s3       s4         s5 s6
#>   <chr>             <int> <chr>             <chr>    <chr>   <int> <chr>
#> 1 Mazda RX4             9 MAZDA RX4         mazda r~ Xazda ~     0 Mazda ~
#> 2 Mazda RX4 Wag        13 MAZDA RX4 WAG     mazda r~ Xazda ~     0 Mazda ~
#> 3 Datsun 710           10 DATSUN 710        datsun ~ Datsun~     0 Datsun~
#> 4 Hornet 4 Drive       14 HORNET 4 DRIVE    hornet ~ Hornet~     0 Hornet~
#> 5 Hornet Sportabout    17 HORNET SPORTABOUT hornet ~ Hornet~     0 Hornet~
#> 6 Valiant               7 VALIANT           valiant  Valiant     0 Valiant


A big thanks goes to those who made this release possible by contributing code or documentation: @DavisVaughan, @baileych, @Hong-Revo, @cwarden, @zozlak, @denismaciel, @jonassundman, @wibeasley, @mungojam, @hoxo-m, @dpprdan, @jimhester, @krlmlr, and @javierluraschi. Also, thank you to those who helped us improve this package by suggesting features and reporting issues: @drf5n, @larspijnappel, @lbakerIsazi, @mtmorgan, @refik, @sicarul, and @vitallish.

Upcoming events
Atlanta, GA
Oct 14-15
You should take this workshop if you have experience programming in R and want to learn how to tackle larger scale problems. The class is taught by Hadley Wickham, Chief Scientist at RStudio.
Nov 18 - Nov 19
This two-day course provides an overview of using R for supervised learning. The session steps through the process of building, visualizing, testing, and comparing models that are focused on prediction. The goal of the course is to provide a thorough workflow in R that can be used with many different regression or classification techniques, with case studies on real data illustrating several different predictive models. The class is taught by Max Kuhn.