We’re thrilled to announce that dtplyr 1.2.0 is now on CRAN. dtplyr gives you the speed of data.table with the syntax of dplyr; you write dplyr (and tidyr) code and dtplyr translates it to the data.table equivalent.
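Here is a minimal sketch of that workflow. It uses the built-in `mtcars` dataset, which this post does not otherwise use. You wrap a data frame in `lazy_dt()` and write ordinary dplyr code. Then you either inspect the generated data.table code with `show_query()` or compute the result with `as_tibble()`. Nothing is evaluated until you ask for the result.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap an ordinary data frame; subsequent verbs are recorded, not run
mtcars_dt <- lazy_dt(mtcars)

# Build a lazy dplyr pipeline
query <- mtcars_dt %>%
  filter(wt < 5) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Print the data.table code dtplyr generated for the pipeline
query %>% show_query()

# Run the translated data.table code and return a tibble
result <- query %>% as_tibble()
result
```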
You can install dtplyr from CRAN with `install.packages("dtplyr")`.
I’ll discuss three major changes in this blog post:
- New authors
- New tidyr translations
- Improvements to join translations
There are also over 20 minor improvements to the quality of translations; you can see a full list in the release notes.
The biggest news in this release is the addition of three new authors: Mark Fairbanks, Maximilian Girlich, and Ryan Dickerson are now dtplyr authors in recognition of their significant and sustained contributions. In fact, they implemented the bulk of the improvements in this release!
This release adds translations for several tidyr verbs, including `separate()`, `fill()`, `replace_na()`, and `pivot_longer()`. A few examples:

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

dt <- lazy_dt(data.frame(x = c(NA, "x.y", "x.z", "y.z")))
dt %>%
  separate(x, c("A", "B"), sep = "\\.", remove = FALSE) %>%
  show_query()
#> copy(`_DT1`)[, `:=`(c("A", "B"), tstrsplit(x, split = "\\."))]

dt <- lazy_dt(data.frame(x = c(1, NA, NA, 2, NA)))
dt %>% fill(x) %>% show_query()
#> copy(`_DT2`)[, `:=`(x = nafill(x, "locf"))]
dt %>% replace_na(list(x = 99)) %>% show_query()
#> copy(`_DT2`)[, `:=`(x = fcoalesce(x, 99))]

dt <- lazy_dt(relig_income)
dt %>%
  pivot_longer(!religion, names_to = "income", values_to = "count") %>%
  show_query()
#> melt(`_DT3`, measure.vars = c("<$10k", "$10-20k", "$20-30k", 
#> "$30-40k", "$40-50k", "$50-75k", "$75-100k", "$100-150k", ">150k", 
#> "Don't know/refused"), variable.name = "income", value.name = "count", 
#> variable.factor = FALSE)
```
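`show_query()` only prints the translation. To see what the translated code actually computes, you can collect the result with `as_tibble()`; `collect()` and `as.data.table()` also force evaluation. The sketch below reuses the inputs from the tidyr examples above.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Same input as the fill() / replace_na() examples
dt <- lazy_dt(data.frame(x = c(1, NA, NA, 2, NA)))

# fill() carries the last non-missing value forward (nafill(x, "locf"))
filled <- dt %>% fill(x) %>% as_tibble()
filled$x
#> [1] 1 1 1 2 2

# replace_na() swaps every NA for 99 (fcoalesce(x, 99))
replaced <- dt %>% replace_na(list(x = 99)) %>% as_tibble()
replaced$x
#> [1]  1 99 99  2 99

# relig_income ships with tidyr: one row per religion, one column per income band;
# pivot_longer() (via melt()) turns each religion/income pair into its own row
long <- lazy_dt(relig_income) %>%
  pivot_longer(!religion, names_to = "income", values_to = "count") %>%
  as_tibble()
dim(long)
```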
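Joins have also been reworked. `inner_join()`, `left_join()`, and `right_join()` now all translate to data.table's `[` with an `on` argument. Because `y[x, on = ...]` keeps every row of `x`, `left_join(dt1, dt2)` and `right_join(dt2, dt1)` generate the same query:

```r
dt1 <- lazy_dt(data.frame(x = 1:3))
dt2 <- lazy_dt(data.frame(x = 2:3, y = c("a", "b")))

dt1 %>% inner_join(dt2, by = "x") %>% show_query()
#> `_DT4`[`_DT5`, on = .(x), nomatch = NULL, allow.cartesian = TRUE]
dt1 %>% left_join(dt2, by = "x") %>% show_query()
#> `_DT5`[`_DT4`, on = .(x), allow.cartesian = TRUE]
dt2 %>% right_join(dt1, by = "x") %>% show_query()
#> `_DT5`[`_DT4`, on = .(x), allow.cartesian = TRUE]
```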
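Collecting the same joins shows the effect of `nomatch = NULL`. The sketch below reuses the `dt1`/`dt2` inputs from the join examples. Without `nomatch = NULL`, unmatched rows of the `i` table are kept and padded with `NA`, as in a left join. With it, they are dropped, as in an inner join.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt1 <- lazy_dt(data.frame(x = 1:3))
dt2 <- lazy_dt(data.frame(x = 2:3, y = c("a", "b")))

# nomatch = NULL: only keys present in both tables survive
inner <- dt1 %>% inner_join(dt2, by = "x") %>% as_tibble()
inner

# Default nomatch: every row of dt1 is kept; y is NA where dt2 has no match
left <- dt1 %>% left_join(dt2, by = "x") %>% as_tibble()
left
```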
This can make the translation a little longer for simple joins, but it greatly simplifies dtplyr's own join code. That simplification has made it easier to match dplyr's behaviour more closely for column order, named `by` specifications, Cartesian joins with `by = character()`, and duplicated variable names.
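The next sketch combines a named `by` specification with a non-key column that appears in both tables. The tables `lhs` and `rhs` and their columns `id` and `v` are hypothetical and exist only for illustration. If the translation follows dplyr here, the clashing column should come back as `v.x` and `v.y` (dplyr's default `suffix`), and every row of `lhs` should be kept.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical tables: the key is `x` on the left and `id` on the right,
# and both carry a non-key column named `v`
lhs <- lazy_dt(data.frame(x = 1:3, v = c("p", "q", "r")))
rhs <- lazy_dt(data.frame(id = 2:3, v = c("a", "b")))

# Named `by`: match lhs$x against rhs$id
res <- lhs %>%
  left_join(rhs, by = c("x" = "id")) %>%
  as_tibble()

# Inspect how the key and the duplicated `v` columns were named
names(res)
res
```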
As always, tidyverse packages wouldn’t be possible without the community, so a big thanks goes out to all 35 folks who helped to make this release a reality: @akr-source, @batpigandme, @bguillod, @cgoo4, @chenx2018, @D-Se, @eutwt, @hadley, @jatherrien, @jdmoralva, @jennybc, @jtlandis, @kmishra9, @lutzgruber, @lutzgruber-quantco, @markfairbanks, @mgirlich, @mrcaseb, @nassuphis, @nigeljmckernan, @NZambranoc, @PMassicotte, @psads-git, @quid-agis, @romainfrancois, @roni-fultheim, @samlipworth, @sanjmeh, @sbashevkin, @StatsGary, @torema-ed, @verajosemanuel, @Waldi73, @wurli, and @yiugn.