We’re thrilled to announce the release of dtplyr 1.3.0. dtplyr gives you the speed of data.table with the syntax of dplyr; you write dplyr (and tidyr) code and dtplyr translates it to the data.table equivalent.
You can install it from CRAN with install.packages("dtplyr").
This blog post will give you an overview of the changes in this version: dtplyr no longer adds translations directly to data.tables, it includes some dplyr 1.1.0 updates, and we have made some performance improvements. As always, you can see a full list of changes in the release notes.
In previous versions, dtplyr registered translations that kicked in whenever you used a data.table. This caused problems because merely loading dtplyr could make otherwise working code fail: dplyr and tidyr functions would now return lazy_dt objects instead of data.table objects. To avoid this problem, we have removed those S3 methods, so you must now explicitly opt in to dtplyr translations by using lazy_dt().
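As a minimal sketch of what explicit opt-in looks like, assuming data.table, dplyr, and dtplyr are installed: the `sales` table and its columns are made up for illustration and are not part of the release.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data, used only for illustration
sales <- data.table(
  region = c("north", "south", "north", "south"),
  amount = c(10, 20, 30, 40)
)

# Wrap the data.table explicitly; the dplyr verbs that follow are
# recorded lazily and translated to data.table code
totals <- sales |>
  lazy_dt() |>
  group_by(region) |>
  summarise(total = sum(amount))

# Inspect the generated data.table code without running it
show_query(totals)

# Run the translated code and collect the result as a data.table
result <- as.data.table(totals)
```

Nothing is computed until the result is collected with as.data.table() (or as_tibble() / as.data.frame()), so show_query() is a cheap way to check the translation first.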
    dt <- lazy_dt(data.frame(x = 1:10, id = 1:2))
    dt |> summarise(mean = mean(x), .by = id) |> show_query()
    #> `_DT1`[, .(mean = mean(x)), keyby = .(id)]

    dt <- lazy_dt(data.frame(x = 1:10, y = runif(10)))
    dt |> mutate(row_sum = rowSums(pick(x))) |> show_query()
    #> copy(`_DT2`)[, `:=`(row_sum = rowSums(data.table(x = x)))]
Per-operation grouping was one of the dplyr 1.1.0 features inspired by data.table, so it’s neat to see it come full circle in this dtplyr release. Future releases will add support for other dplyr 1.1.0 features, including the new join_by() syntax.
    dt |> add_count() |> show_query()
    #> copy(`_DT2`)[, `:=`(n = .N)]

    dt |> tidyr::unite("z", c(x, y)) |> show_query()
    #> copy(`_DT2`)[, `:=`(z = paste(x, y, sep = "_"))][, `:=`(c("x",
    #> "y"), NULL)]

    dt |> mutate(r = min_rank(x)) |> show_query()
    #> copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "min", na.last = "keep"))]

    dt |> mutate(r = dense_rank(x)) |> show_query()
    #> copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "dense", na.last = "keep"))]
This release also includes three translation improvements that yield better performance. When the data has previously been copied, arrange() will use setorder() instead of order(), and select() will drop unwanted columns by reference (i.e. with var := NULL). And slice() now uses an intermediate variable to reduce the computation time of row selection.
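One way to see these translations for yourself is to print the generated code with show_query(). The sketch below assumes dplyr and dtplyr are installed and uses a small made-up data frame. The mutate() step is there only to produce a copy before arrange() and select() run, and the exact generated code may differ between dtplyr versions, so no output is shown.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data, used only for illustration
dt <- lazy_dt(data.frame(x = c(3, 1, 2), y = c("c", "a", "b")))

# mutate() on a fresh lazy_dt copies the data first, so the steps
# that follow can safely modify that copy by reference
copied <- dt |> mutate(z = x * 2)

# Inspect the translation of each verb discussed above
copied |> arrange(x) |> show_query()
copied |> select(x, z) |> show_query()
dt |> slice(1:2) |> show_query()

# Collecting runs the translated code; the result is sorted by x
sorted <- copied |> arrange(x) |> as_tibble()
```

Comparing the printed queries with and without the preceding mutate() is a quick check of which operations are performed by reference.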
A massive thanks to Mark Fairbanks who did most of the work for this release, ably aided by the other dtplyr maintainers @eutwt and Maximilian Girlich. And thanks to everyone else who helped make this release possible, whether it was with code, documentation, or insightful comments: @abalter, @akaviaLab, @camnesia, @caparks2, @DavisVaughan, @eipi10, @hadley, @jmbarbone, @johnF-moore, @lschneiderbauer, and @NicChr.