November/December in the tidyverse


  Mara Averick

In an effort to keep the community up to date with the evolution of the tidyverse, we’ll be doing regular roundups cataloging the latest developments.

tidyverse package updates


tidyselect provides a common back-end for dplyr::select() and tidyr::gather(), as well as for modelling packages. It is also the source of selection helpers such as everything() and starts_with(). tidyselect allows you to create selecting verbs that are consistent across tidyverse packages.
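As a rough illustration of the selection helpers and of writing your own selecting verb, here is a minimal sketch. It assumes a recent tidyselect (1.0.0 or later, where eval_select() is available) and uses the built-in iris data; keep_cols() is a hypothetical name made up for this example, not a function from any package.

```r
library(dplyr)

# Selection helpers used through dplyr::select()
sepal_cols <- select(iris, starts_with("Sepal"))
names(sepal_cols)

# Move Species to the front and keep every other column after it
reordered <- select(iris, Species, everything())
names(reordered)

# A hypothetical selecting verb built on tidyselect (assumes tidyselect >= 1.0.0)
keep_cols <- function(data, ...) {
  # eval_select() returns a named integer vector of column positions
  pos <- tidyselect::eval_select(rlang::expr(c(...)), data)
  data[pos]
}

petal_cols <- keep_cols(iris, starts_with("Petal"))
names(petal_cols)
```

Because keep_cols() delegates to tidyselect, it accepts the same helpers and bare column names as dplyr::select().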

tidy models

yardstick allows you to easily create tidy performance estimates. Using a syntax similar to dplyr’s, you can compute common performance metrics, such as precision and recall (for classification) or metrics for numeric outcomes (for regression), and have them returned in a tidy data frame.
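A small sketch of what this looks like in practice, assuming a current yardstick release (the interface and return format have changed since early versions). The results data frame and its column names are invented for illustration.

```r
library(yardstick)

# Toy classification results (made-up data); the first factor level, "yes",
# is treated as the event of interest by default
results <- data.frame(
  truth     = factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no")),
  predicted = factor(c("yes", "no", "no", "yes", "yes"), levels = c("yes", "no"))
)

# Each metric function takes the data, the true column, and the predicted column
prec <- precision(results, truth = truth, estimate = predicted)
rec  <- recall(results, truth = truth, estimate = predicted)

# Both results are data frames, so they can be stacked into one summary
rbind(prec, rec)
```

Because each metric comes back as a data frame, the results can be combined or joined like any other tidy data.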

recipes is an extensible framework for feature selection and for creating and preprocessing design matrices, which can then be applied to statistical and machine learning models. The updated version of recipes includes a tidy method for many of the step functions. The tidy method returns relevant information about the step, such as estimated parameters or which variables were affected by the step.
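To make the tidy method concrete, here is a minimal sketch using the built-in mtcars data. It assumes a current recipes release; the choice of centering and scaling steps is only for illustration.

```r
library(recipes)

# Define a recipe: predict mpg from all other columns, then center and scale
rec <- recipe(mpg ~ ., data = mtcars)
rec <- step_center(rec, all_predictors())
rec <- step_scale(rec, all_predictors())

# Estimate the centering and scaling parameters from the training data
prepped <- prep(rec, training = mtcars)

# With no step number, tidy() gives an overview of the steps in the recipe
step_overview <- tidy(prepped)

# With a step number, tidy() returns details for that step;
# here, the estimated means used for centering each predictor
centering <- tidy(prepped, number = 1)
centering
```

Calling tidy() on the unprepped recipe also works, but the estimated values are not available until prep() has been run.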

rsample’s major upgrade from caret is that it allows for nested resampling. The goal is to have a modular, extensible set of methods that can be used across R packages for traditional resampling techniques and for estimating model performance. rsample can be used to create objects containing resamples of the original data, allowing you to create a model and optimization parameters with placeholders for features to be defined later. The package website has examples for resampling time series, survival models, and neural networks using tensorflow.
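Nested resampling can be sketched as follows, assuming a current rsample release and the built-in mtcars data. The fold and bootstrap counts are arbitrary choices for the example.

```r
library(rsample)

set.seed(123)

# Outer loop: 4-fold cross-validation; inner loop: 5 bootstrap resamples
# drawn from the analysis set of each outer fold
folds <- nested_cv(
  mtcars,
  outside = vfold_cv(v = 4),
  inside  = bootstraps(times = 5)
)

# Each outer split separates the data into analysis and assessment sets
first_outer <- folds$splits[[1]]
outer_analysis   <- analysis(first_outer)
outer_assessment <- assessment(first_outer)

# The inner resamples for the first outer fold
first_inner <- folds$inner_resamples[[1]]
first_inner
```

The inner resamples would typically be used for tuning, with the outer assessment sets held back for estimating performance.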

tidyposterior is used to conduct Bayesian post hoc analyses of resampling results generated by models. It can be considered an upgraded version of caret::resamples(). Though it works natively with rsample, it can be used with any data frame of results.


While the tidyverse consists of highly opinionated tools for data science, r-lib contains mostly unopinionated infrastructure tools with fewer dependencies.

package updates