I am delighted to announce that broom 0.5.0 is now available on CRAN. broom 0.5.0 is a major new release featuring changes that affect both users and developers. See the News for a detailed list of changes.
This release was made possible by RStudio’s internship program, which has enabled me (Alex Hayes) to act as broom’s maintainer over the course of the summer. David Robinson continues to steer design decisions. Many thanks to both RStudio and Dave for this opportunity.
All tidiers should now return tibbles rather than data.frames. This allows broom to take advantage of the nice tibble print method and the more consistent behavior of tibbles:
library(broom)
fit <- lm(mpg ~ ., mtcars)
tidy(fit)
## # A tibble: 11 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  12.3      18.7        0.657   0.518
## 2 cyl          -0.111     1.05      -0.107   0.916
## 3 disp          0.0133    0.0179     0.747   0.463
## 4 hp           -0.0215    0.0218    -0.987   0.335
## 5 drat          0.787     1.64       0.481   0.635
## # ... with 6 more rows
These changes will most likely affect you when you:
- subset with [, which always returns a tibble.
- set rownames on a tibble, which is deprecated.
- use augment methods on models with matrix covariates specified in a formula, which will error.
augment() will error with matrix covariates because tibbles are more strict about their contents than data frames. More details are available below.
Deprecated tidiers still return data frames. Tidiers for mixed models also return data frames.
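As a rough illustration of the first two points, here is a minimal sketch using a toy data frame (the objects df, tb and cars are made up for this example and are not part of broom). It shows how [ behaves differently on data frames and tibbles, and one way to keep row names as a regular column instead of setting them on a tibble:

```r
library(tibble)

# Toy data, invented for this example
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
tb <- as_tibble(df)

df_col <- df[, "a"]   # a data frame drops a single column to a bare vector
tb_col <- tb[, "a"]   # a tibble always returns a tibble from `[`
tb_vec <- tb[["a"]]   # use `[[` (or `$`) when you want the vector itself

# Rather than setting rownames on a tibble, store them in a column
cars <- rownames_to_column(mtcars, var = "model")
```

If existing code relies on single-column subsetting returning a vector, switching to [[ or $ is probably the smallest change.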
broom 0.5.0 introduces tidiers for:
- lavaan objects from the
- ivreg objects from the
- Kendall objects from the
- garch objects from the
- irlba lists from the
- durbinWatsonTest objects from the
- confusionMatrix objects from the
- cv.glmnet objects from the
- clmm objects from the
- svyolr objects from the
- polr objects from the
In addition to these new tidiers, this release includes fixes for a large number of bugs in existing tidiers.
New test suite
We are heavily invested in making it easier to contribute to broom, and also in making broom behavior more standardized and consistent. To this end, we’ve written new testing infrastructure. At the moment, the new tests mostly ensure tibble output. For example, tidy() output should now pass the following test:
td <- tidy(model)
check_tidy_output(td)
Similar tests exist for augment(). Stricter versions of these tests are under development for future releases.
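To give a feel for the kind of property these tests enforce, here is a minimal sketch of a tibble-output check. The helper expect_tibble_output() is hypothetical and written only for this post; it is not broom’s check_tidy_output(), whose actual checks may differ:

```r
library(broom)
library(tibble)

# Hypothetical helper (NOT broom's check_tidy_output()): errors unless
# `x` is a tibble without row names, otherwise returns `x` invisibly.
expect_tibble_output <- function(x) {
  stopifnot(is_tibble(x))
  stopifnot(!has_rownames(x))
  invisible(x)
}

td <- tidy(lm(mpg ~ wt, mtcars))
expect_tibble_output(td)
```

Calling the same helper on a plain data frame such as mtcars raises an error, which is the behavior you would want from a test that guards tibble output.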
Mixed models are moving to
As broom’s popularity has grown, it has come to encompass a fairly broad range of models. Dave and I have little to no experience with many of these models, and while we can fix bugs in the tidying code, we are no longer able to determine what constitutes a reasonable summary for many of these models.
Our intended solution is to split broom into several packages for tidying model objects. broom will provide tidiers for popular models (and those in stats), and then domain experts will manage domain-specific tidying packages. Currently we’re working on a spec for all of these sub-packages to implement. With any luck, we’ll have a well-written spec to accompany the next release. We’d like all of the domain-specific tidying packages to eventually live in tidymodels, so that users can load a bunch of tools all at once with library(tidymodels). tidymodels will act as a meta-package that gathers together tidyverse-compatible tools for modelling. Max Kuhn has migrated a number of his packages to the tidymodels organization, and we plan to move broom in the near future.
Mixed-model tidiers have long been a bit of a mess in broom. A while back broom.mixed forked off to clean them up. broom.mixed is now a pilot for the larger project of splitting broom into domain-specific tidying packages. We anticipate that broom.mixed will make its way onto CRAN in the next several weeks, which will allow us to deprecate mixed model tidiers in broom 0.7.0. Although these models are not yet deprecated, there is currently no ongoing development work for them. In particular, the tidiers for:
- lme, lme4 and nlme models,
- brms models,
- rstanarm models, and
- mcmc objects
are one release away from deprecation, and effectively frozen.
New suggested workflow
When working with many models at the same time, we now recommend using list-columns and a nest()-map()-unnest() workflow. This mirrors similar moves across the rest of the tidyverse. We have updated the kmeans, broom and dplyr, and bootstrapping vignettes to reflect the new workflow. Additionally, we’ve updated the bootstrapping vignette to use rsample rather than the now-deprecated bootstrap() function. We no longer recommend the older
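As a minimal sketch of what a nest()-map()-unnest() workflow can look like, the following fits one linear model per cylinder group in mtcars and collects the tidied coefficients. The choice of model (mpg ~ wt) and the column names fit and tidied are mine, chosen purely for illustration; the updated vignettes remain the reference for the recommended patterns:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

per_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                        # one row per cyl, data in a list-column
  ungroup() %>%
  mutate(
    fit    = map(data, ~ lm(mpg ~ wt, data = .x)),  # one model per group
    tidied = map(fit, tidy)                         # one tibble per model
  ) %>%
  select(cyl, tidied) %>%
  unnest(tidied)                    # back to a flat tibble of coefficients
```

The result has one row per coefficient per group, so it can be filtered or plotted like any other tibble. Swapping tidy for glance or augment in the map() call follows the same pattern.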
New vignettes and documentation
The list of available tidiers has been moved out of the README and into the Available Methods vignette.
We also have two new vignettes that are strictly works in progress at the moment. The first covers Adding New Tidiers and seeks to make the barrier to entry for broom contributions as low as possible. The second contains a Glossary of terms we are developing for use in an upcoming release of broom. This glossary will standardize argument names across tidiers, and column names across tidy output.
We have also migrated to a new template-based documentation strategy. Repeated documentation material now lives in roxygen2 templates and can easily be added to a new tidy method. For an example of how this works, see
- inflate() has been removed from
- Matrix and vector tidiers have been deprecated in favor of
- Dataframe tidiers and rowwise dataframe tidiers have been deprecated
- bootstrap() has been deprecated in favor of the
The following functions will be deprecated in the next release of broom:
- sp tidying methods (in favor of
- tidy.summaryDefault() (in favor of
- tidy.table() (in favor of
- Mixed model and Bayesian tidiers
Max Kuhn provided advice on dealing with model objects. Mara Averick provided feedback on drafts of this post.
An additional 38 fantastic contributors offered thoughtful comments on design, wrote bug reports and created PRs. The broom community has been kind, supportive and insightful, and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, and @wilsonfreitas
Additional details on tibbles and
Data frames allow users to specify columns in a matrix, like so:
y <- rnorm(5)
x <- matrix(rnorm(10), nrow = 5)
df <- data.frame(x, y)
Tibbles do not:
library(tibble)
tibble(x, y)
## Error: Column `x` must be a 1d atomic vector or a list
Modelling functions will occasionally create a data frame like this, but since the model frame can’t be coerced to a tibble, augment() will fail:
fit <- lm(y ~ x, df)
augment(fit)
## Error: Column `x` must be a 1d atomic vector or a list
In some cases, explicitly passing the original dataset via the data argument can resolve this:
augment(fit, data = df)
## # A tibble: 5 x 10
##       X1     X2      y .fitted .se.fit   .resid  .hat .sigma   .cooksd
## *  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>    <dbl> <dbl>  <dbl>     <dbl>
## 1  0.617 -0.720 -0.167   0.173   0.341 -0.340   0.661  0.108 1.26
## 2 -0.164  0.943 -0.389  -0.158   0.291 -0.232   0.480  0.499 0.181
## 3 -0.434  0.424 -0.339  -0.529   0.390  0.190   0.863  0.300 3.12
## 4  0.231  0.696  0.148   0.150   0.310 -0.00177 0.545  0.594 0.0000156
## 5  0.663 -0.274  0.704   0.320   0.282  0.384   0.451  0.290 0.418
## # ... with 1 more variable: .std.resid <dbl>
Support for matrix-columns is on the way in dplyr, and in a release cycle or two this won’t be an issue.
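In the meantime, one way to sidestep the problem is to avoid matrix terms in the formula altogether and refer to the individual columns instead. This is a sketch of that idea under my own assumptions (the seed and the column names X1 and X2, which data.frame() generates from the matrix, are specific to this example) rather than an official recommendation:

```r
library(broom)

set.seed(27)  # arbitrary seed, only so the example is reproducible
y <- rnorm(5)
x <- matrix(rnorm(10), nrow = 5)
df <- data.frame(x, y)  # the matrix is split into columns X1 and X2

# Refer to the columns by name so the model frame has no matrix column
fit <- lm(y ~ X1 + X2, data = df)
aug <- augment(fit)
```

Because every column of the model frame is now an ordinary vector, augment() can build its tibble without needing the data argument.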