broom 0.7.0

  tidymodels

  Simon Couch and Alex Hayes

We’re excited to announce the release of broom 0.7.0 on CRAN!

broom is a package for summarizing statistical model objects in tidy tibbles. While several compatibility updates have been released in recent months, this is the first major update to broom in almost two years. This update includes many new tidier methods, bug fixes, improvements to existing tidier methods and their documentation, and improvements to maintainability and internal consistency. The full list of changes is available in the package release notes.

This release was made possible in part by the RStudio internship program, which has allowed one of us (Simon Couch) to work on broom full-time for the last month.

You can install the most recent broom update with the following code:

install.packages("broom")

Then attach it for use with:

library(broom)

We’ll outline some of the more notable changes below!

New Tidier Methods

First, this release adds support for several new model objects. Many of these additions came from first-time contributors to broom!

  • anova objects from the car package
  • pam objects from the cluster package
  • drm objects from the drc package
  • summary_emm objects from the emmeans package
  • epi.2by2 objects from the epiR package
  • fixest objects from the fixest package
  • regsubsets objects from the leaps package
  • lm.beta objects from the lm.beta package
  • rma objects from the metafor package
  • mfx, logitmfx, negbinmfx, poissonmfx, probitmfx, and betamfx objects from the mfx package
  • lmrob and glmrob objects from the robustbase package
  • sarlm objects from the spatialreg package
  • speedglm objects from the speedglm package
  • svyglm objects from the survey package

We have also restored a simplified version of glance.aov().

Improvements and Bug Fixes for Existing Tidiers

This update also features many bug fixes and improvements to existing tidiers. Some of the more notable ones:

  • Many improvements to the consistency of augment.*() methods:
    • If you pass a dataset to augment() via the data or newdata arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values. Previously augment() would drop rows containing NA. This should no longer be the case. As a result, augment.*() methods no longer accept an na.action argument.
    • In previous versions, several augment.*() methods inherited the augment.lm() method, but required additions to the augment.lm() method itself. We have shifted away from this approach in favor of re-implementing many augment.*() methods as standalone methods making use of internal helper functions. As a result, some previously unused arguments of augment.lm() and related methods are now deprecated.
    • The .resid column in the output of augment.*() methods is now consistently defined as y - y_hat.
    • augment() tries to give an informative error when data isn’t the original training data.
  • Several glance.*() methods have been refactored in order to return a one-row tibble even when the model matrix is rank-deficient.
  • Many glance() methods now return a nobs column, which contains the number of data points used to fit the model!
  • Various warnings resulting from changes to the tidyr API in v1.0.0 have been fixed.
  • Added options to provide additional columns in the outputs of glance.biglm(), tidy.felm(), tidy.lmsobj(), tidy.lmodel2(), tidy.polr(), tidy.prcomp(), tidy.zoo(), and tidy_optim().
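
As a rough illustration of the augment() and glance() changes listed above, the sketch below fits a linear model to a copy of the built-in mtcars data with one missing value introduced on purpose. The dataset, formula, and object names are chosen here for illustration only and assume broom 0.7.0 or later is installed; the exact set of output columns may differ between versions.

```r
library(broom)

# Copy of a built-in dataset with one missing predictor value (illustrative).
d <- mtcars
d$wt[3] <- NA

# lm() drops the incomplete row when fitting (default na.action).
fit <- lm(mpg ~ wt, data = d)

# Passing the full dataset via newdata: the augmented result should keep
# every row of d, with NA predictions where the predictor is missing.
aug_new <- augment(fit, newdata = d)
rows_match <- nrow(aug_new) == nrow(d)

# Without data/newdata, augment() works from the model frame. The .resid
# column is the observed response minus the fitted value (y - y_hat).
aug_fit <- augment(fit)
resid_ok <- isTRUE(all.equal(aug_fit$.resid, aug_fit$mpg - aug_fit$.fitted))

# glance() reports the number of observations actually used in the fit.
n_used <- glance(fit)$nobs
```

Here rows_match and resid_ok are plain logical flags that can be inspected interactively, and n_used can be compared with stats::nobs(fit).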

Breaking Changes and Deprecations

This release also contains a number of breaking changes and deprecations meant to improve maintainability and internal consistency.

  • We have changed how we report degrees of freedom for lm objects. This is especially important for instructors in statistics courses. Previously the df column in glance.lm() reported the rank of the design matrix. Now it reports degrees of freedom of the numerator for the overall F-statistic. This is equal to the rank of the model matrix minus one (unless you omit an intercept column), so the new df should be the old df minus one.
  • We are moving away from supporting summary.*() objects. In particular, we have removed tidy.summary.lm() as part of a major overhaul of internals. Instead of calling tidy() on summary-like objects, please call tidy() directly on model objects moving forward.
  • We have removed all support for the quick argument in tidy() methods. This simplifies internals and improves maintainability. We anticipate this will not affect many users, as few people seemed to use it. If this majorly cramps your style, let us know, as we are considering a new verb to return only model parameters. In the meantime, stats::coef() together with tibble::enframe() provides most of the functionality of tidy(..., quick = TRUE).
  • All conf.int arguments now default to FALSE, and all conf.level arguments now default to 0.95. This should primarily affect tidy.survreg(), which previously always returned confidence intervals, although there are some others.
  • Tidiers for emmeans-objects use the arguments conf.int and conf.level instead of relying on the argument names native to the emmeans::summary()-methods (i.e., infer and level). Similarly, multcomp-tidiers now include a call to summary() as previous behavior was akin to setting the now removed argument quick = TRUE. Both families of tidiers now use the adj.p.value column name when appropriate. Finally, emmeans-, multcomp-, and TukeyHSD-tidiers now consistently use the column names contrast and null.value instead of comparison, level1 and level2, or lhs and rhs.
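
The breaking changes above can be sketched on a small example. This is a minimal illustration, assuming broom 0.7.0 or later; the model, dataset, and object names are chosen here and do not come from the release notes. It shows the new meaning of df in glance.lm(), the opt-in confidence intervals in tidy(), and the stats::coef() plus tibble::enframe() stand-in for the removed quick argument.

```r
library(broom)

# Illustrative model: an intercept plus two predictors.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# glance.lm(): df is the numerator degrees of freedom of the overall
# F-statistic, which for a model with an intercept is the rank minus one.
gl <- glance(fit)
df_new <- gl$df
df_from_rank <- fit$rank - 1
df_from_ftest <- summary(fit)$fstatistic[["numdf"]]

# tidy(): confidence intervals are only returned when requested.
td_default <- tidy(fit)
td_ci <- tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Stand-in for tidy(fit, quick = TRUE): term names and estimates only.
quick_coefs <- tibble::enframe(
  stats::coef(fit),
  name = "term",
  value = "estimate"
)
```

Code that relied on the old df column can recover the previous value as the rank of the fitted model (fit$rank in this sketch).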

This release of broom also deprecates several helper functions as well as tidier methods for a number of non-model objects, each in favor of more principled approaches from other packages (outlined in the NEWS file). Notably, though, tidiers have been deprecated for data frames, rowwise data frames, vectors, and matrices. Further, we have moved forward with the planned transfer of tidiers for mixed models to broom.mixed.
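
For vectors and matrices, a minimal sketch of what a replacement might look like is shown below. It assumes that tibble::enframe() and tibble::as_tibble() are suitable substitutes; the NEWS file remains the authoritative list of recommended alternatives, and the example objects are made up for illustration.

```r
# A named numeric vector and a small matrix (illustrative data).
x <- c(a = 1.5, b = 2.5)
m <- matrix(
  1:4,
  nrow = 2,
  dimnames = list(c("r1", "r2"), c("c1", "c2"))
)

# Instead of tidy(x): one row per element, with name and value columns.
x_tbl <- tibble::enframe(x)

# Instead of tidy(m): one row per matrix row, keeping row names in a
# column called "row" (the column name is an arbitrary choice).
m_tbl <- tibble::as_tibble(m, rownames = "row")
```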

Other Changes

Almost all unit testing for the package is now supported by the modeltests package!

Also, we have revised several vignettes and moved them to the tidymodels website. For backward compatibility, the existing vignettes will now simply link to the revised versions.

Finally, the package’s website has moved from its previous tidyverse domain to broom.tidymodels.org.

Looking Forward

Most notably, the broom dev team is changing the process for adding new tidying methods to the package. We now ask that issues/PRs requesting support for new model objects be directed to the model-owning package (i.e., the package that the model is exported from) rather than to broom. If the maintainers of those packages are unable or unwilling to provide tidying methods in the model-owning package, it might be possible to add the new tidier to broom. broom is near its limit of tidiers; adding more may make the package unsustainable.

For developers exporting tidying methods directly from model-owning packages, we are actively working to provide resources to both ease the process of writing new tidier methods and reduce the dependency burden of taking on broom generics and helpers. As for the first point, we recently posted an article on the tidymodels website providing notes on best practices for writing tidiers. This article will be kept up to date as we develop new resources for easing the process of writing new tidier methods. As for the second, the r-lib/generics package provides lightweight dependencies for the main broom generics. We hope to soon provide a coherent suite of helper functions for use in external broom methods.
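
To make the idea concrete, here is a minimal sketch of a tidy() method written against the generic from the generics package rather than broom itself. The my_model class, its constructor, and the choice of output columns are hypothetical and stand in for whatever a model-owning package actually exports; the article on the tidymodels website is the better guide to conventions.

```r
# Hypothetical constructor standing in for a model-owning package's
# fitting function; it simply wraps an lm fit in a custom class.
my_model <- function(formula, data) {
  fit <- stats::lm(formula, data = data)
  structure(list(fit = fit), class = "my_model")
}

# tidy() method for the hypothetical class: one row per coefficient,
# using the column names that broom tidiers conventionally return.
tidy.my_model <- function(x, ...) {
  co <- stats::coef(summary(x$fit))
  tibble::tibble(
    term = rownames(co),
    estimate = unname(co[, "Estimate"]),
    std.error = unname(co[, "Std. Error"]),
    statistic = unname(co[, "t value"]),
    p.value = unname(co[, "Pr(>|t|)"])
  )
}

# The generic comes from generics, so broom is not needed at all here.
m <- my_model(mpg ~ wt, data = mtcars)
res <- generics::tidy(m)
```

Inside a package, the method would additionally be registered as an S3 method in the NAMESPACE, with generics listed under Imports.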

We anticipate that the most active development on the broom package, looking forward, will center on improving augment() methods. We are also hoping to change our CRAN release cycle and to provide incremental updates every several months rather than major changes every couple of years.

Contributors

This release features work and input from over 140 contributors (over 50 of them for the first time) since the last major release. See the package release notes for more specific notes on contributions. Thank you all for your thoughtful comments, patience, and hard work!

@abbylsmith, @acoppock, @ajb5d, @aloy, @AndrewKostandy, @angusmoore, @anniew, @aperaltasantos, @asbates, @asondhi, @asreece, @atyre2, @bachmeil, @batpigandme, @bbolker, @benjbuch, @bfgray3, @BibeFiu, @billdenney, @BrianOB, @briatte, @bruc, @brunaw, @brunolucian, @bschneidr, @carlislerainey, @CGMossa, @CharlesNaylor, @ChuliangXiao, @cimentadaj, @crsh, @cwang23, @DavisVaughan, @dchiu911, @ddsjoberg, @dgrtwo, @dmenne, @dylanjm, @ecohen13, @economer, @EDiLD, @ekatko1, @ellessenne, @ethchr, @florencevdubois, @GegznaV, @gershomtripp, @grantmcdermott, @gregmacfarlane, @hadley, @haozhu233, @hasenbratan, @HenrikBengtsson, @hermandr, @hideaki, @hughjonesd, @iago-pssjd, @ifellows, @IndrajeetPatil, @Inferrator, @istvan60, @jamesmartherus, @JanLauGe, @jasonyang5, @jaspercooper, @jcfisher, @jennybc, @jessecambon, @jkylearmstrongibx, @jmuhlenkamp, @JulianMutz, @Jungpin, @jwilber, @jyuu, @karissawhiting, @karldw, @khailper, @krauskae, @kuriwaki, @kyusque, @KZARCA, @Laura-O, @ldlpdx, @ldmahoney, @lilymedina, @llendway, @lrose1, @ltobalina, @LukasWallrich, @lukesonnet, @lwjohnst86, @malcolmbarrett, @margarethannum, @mariusbarth, @MatthieuStigler, @mattle24, @mattpollock, @mattwarkentin, @mine-cetinkaya-rundel, @mkirzon, @mlaviolet, @Move87, @namarkus, @nlubock, @nmjakobsen, @ns-1m, @nt-williams, @oij11, @petrhrobar, @PirateGrunt, @pjpaulpj, @pkq, @poppymiller, @QuLogic, @randomgambit, @riinuots, @RobertoMuriel, @Roisin-White, @romainfrancois, @rsbivand, @serina-robinson, @shabbybanks, @Silver-Fang, @Sim19, @simonpcouch, @sjackson1236, @softloud, @stefvanbuuren, @strengejacke, @sushmitavgopalan16, @tcuongd, @thisisnic, @topepo, @tyluRp, @vincentarelbundock, @vjcitn, @vnijs, @weiyangtham, @william3031, @x249wang, @xieguagua, @yrosseel, and @zoews