forcats 0.5.0

  tidyverse, forcats

  Mara Averick

We’re exceedingly happy to announce the release of forcats 0.5.0 on CRAN. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values.

This release includes improvements to several existing functions, as well as a division of fct_lump() into four new functions: fct_lump_min(), fct_lump_prop(), fct_lump_n(), and fct_lump_lowfreq(). For a complete inventory of updates in this version, please see the Change log.

You can install forcats with:

install.packages("forcats")

Attach the package by running:

library(forcats)

New features

fct_lump() function family

Lumping seems like a popular activity, and there are many interesting variants. Splitting fct_lump() into pieces makes it much easier for this collection to grow over time.

  • fct_lump_min() lumps levels that appear fewer than min times.
  • fct_lump_prop() lumps levels that appear fewer than (or equal to) prop * n times, where n is the total number of observations.
  • fct_lump_n() lumps all levels except for the n most frequent (or least frequent, if n < 0).
  • fct_lump_lowfreq() lumps together the least frequent levels, ensuring that "Other" is still the smallest level.

x <- factor(rep(LETTERS[1:8], times = c(40, 10, 5, 27, 3, 1, 1, 1)))

x %>% table()
#> .
#>  A  B  C  D  E  F  G  H 
#> 40 10  5 27  3  1  1  1

x %>% fct_lump_min(5) %>% table()
#> .
#>     A     B     C     D Other 
#>    40    10     5    27     6

x %>% fct_lump_prop(0.10) %>% table()
#> .
#>     A     B     D Other 
#>    40    10    27    11

x %>% fct_lump_n(3) %>% table()
#> .
#>     A     B     D Other 
#>    40    10    27    11

x %>% fct_lump_lowfreq() %>% table()
#> .
#>     A     D Other 
#>    40    27    21
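
The examples above all use a positive n and the default "Other" label. As a minimal sketch of two further options, the code below uses a negative n to keep the least frequent levels, and the other_level argument to rename the lumped level. The label "Rare" is an arbitrary choice made for illustration.

```r
library(forcats)

# Same example factor as above.
x <- factor(rep(LETTERS[1:8], times = c(40, 10, 5, 27, 3, 1, 1, 1)))

# A negative n keeps the n least frequent levels and lumps the rest.
rare <- fct_lump_n(x, n = -3)
levels(rare)
#> [1] "F"     "G"     "H"     "Other"

# other_level renames the lumped level ("Rare" is an illustrative label).
renamed <- fct_lump_min(x, min = 5, other_level = "Rare")
levels(renamed)
#> [1] "A"    "B"    "C"    "D"    "Rare"
```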

New arguments and helpers

fct_collapse() now has an other_level argument, which lets you choose the name of the Other level. It also now collapses factors correctly when other_level is not NULL, and makes the Other level the last level.
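
A small sketch of how this might look in practice follows. The data, the group names red and yellow, and the "vegetable" label are all made up for illustration. Every level not named in a group is collected into other_level, which ends up last even though "asparagus" sorts before "banana" in the original levels.

```r
library(forcats)

# Levels sort alphabetically: apple, asparagus, banana, cherry.
produce <- factor(c("apple", "asparagus", "banana", "cherry", "apple"))

# Group the fruit by colour; every level not named goes to other_level.
# The group names and "vegetable" are illustrative labels only.
collapsed <- fct_collapse(
  produce,
  red = c("apple", "cherry"),
  yellow = "banana",
  other_level = "vegetable"
)

levels(collapsed)
#> [1] "red"       "yellow"    "vegetable"
```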

fct_reorder2() now has a helper function, first2(), which sorts .y by the first value of .x.
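
To illustrate the difference from the default helper, last2(), here is a minimal sketch on a made-up data frame (the column names series, time, and value are assumptions for this example). With first2(), each level is ranked by the value at its smallest time. With last2(), it is ranked by the value at its largest time.

```r
library(forcats)

# Three series observed at times 1 to 3 (illustrative data).
df <- data.frame(
  series = factor(rep(c("a", "b", "c"), each = 3)),
  time   = rep(1:3, times = 3),
  value  = c(2, 5, 9,   3, 1, 1,   1, 4, 2)
)

# Order levels by the value at the first time point (descending by default).
by_first <- fct_reorder2(df$series, df$time, df$value, .fun = first2)
levels(by_first)
#> [1] "b" "a" "c"

# The default, last2(), orders by the value at the last time point instead.
by_last <- fct_reorder2(df$series, df$time, df$value)
levels(by_last)
#> [1] "a" "c" "b"
```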

Acknowledgements

A special thanks goes out to everyone who contributed to forcats during Tidyverse developer day: Kelly Bodwin, Layla Bouzoubaa, Scott Brenstuhl, Jonathan Carroll, Monica Gerber, John Goldin, Laura Gomez, Mitchell O’Hara-Wild, Riinu Pius, and Emily Robinson.

We’re extremely grateful for all 48 people who helped with this release: @808sAndBR, @adisarid, @alejandroschuler, @AmeliaMN, @AndrewKinsman, @avishaitsur, @batpigandme, @bczucz, @billdenney, @bxc147, @cuttlefish44, @dan-reznik, @dpprdan, @dylanjm, @GegznaV, @ghost, @gralgomez, @gtm19, @hadley, @hongcui, @jamiefo, @jburos, @jimhester, @johngoldin, @jonocarroll, @jtr13, @jwilliman, @jzadra, @kbodwin, @kei51e, @kyzphong, @labouz, @ledbettc, @lwjohnst86, @martinjhnhadley, @melissakey, @mitchelloharawild, @monicagerber, @mstr3336, @riinuots, @robinsones, @sgschreiber, @sinarueeger, @sindribaldur, @stelsemeyer, @VincentGuyader, @yimingli, and @zkamvar.