forcats 0.4.0

Photo by Peng Louis

We are pleased to announce that forcats 0.4.0 is now on CRAN. The forcats package provides a suite of useful tools that solve common problems with factors in R. This version benefited from the hard work of contributors new and old at our first tidyverse dev day. For a complete set of changes, please see the release notes.

To install the latest version, run:

install.packages("forcats")

As always, attach the package with:

library(forcats)
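
If you are unsure whether the update took effect, a quick version check helps. This is only a sketch using base R's packageVersion(); the variable names are illustrative.

```r
# Confirm that the installed forcats is at least the release described here.
installed <- packageVersion("forcats")
has_new_functions <- installed >= "0.4.0"
has_new_functions
```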

New functions

fct_cross() creates a new factor containing the combined levels from two or more input factors, similar to base::interaction().

fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))
fct_cross(fruit, colour)
#> [1] apple:green kiwi:green  apple:red   apple:green
#> Levels: apple:green apple:red kiwi:green
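
To make the comparison with base::interaction() concrete, here is a small sketch reusing the fruit and colour factors from above. It assumes the keep_empty argument documented for fct_cross(); the variable names are illustrative.

```r
library(forcats)

fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))

# Default: only combinations that occur in the data become levels.
observed <- fct_cross(fruit, colour)
nlevels(observed)    # 3

# keep_empty = TRUE also keeps kiwi:red, which never occurs in the data.
all_combos <- fct_cross(fruit, colour, keep_empty = TRUE)
nlevels(all_combos)  # 4

# base::interaction() keeps every combination and joins with "." by default.
base_combos <- interaction(fruit, colour)
levels(base_combos)
```

Dropping unobserved combinations by default tends to be the more convenient behaviour for plotting and tabulation, since empty levels do not show up as blank groups.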

fct_lump_min() preserves levels that appear at least min times and lumps the rest together. It can also be used with the weighting argument w, in which case the weighted counts are compared against min.

x <- factor(letters[rpois(50, 3)])
fct_lump_min(x, min = 10)
#>  [1] Other b     Other b     Other Other Other b     Other Other b    
#> [12] Other Other Other b     Other b     Other Other b     b     Other
#> [23] Other Other b     b     Other Other Other Other Other b     Other
#> [34] Other Other b     Other Other Other Other Other Other Other Other
#> [45] Other b     Other b    
#> Levels: b Other
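
Because the example above uses random data, the result changes from run to run. The following sketch uses fixed counts instead, and also illustrates the w argument; the data and weights are made up for illustration.

```r
library(forcats)

# Fixed counts: a = 12, b = 10, c = 3, d = 1
x <- factor(c(rep("a", 12), rep("b", 10), rep("c", 3), rep("d", 1)))

# Levels with fewer than 10 observations are lumped together.
lumped <- fct_lump_min(x, min = 10)
table(lumped)

# With weights, each level's summed weight is compared against min.
# Here every "c" observation counts five times, so c totals 15 and is kept.
w <- ifelse(x == "c", 5, 1)
lumped_w <- fct_lump_min(x, min = 10, w = w)
levels(lumped_w)
```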

fct_match() tests for the presence of levels in a factor, providing a safer alternative to %in% by throwing an error when there are unexpected levels.

table(fct_match(gss_cat$marital, c("Married", "Divorced")))
#> 
#> FALSE  TRUE 
#>  7983 13500
table(gss_cat$marital %in% c("Maried", "Davorced"))
#> 
#> FALSE 
#> 21483
table(fct_match(gss_cat$marital, c("Maried", "Davorced")))
#> Error: Levels not present in factor: "Maried", "Davorced"
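
In a script you may want to handle the error rather than let it stop execution. The sketch below uses a small made-up factor and base R's tryCatch(); the exact wording of the error message may differ between forcats versions.

```r
library(forcats)

f <- factor(c("a", "b", "c", "a"))

# Valid levels: behaves like %in% and returns a logical vector.
matched <- fct_match(f, c("a", "c"))
matched

# A level that does not exist: capture the error message instead of stopping.
msg <- tryCatch(
  fct_match(f, "z"),
  error = function(e) conditionMessage(e)
)
msg

# %in% silently returns all FALSE for the same input.
silent <- f %in% "z"
silent
```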

Other improvements

  • fct_relevel() can now relevel factors using a function that is passed the current levels.

    f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
    fct_relevel(f, sort)
    #> [1] a b c d
    #> Levels: a b c d
    fct_relevel(f, rev)
    #> [1] a b c d
    #> Levels: a d c b
    
  • as_factor() now has a numeric method which orders levels numerically, unlike the other methods, which default to order of appearance.

    y <- c("1.1", "11", "2.2", "22")
    as_factor(y)
    #> [1] 1.1 11  2.2 22 
    #> Levels: 1.1 11 2.2 22
    z <- as.numeric(y)
    as_factor(z)
    #> [1] 1.1 11  2.2 22 
    #> Levels: 1.1 2.2 11 22
    
  • fct_inseq() reorders levels by the numeric value of their labels, when possible.
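
As a brief illustration of the last item, the sketch below applies fct_inseq() to a made-up factor whose labels are numbers stored as text. It also shows that a similar result can be obtained by passing a function to fct_relevel(); the anonymous function is only an illustration, not part of forcats.

```r
library(forcats)

f <- factor(c("10", "2", "33", "2"))
levels(f)  # default alphabetical order: "10" "2" "33"

# Reorder the levels by their numeric value.
g <- fct_inseq(f)
levels(g)  # "2" "10" "33"

# The same ordering via fct_relevel() with a function of the current levels.
h <- fct_relevel(f, function(lvls) lvls[order(as.numeric(lvls))])
levels(h)
```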

Thanks to Emily Robinson, forcats also has a new introductory vignette.

Acknowledgements

We’re grateful to the 35 people who contributed to this release: @ahaque-utd, @AmeliaMN, @ashiklom, @batpigandme, @billdenney, @brianwdavis, @corybrunson, @dalewsteele, @ewenharrison, @grayskripko, @gtm19, @hack-r, @hadley, @huftis, @isteves, @jimhester, @jonocarroll, @jrosen48, @jthomasmock, @kbodwin, @mdjeric, @orchid00, @richierocks, @robinsones, @rosedu1, @RoyalTS, @russHyde, @Ryo-N7, @s-fleck, @seaaan, @spedygiorgio, @tslumley, @xuhuizhang, @zhiiiyang, and @zx8754.
