tidyr 1.1.0

Photo by Jan Vasek

Hadley Wickham

We’re delighted to announce that tidyr 1.1.0 is now available from CRAN. tidyr provides a set of tools for transforming data frames to and from tidy data, where each variable is a column and each observation is a row. Tidy data is a convention for matching the semantics and structure of your data that makes using the rest of the tidyverse (and many other R packages) much easier.

You can install tidyr with:

install.packages("tidyr")

This release doesn’t include any major new excitement, but it includes a whole passel of minor improvements that build on the major changes in tidyr 1.0.0 and generally make everything easier to use and a bit more flexible. In this blog post, I’ll give a quick rundown of the new pivoting features; see the full release notes for the details of other changes.

library(tidyr)

pivot_longer()

  • pivot_longer() gains a new names_transform argument that allows you to transform column names before they turn into data. For example, you can use this new argument along with readr::parse_number() to parse column names that really should be numbers:

    df <- tibble(id = 1, wk1 = 0, wk2 = 4, wk3 = 9, wk4 = 25)
    df %>% pivot_longer(
      cols = starts_with("wk"),
      names_to = "week",
      names_transform = list(week = readr::parse_number),
    )
    #> # A tibble: 4 x 3
    #>      id  week value
    #>   <dbl> <dbl> <dbl>
    #> 1     1     1     0
    #> 2     1     2     4
    #> 3     1     3     9
    #> 4     1     4    25
    
  • pivot_longer() can now discard uninformative column names by setting names_to = character(), thanks to an idea and implementation from Mitch O’Hara Wild:

    df <- tibble(id = 1:2, fruitful_panda = 3:4, angry_aardvark = 5:6)  
    df %>% pivot_longer(-id, names_to = character())
    #> # A tibble: 4 x 2
    #>      id value
    #>   <int> <int>
    #> 1     1     3
    #> 2     1     5
    #> 3     2     4
    #> 4     2     6
    
  • pivot_longer() no longer creates a .copy variable in the presence of duplicate column names. This makes it more consistent with the handling of non-unique pivot specifications.

    df <- tibble(id = 1:3, x = 1:3, x = 4:6, .name_repair = "minimal")  
    df %>% pivot_longer(-id)
    #> # A tibble: 6 x 3
    #>      id name  value
    #>   <int> <chr> <int>
    #> 1     1 x         1
    #> 2     1 x         4
    #> 3     2 x         2
    #> 4     2 x         5
    #> 5     3 x         3
    #> 6     3 x         6
    
  • pivot_longer() automatically disambiguates non-unique outputs, which can occur when the input variables include some additional component that you don’t care about and want to discard. You can discard parts of column names either with names_pattern or with NA in names_to.

    df <- tibble(id = 1:3, x_1 = 1:3, y_2 = 4:6, y_3 = 9:11)
    df %>% pivot_longer(-id, names_pattern = "(.)_.")
    #> # A tibble: 9 x 3
    #>      id name  value
    #>   <int> <chr> <int>
    #> 1     1 x         1
    #> 2     1 y         4
    #> 3     1 y         9
    #> 4     2 x         2
    #> 5     2 y         5
    #> 6     2 y        10
    #> 7     3 x         3
    #> 8     3 y         6
    #> 9     3 y        11
        
    df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
    #> # A tibble: 9 x 3
    #>      id name  value
    #>   <int> <chr> <int>
    #> 1     1 x         1
    #> 2     1 y         4
    #> 3     1 y         9
    #> 4     2 x         2
    #> 5     2 y         5
    #> 6     2 y        10
    #> 7     3 x         3
    #> 8     3 y         6
    #> 9     3 y        11
        
    df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
    #> # A tibble: 6 x 3
    #>      id     x     y
    #>   <int> <int> <int>
    #> 1     1     1     4
    #> 2     1    NA     9
    #> 3     2     2     5
    #> 4     2    NA    10
    #> 5     3     3     6
    #> 6     3    NA    11
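
As a complement to the names_transform example above, here is a minimal sketch that avoids the readr dependency. It assumes the columns share a fixed "wk" prefix, strips it with names_prefix (an existing pivot_longer() argument), and converts what remains with base R’s as.integer(). The object name long is only for illustration.

```r
library(tidyr)

df <- tibble(id = 1, wk1 = 0, wk2 = 4, wk3 = 9, wk4 = 25)

# Strip the "wk" prefix first, then convert the remaining text to integer.
long <- df %>% pivot_longer(
  cols = starts_with("wk"),
  names_to = "week",
  names_prefix = "wk",
  names_transform = list(week = as.integer)
)

# Inspect the transformed names column.
long$week
```

With this approach week should come out as an integer column rather than a double, which may be preferable when the values are used as keys in a later join.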
    

pivot_wider()

  • pivot_wider() gains a names_sort argument which allows you to sort the output columns by the values of names_from (so a factor sorts by its levels, not alphabetically). The default, FALSE, orders columns by their first appearance. I’m considering changing the default value to TRUE in a future version.

    df <- tibble(
      day_int = c(4, 3, 5, 1, 2),
      day_fac = factor(day_int, labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
    )
    df %>% pivot_wider(
      names_from = day_fac, 
      values_from = day_int
    )
    #> # A tibble: 1 x 5
    #>     Thu   Wed   Fri   Mon   Tue
    #>   <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1     4     3     5     1     2
    df %>% pivot_wider(
      names_from = day_fac,
      names_sort = TRUE,
      values_from = day_int
    )
    #> # A tibble: 1 x 5
    #>     Mon   Tue   Wed   Thu   Fri
    #>   <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1     1     2     3     4     5
    
  • pivot_wider() gains a names_glue argument that allows you to construct output column names with a glue specification when names_from includes multiple columns.

    df <- tibble(
      first = "a",
      second = "1",
      third = "X",
      val = 1
    )
    df %>% pivot_wider(
      names_from = c(first, second, third), 
      values_from = val,
      names_glue = "{first}.{second}_{third}"
    )
    #> # A tibble: 1 x 1
    #>   a.1_X
    #>   <dbl>
    #> 1     1
    
  • pivot_wider() arguments values_fn and values_fill can now be single values; you now only need to use a named list if you want to use different values for different value columns. You’ll also get better errors if they’re not of the correct type.

  • Finally, both pivot_wider() and pivot_longer() are considerably more performant, thanks largely to improvements in the underlying vctrs code by Davis Vaughan.
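
The values_fn and values_fill change above has no example of its own, so here is a minimal sketch of the single-value form. The data frame and the column names id, key, and val are made up for illustration; a bare function is passed to values_fn and a bare scalar to values_fill.

```r
library(tidyr)

# id 1 has two observations for key "a"; id 2 has none for "a",
# and id 1 has none for "b".
df <- tibble(
  id = c(1, 1, 2),
  key = c("a", "a", "b"),
  val = c(1, 2, 3)
)

wide <- df %>% pivot_wider(
  names_from = key,
  values_from = val,
  values_fn = sum,   # a single function, applied to every value column
  values_fill = 0    # a single fill value, used for every empty cell
)
wide
```

Here sum collapses the two "a" values for id 1, and 0 fills the cells that have no observation. A named list, such as values_fill = list(val = 0), is only needed when different value columns call for different treatment.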

Acknowledgements

Thanks to all 135 people who contributed to this version of tidyr by discussing issues and suggesting new code! @abichat, @abiyug, @adisarid, @ahmohamed, @akikirinrin, @albertotb, @alex-pax, @amirmazmi, @andtheWings, @ashiklom, @atusy, @batpigandme, @bertrandh, @BillBlanc, @billdenney, @BrianDiggs, @bushdanielkwajaffa, @cderv, @CGMossa, @cgoo4, @charliejhadley, @chester-gan, @cimentadaj, @cjvanlissa, @cloversleaves, @colearendt, @dah33, @DanOvando, @dapperjapper, @daranzolin, @davidhunterwalsh, @davisadamw, @DavisVaughan, @dchiu911, @dpastoor, @dpeterson71, @dpprdan, @eantworth, @earcanal, @echasnovski, @enixam, @ericgunnink, @florianm, @fmmattioni, @franzbischoff, @GegznaV, @geotheory, @ggrothendieck, @gregorp, @hadley, @HanOostdijk, @henry090, @iago-pssjd, @ifellows, @infotroph, @jam1015, @jannikbuhr, @jasonpcasey, @jeffreypullin, @jennybc, @jenren, @JenspederM, @jeonghyunwoo, @jjnote, @jmh530, @JohnCoene, @joshua-theisen, @JosiahParry, @jthomasmock, @jwilliman, @kaneplusplus, @kaybenleroll, @kent37, @kiernann, @krlmlr, @lionel-, @Ljupch0, @lymanmark, @maelle, @majazaloznik, @mattantaliss, @mattwarkentin, @maurolepore, @md0u80c9, @mgirlich, @MikeEdinger, @mikemahoney218, @mikmart, @mitchelloharawild, @moodymudskipper, @msberends, @msgoussi, @mstackhouse, @MyKo101, @nacnudus, @namelessjon, @ndrewGele, @Nicktz, @npjc, @osorensen, @PathosEthosLogos, @philipp-baumann, @PMSeitzer, @psychelzh, @randomgambit, @riinuots, @romagnolid, @romainfrancois, @rvino, @salim-b, @shanepiesik, @shannonpileggi, @sharleenw, @siddharthprabhu, @simazhi, @skr5k, @skydavis435, @smingerson, @smithjd, @srnnkls, @stragu, @stufield, @tangcxx, @tdhock, @the-Zian, @tomhopper, @topepo, @wgrundlingh, @wibeasley, @william3031, @wmoldham, @wolski, @xkdog, @xtimbeau, and @yusuzech.