tidyr 0.8.0

  tidyverse, tidyr

  Hadley Wickham

We are pleased to announce that tidyr 0.8.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install it with:

install.packages("tidyr")

This release mainly contains a bumper crop of small bug fixes and minor improvements, along with a considerable increase in test coverage (from 84% to 99%). For the full details, see the release notes. Here we’ll highlight an important bug fix that might change the behaviour of existing code, and one new feature to try out.

API changes

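There was a bug in separate(): negative values of sep had an off-by-one error. Now -1 correctly refers to the first position between characters, counting from the right-hand side, so it splits off the last character. Code that relied on the old off-by-one behaviour will need its negative sep values adjusted.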

library(tibble)
library(tidyr)

df <- tibble(x = c("male1", "female2", "male2"))
df %>% separate(x, c("gender", "number"), sep = -1)
#> # A tibble: 3 x 2
#>   gender number
#>   <chr>  <chr>
#> 1 male   1
#> 2 female 2
#> 3 male   2
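
To see how negative positions line up with positive ones, here is a minimal sketch using made-up four-character codes (the codes tibble and its column names are illustrative, not taken from the release notes). Assuming every string has the same width, counting two positions in from the right should land on the same split as counting two in from the left:

```r
library(tibble)
library(tidyr)

# Hypothetical fixed-width codes, used only for illustration
codes <- tibble(code = c("AB12", "CD34", "EF56"))

# Positive positions count from the left: 2 splits after the second character
from_left <- codes %>% separate(code, c("prefix", "suffix"), sep = 2)

# Negative positions count from the right: -2 splits before the last two characters
from_right <- codes %>% separate(code, c("prefix", "suffix"), sep = -2)

# For four-character strings the two splits should coincide
identical(from_left$prefix, from_right$prefix)
identical(from_left$suffix, from_right$suffix)
```

Negative positions are most useful when the strings vary in length, as in the male/female example above, where no single positive position would work.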

New features

Thanks to the suggestion of Andrew Bray, tidyr can now “uncount” a data frame, duplicating aggregate rows:

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))
df %>% uncount(n)
#> # A tibble: 6 x 1
#>   x
#>   <chr>
#> 1 a
#> 2 a
#> 3 b
#> 4 b
#> 5 b
#> 6 c
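
As a rough mental model (a sketch of the idea, not necessarily how tidyr implements it), uncounting amounts to repeating each row index as many times as its count and then dropping the count column. The idx and by_hand names below are illustrative:

```r
library(tibble)
library(tidyr)

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))

# Repeat each row index according to its count: 1 1 2 2 2 3
idx <- rep(seq_len(nrow(df)), times = df$n)

# Select the repeated rows and keep only x, mirroring uncount(n)
by_hand <- df[idx, "x"]

# The hand-rolled version should match uncount() on the x column
identical(by_hand$x, uncount(df, n)$x)
```

The uncount() version has the advantage of fitting into a pipeline and dropping the weights column for you.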

If you want an identifier for each created row, use the .id argument, which numbers the copies of each original row:

df %>% uncount(n, .id = "id")
#> # A tibble: 6 x 2
#>   x        id
#>   <chr> <int>
#> 1 a         1
#> 2 a         2
#> 3 b         1
#> 4 b         2
#> 5 b         3
#> 6 c         1
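
Because the id column restarts at 1 for each original row, it is not unique across the whole data frame on its own. One possible way to get a table-wide key, sketched here with base R's paste() and an illustrative key column name, is to combine it with the original value:

```r
library(tibble)
library(tidyr)

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))
expanded <- df %>% uncount(n, .id = "id")

# Combine the original value with the per-row counter (key is a hypothetical name)
expanded$key <- paste(expanded$x, expanded$id, sep = "_")

expanded$key
#> [1] "a_1" "a_2" "b_1" "b_2" "b_3" "c_1"
```

This assumes the values in x are themselves unique in the aggregated data; if they are not, a plain row number such as seq_len(nrow(expanded)) is a safer key.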