tidyr 0.8.0


We are pleased to announce that tidyr 0.8.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install it with:

install.packages("tidyr")

This release mainly contains a bumper crop of small bug fixes and minor improvements, and a considerable increase in test coverage (84% to 99%). For the full details, see the release notes. Here we’ll highlight an important bug fix that might change existing code, and one new feature to try out.

API changes

separate() had an off-by-one error when sep was given as a negative position. Now -1 correctly refers to the first position between characters, counting from the right-hand side.

library(tidyr)
library(tibble)

df <- tibble(x = c("male1", "female2", "male2"))
# sep = -1 splits just before the last character
df %>% separate(x, c("gender", "number"), sep = -1)
#> # A tibble: 3 x 2
#>   gender number
#>   <chr>  <chr> 
#> 1 male   1     
#> 2 female 2     
#> 3 male   2
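
To see how the sign of the position changes the split, here is a small sketch contrasting a positive and a negative sep on the same data. The column names left and right are illustrative only and are not part of the example above.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c("male1", "female2", "male2"))

# Positive positions count from the left: split after the first character.
from_left <- separate(df, x, c("left", "right"), sep = 1)

# Negative positions count from the right: split before the last two characters.
from_right <- separate(df, x, c("left", "right"), sep = -2)

from_left
from_right
```

With the fixed behaviour, sep = -2 should split "male1" into "mal" and "e1", one character further left than sep = -1.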

New features

Thanks to the suggestion of Andrew Bray, tidyr can now “uncount” a data frame, duplicating each aggregated row according to a count column:

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))
df %>% uncount(n)
#> # A tibble: 6 x 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 a    
#> 3 b    
#> 4 b    
#> 5 b    
#> 6 c
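
One way to think about what uncount() does is as row indexing with repeated indices. The sketch below reproduces the result by hand in base R; it is an illustration of the idea, not necessarily how tidyr implements it, and the names idx and by_hand are made up for this example.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))

# Repeat each row index as many times as its count, then keep only x.
idx <- rep(seq_len(nrow(df)), times = df$n)
by_hand <- df[idx, "x"]

# Compare the hand-built column with the one uncount() produces.
same_rows <- identical(by_hand$x, uncount(df, n)$x)
same_rows
```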

If you want an identifier for each copy of an original row, use the .id argument. The counter restarts at 1 for every original row:

df %>% uncount(n, .id = "id")
#> # A tibble: 6 x 2
#>   x        id
#>   <chr> <int>
#> 1 a         1
#> 2 a         2
#> 3 b         1
#> 4 b         2
#> 5 b         3
#> 6 c         1
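
The id column behaves like a counter that restarts for every original row. As a rough sketch of that idea, base R's sequence() builds the same values from the counts; the names ids and expanded are illustrative, and this is not a claim about tidyr's internals.

```r
library(tidyr)
library(tibble)

df <- tibble(x = c("a", "b", "c"), n = c(2, 3, 1))

# sequence() concatenates 1:n for each count, giving a per-row counter.
ids <- sequence(df$n)

# Check the counter against the column created by .id.
expanded <- uncount(df, n, .id = "id")
ids_match <- all(expanded$id == ids)
ids_match
```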