readxl 1.2.0

  tidyverse, readxl

  Jenny Bryan

We’re stoked to announce that readxl 1.2.0 is now available on CRAN. Learn more about readxl at http://readxl.tidyverse.org. Detailed notes are always in the change log.

The readxl package makes it easy to get tabular data out of Excel files and into R with code, not mouse clicks. It supports both the legacy .xls format and the modern XML-based .xlsx format. readxl is expressly designed to be easy to install and use on all operating systems. Therefore it has no external dependencies, such as Java or Perl, which have historically been a source of aggravation with some R packages that read Excel files.

The easiest way to install the latest version from CRAN is to install the whole tidyverse.

install.packages("tidyverse")

Alternatively, install just readxl from CRAN:

install.packages("readxl")

Regardless, you will still need to attach readxl explicitly. It is not a core tidyverse package, i.e. readxl is NOT attached via library(tidyverse). Instead, do this in your script:

library(readxl)

Column name repair

The most exciting change in the v1.2.0 release is the introduction of the new .name_repair argument to read_excel(), read_xlsx(), and read_xls(). readxl exposes the .name_repair argument that is coming soon to version 2.0.0 of the tibble package. Note: the following examples were executed with the not-yet-released version 2.0.0 of the tibble package.

First, rest assured that if your sheet has unique column names, readxl leaves them alone, as always:

read_excel(readxl_example("datasets.xlsx"), sheet = "iris", n_max = 3)
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa

However, spreadsheet column names frequently leave much to be desired. .name_repair is a more flexible alternative to passing a specific vector of col_names. You can express what you want in two main ways:

  • Levels of name repair:
    • "minimal": use "" for any missing names
    • "unique": names are made unique readxl’s default
    • "universal": names are made "unique" and syntactic
  • Name repair strategy, as a function that takes (bad) names in and returns (good) names:
    • Function defined in base R, by another package, or by you
    • Anonymous function, specified using a purrr-style ~ formula

Here are two examples of specifying a name repair strategy.

## pass custom function to implement "lower_snake_case"
my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
read_excel(
  readxl_example("datasets.xlsx"), n_max = 3,
  .name_repair = my_custom_name_repair
)
#> # A tibble: 3 x 5
#>   sepal_length sepal_width petal_length petal_width species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1          5.1         3.5          1.4         0.2 setosa 
#> 2          4.9         3            1.4         0.2 setosa 
#> 3          4.7         3.2          1.3         0.2 setosa

## use purrr-style formula to truncate names at 3 characters
read_excel(
  readxl_example("datasets.xlsx"), sheet = "chickwts", n_max = 3,
  .name_repair = ~ substr(.x, start = 1, stop = 3)
)
#> # A tibble: 3 x 2
#>     wei fee      
#>   <dbl> <chr>    
#> 1   179 horsebean
#> 2   160 horsebean
#> 3   136 horsebean

Read more in readxl’s new Column Names article.

Hello, is any one there?

readxl now displays a progress spinner in interactive sessions if it looks like the operation might take several seconds or more. This should provide some measure of reassurance when reading large sheets. This was accompanied by a rationalization of when readxl checks for user interrupts.

Updated libxls

The last user-visible change is that all known .xls regressions have been fixed. The previous version, readxl v1.1.0, included some big updates in the embedded libxls library, which were overwhelmingly positive. But there were a few reports of .xls files that went from “readable” to “unreadable”. To the best of my knowledge, those regressions have now all been addressed upstream and in readxl v1.2.0.

Acknowledgements

Special thanks to Evan Miller for his recent work on libxls.

Thank you to the 47 contributors who made this release possible: @2005m, @ajdamico, @alfredojavier5, @antuki, @apreshill, @awwsmm, @batpigandme, @berkorbay, @billdenney, @breichholf, @brianwdavis, @chrowe, @ddheart, @doctsh, @dominicshore, @Gillis, @gorkang, @gregdutkowski, @gregleleu, @hidekoji, @hlenka, @hroptatyr, @j6t, @jamesdalg, @jameshunterbr, @jennybc, @jeroen, @jimhester, @KS2907, @KyleHaynes, @llrs, @ltierney, @LTLA, @mcSamuelDataSci, @mdekstrand, @msgoussi, @N1h1l1sT, @pm321, @ptoche, @randallhelms, @rnuske, @roualdes, @rrohwer, @SebVen, @siemersn, @VincentGuyader, and @yurasmol.