We’re delighted to announce the release of haven 2.4.0. haven allows you to read and write SAS, SPSS, and Stata data formats from R, thanks to the wonderful ReadStat C library written by Evan Miller.

This blog post will show off the most important changes to the package; you can see a full list of changes in the release notes.

labelled_spss() and labelled()

labelled_spss() gains full vctrs support thanks to the hard work of Danny Smith. This means that labelled_spss() objects should now work seamlessly with dplyr 1.0.0 and tidyr 1.0.0.
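As a rough illustration of what that looks like in practice, here is a minimal sketch that filters a data frame containing a labelled_spss() column with dplyr. The data, the column names (`id`, `satisfied`), and the choice of 9 as a user-defined missing value are all made up for the example, and it assumes haven 2.4.0 or later alongside dplyr 1.0.0 or later.

```r
library(haven)
library(dplyr)

# Hypothetical survey data: 9 is declared as a user-defined missing value
df <- tibble(
  id = 1:4,
  satisfied = labelled_spss(
    c(1, 2, 9, 1),
    labels = c(Yes = 1, No = 2, Refused = 9),
    na_values = 9
  )
)

# is.na() treats user-defined missing values as missing,
# so this drops the "Refused" row
answered <- df %>% filter(!is.na(satisfied))

# The column should still be a labelled_spss vector after filtering
class(answered$satisfied)
```

The point of the sketch is only that the class and its labels survive a dplyr verb; other verbs and tidyr functions are expected to behave similarly but are not exercised here.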

I’ve also made labelled() vectors more permissive when concatenating. Now, output labels will be a combination of the left-hand and the right-hand side, and if there are duplicate labels, the left-hand side (first assigned) will win:

library(haven)

x1 <- labelled(1, labels = c(USA = 1))
x2 <- labelled(64, labels = c(NZ = 64))
c(x1, x2)
#> <labelled<double>[2]>
#> [1]  1 64
#> Labels:
#>  value label
#>      1   USA
#>     64    NZ

# It's now your responsibility to only combine things that make sense
x3 <- labelled(c(1, 2, 5, 3, 2), labels = c(Good = 5, Bad = 1))
c(x1, x3)
#> <labelled<double>[6]>
#> [1] 1 1 2 5 3 2
#> Labels:
#>  value label
#>      1   USA
#>      5  Good

Other improvements

  • Date-times are no longer converted to UTC. This should ensure that you see the same date-time in R and in Stata/SPSS/SAS. (But the underlying time point might be different because Stata/SPSS/SAS don’t appear to support time zones.)

  • The bundled ReadStat has been updated from version 1.1.3 to 1.1.5, so haven now includes the ReadStat improvements in v1.1.4 and v1.1.5. Probably the biggest improvement is support for SAS-binary (aka Ross) compression.

  • write_*() now validates file and variable metadata with ReadStat, and validation failures now provide more details about the source of the problem (e.g. the column name), making it easier to track down issues.
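To get a feel for the kind of message a failed write produces, the sketch below tries to write a data frame whose only column has a name containing a space, which Stata does not allow. The data frame and the column name "bad name" are hypothetical, and the sketch does not distinguish between checks done by haven itself and checks done by ReadStat; it only captures whatever error is raised.

```r
library(haven)

# Hypothetical data frame with a column name that is not a valid Stata name
df <- data.frame(x = 1:3)
names(df) <- "bad name"

path <- tempfile(fileext = ".dta")

# Capture the error message instead of letting it stop the script
result <- tryCatch(
  {
    write_dta(df, path)
    "written without error"
  },
  error = function(e) conditionMessage(e)
)

# Inspect the message to see which column caused the failure
result
```

Wrapping the call in tryCatch() is just a convenience for inspecting the message; in an interactive session you would normally let the error surface directly.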


