purrr 0.2.3

Photo by Erika Lanpher

We are pleased to announce that purrr 0.2.3 is now on CRAN! Despite the small increment in the version number (our mistake, it should have been 0.3.0), this is a rather major release that includes many new features and bug fixes.

Install the latest version of purrr with:

install.packages("purrr")

You can read about the complete set of changes at https://github.com/tidyverse/purrr/releases/tag/v0.2.3. In this blog post, I will present the highlights of this version: a family of generic mappers, a new pluck() function for accessing deeply nested elements in a readable way, and a few other handy features and improvements.

Generic mapping

The new modify family of functions introduces genericity to mapping, which makes it easier to map functions over vector-like S3 objects. map() and friends fall short for such objects in two ways:

  • map() and friends apply functions to the elements of the underlying data structure, but that is not always appropriate. An object sometimes contains metadata, and a user of such a class probably wants to map over the elements of the data of interest rather than over the metadata fields.

  • map() is type-stable and always returns a list. If you’re mapping over an object, chances are you want a similar object back.

modify() and its variants solve both these problems via S3 dispatch. For instance, let’s try a conditional map over a data frame. Since map_if() is type-stable, it returns a list and we lose the data frame information:

library(purrr)

iris %>% map_if(is.factor, as.character) %>% str()
#> List of 5
#>  $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...

On the other hand, modify_if() is generic and returns a data frame:

iris %>% modify_if(is.factor, as.character) %>% str()
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...

The flip side of genericity is that the vectors returned by the mapped function should obey the constraints of the container type. For instance, data frames require vectors of equal size, so it wouldn’t be appropriate to use modify() with a function that returns vectors of variable sizes. In this case you should use map():

map(mtcars, unique) %>% str()
#> List of 11
#>  $ mpg : num [1:25] 21 22.8 21.4 18.7 18.1 14.3 24.4 19.2 17.8 16.4 ...
#>  $ cyl : num [1:3] 6 4 8
#>  $ disp: num [1:27] 160 108 258 360 225 ...
#>  $ hp  : num [1:22] 110 93 175 105 245 62 95 123 180 205 ...
#>  $ drat: num [1:22] 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.07 2.93 ...
#>  $ wt  : num [1:29] 2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: num [1:30] 16.5 17 18.6 19.4 20.2 ...
#>  $ vs  : num [1:2] 0 1
#>  $ am  : num [1:2] 1 0
#>  $ gear: num [1:3] 4 3 5
#>  $ carb: num [1:6] 4 1 2 3 6 8

If you are a developer, there are two ways to make your class compatible with modify(). The easiest is to implement a method for the subset-assign operator [<-, which should be sufficient in most cases. Alternatively, you can implement methods for the modify functions themselves, as they are all generics.
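
To see why a subset-assign method is usually enough, here is a minimal base R sketch of the underlying idiom. It is an illustration under an assumption, not purrr's actual source: it assumes a generic mapper can be built by mapping over the elements and assigning the results back into the container with [<-. The name modify_sketch() is hypothetical.

```r
# Hypothetical helper, for illustration only: map over the elements,
# then assign the results back into the original container with `[<-`,
# so that the container keeps its class and attributes.
modify_sketch <- function(x, f, ...) {
  x[] <- lapply(x, f, ...)
  x
}

# Convert factor columns to character and leave the others untouched.
out <- modify_sketch(
  iris,
  function(col) if (is.factor(col)) as.character(col) else col
)

class(out)          # the result is still a data frame
class(out$Species)  # the factor column is now a character vector
```

Because the assignment goes through the container's own [<- method, a class that defines that method decides for itself how mapped results are stored.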

pluck() for deep subsetting

The plucking mechanism used for indexing into data structures with map() has been extracted into the function pluck(). Plucking is often more readable when extracting an element buried in a deep data structure. Compare this base R code, which reads non-linearly:

accessor(x[[1]])$foo

To the equivalent pluck:

x %>% pluck(1, accessor, "foo")
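
As a concrete, runnable variant of the comparison above, the sketch below plucks from a small nested list. The records object and its fields are made up for illustration, and the base R equivalents in the comments are my reading of what each call does.

```r
library(purrr)

# Hypothetical nested data, for illustration only.
records <- list(
  list(name = "first", scores = c(90, 85)),
  list(name = "second", scores = c(70, 60))
)

# Positions, names and accessor functions can be mixed in one call,
# and they are applied from left to right.
first_score <- pluck(records, 2, "scores", 1)    # like records[[2]]$scores[[1]]
best_score  <- pluck(records, 1, "scores", max)  # like max(records[[1]]$scores)

# An element that does not exist gives NULL (the default) rather than an error.
absent <- pluck(records, 1, "missing")
```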

The new function attr_getter() generates attribute accessors that can be used in pluck():

dfs <- list(iris, mtcars)
dfs %>% pluck(2, attr_getter("row.names"))
#>  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
#>  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
#>  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
#> [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
#> [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
#> [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
#> [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
#> [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
#> [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
#> [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
#> [31] "Maserati Bora"       "Volvo 142E"

Or in mapped indexing:

x <- list(
  list(vec = letters, df = mtcars),
  list(vec = LETTERS, df = iris)
)
x %>% map(list("df", attr_getter("row.names")))
#> [[1]]
#>  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
#>  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
#>  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
#> [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
#> [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
#> [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
#> [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
#> [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
#> [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
#> [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
#> [31] "Maserati Bora"       "Volvo 142E"         
#> 
#> [[2]]
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
#>  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
#>  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
#>  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
#>  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
#>  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
#> [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
#> [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
#> [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150

set_names() is much more flexible

set_names() (now reexported from the rlang package) has become much more flexible. It now takes ... arguments and concatenates them into one vector of names. This saves a bit of typing as you don’t have to concatenate explicitly with c(). The code also looks a bit leaner:

x <- letters[1:3] %>% set_names("foo", "bar", "baz")
x
#> foo bar baz 
#> "a" "b" "c"

In addition, set_names() now accepts a function in place of the vector of names. The function is applied to the existing names. Let’s transform the names of our new vector to uppercase with base::toupper():

x %>% set_names(toupper)
#> FOO BAR BAZ 
#> "a" "b" "c"

When you supply a function, the ... arguments are forwarded to the function which is often handy. Here we might want to pass further arguments to base::paste():

x %>% set_names(paste, "suffix", sep = "_")
#> foo_suffix bar_suffix baz_suffix 
#>        "a"        "b"        "c"

Reducing with a three-argument function

reduce2() makes it possible to reduce with a three-argument reducing function. reduce2() takes two vectors, .x and .y, the first of which is reduced in the usual manner: the accumulated value is passed to the reducing function as the first argument and the next value of .x as the second argument. .y, on the other hand, is mapped, not reduced: at each reducing round, the next value of .y is passed to the reducing function as the third argument.

In the following example we have a binary paster that takes sep as third argument. With the ordinary reduce() we are stuck with a single separator during the whole reduction:

paste2 <- function(x, y, sep) paste(x, y, sep = sep)
x <- letters[1:4]
reduce(x, paste2, sep = ".")
#> [1] "a.b.c.d"

If we want to vary the separator for each value of the input vector, reduce2() allows us to pass a second vector containing specific separators. This auxiliary vector should have one fewer element than the reduced vector:

seps <- c("-", ".", "_")
reduce2(x, seps, paste2)
#> [1] "a-b.c_d"
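
The rounds described above can be spelled out as an explicit loop. This is a hedged base R sketch of the semantics only, not purrr's implementation: reduce2_sketch() is a hypothetical name, and the sketch ignores the case where an initial value is supplied.

```r
# Hypothetical re-implementation, for illustration only.
reduce2_sketch <- function(x, y, f) {
  # y must have one fewer element than x
  stopifnot(length(y) == length(x) - 1)
  acc <- x[[1]]
  for (i in seq_along(y)) {
    # accumulated value first, next element of x second,
    # matching element of y third
    acc <- f(acc, x[[i + 1]], y[[i]])
  }
  acc
}

paste2 <- function(x, y, sep) paste(x, y, sep = sep)
result <- reduce2_sketch(letters[1:4], c("-", ".", "_"), paste2)
result
#> [1] "a-b.c_d"
```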

Variadic lambda-formulas

You can now refer to arguments by position in lambda-formulas. The ..1 symbol refers to the first argument, ..2 to the second and so on. This makes it easier to use functions like pmap() with the formula shortcut:

pmap_chr(mtcars, ~paste(..2, ..4, sep = " - "))
#>  [1] "6 - 110" "6 - 110" "4 - 93"  "6 - 110" "8 - 175" "6 - 105" "8 - 245"
#>  [8] "4 - 62"  "4 - 95"  "6 - 123" "6 - 123" "8 - 180" "8 - 180" "8 - 180"
#> [15] "8 - 205" "8 - 215" "8 - 230" "4 - 66"  "4 - 52"  "4 - 65"  "4 - 97" 
#> [22] "8 - 150" "8 - 150" "8 - 245" "8 - 175" "4 - 66"  "4 - 91"  "4 - 113"
#> [29] "8 - 264" "6 - 175" "8 - 335" "4 - 109"
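
Positional symbols also work when the inputs are unnamed. The following small sketch, with made-up input vectors, uses only the first and third of three parallel inputs:

```r
library(purrr)

# Three parallel inputs (made up for illustration); the lambda ignores
# the second one and adds the first and the third element-wise.
sums <- pmap_dbl(
  list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9)),
  ~ ..1 + ..3
)
sums
#> [1]  8 10 12
```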

API changes

purrr no longer depends on lazyeval or Rcpp (or dplyr, as of the previous version). This makes the dependency graph of the tidyverse simpler, and makes purrr more suitable as a dependency of lower-level packages.

A number of functions have been renamed or deprecated:

  • is_numeric() and is_scalar_numeric() are deprecated because they don’t test for what you might expect at first sight.

  • as_function() is now as_mapper() because it is a transformation that makes sense primarily for mapping functions, not in general. The rlang package has an as_function() coercer that is smaller in scope.

  • The data frame suffix _df has been (soft) deprecated in favour of _dfr to more clearly indicate that it’s a row-bind. All variants now also have a _dfc for column binding. They currently don’t handle functions returning vectors because of a bug, but they will in the next minor version of purrr.

  • cross_n() has been renamed to cross(). The _n suffix was removed for consistency with pmap() (originally called map_n() at the start of the project) and transpose() (originally called zip_n()). Similarly, cross_d() has been renamed to cross_df() for consistency with map_df().

  • contains() has been renamed to has_element() to avoid conflicts with dplyr.

  • at_depth() has been renamed to modify_depth().
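
For quick orientation, here is a short sketch that calls three of the renamed functions under their new names. The inputs are made up for illustration.

```r
library(purrr)

# as_mapper() (formerly as_function()) turns a lambda-formula into a function.
add_one <- as_mapper(~ .x + 1)
three <- add_one(2)

# has_element() (formerly contains()) tests whether a list holds a given value.
found <- has_element(list(1, "a"), "a")

# modify_depth() (formerly at_depth()) maps at a given level of nesting;
# depth 2 here means the elements of the inner lists.
nested <- list(list(1, 2), list(3))
scaled <- modify_depth(nested, 2, ~ .x * 10)
```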

Finally, these functions have been removed from the package:

  • The previously deprecated functions flatmap(), map3(), map_n(), walk3(), walk_n(), zip2(), zip3(), zip_n() have been removed.

  • order_by(), sort_by() and split_by() have been removed. order_by() conflicted with dplyr::order_by() and the complete family doesn’t feel that useful. Use tibbles instead.
