dials 0.0.3


A new version of dials is on CRAN. The package provides a basic framework for managing tuning parameters for models. This release is a significant update: the major change is that parameter objects are now generated by functions, rather than being the prototype objects used in the previous version. For example, to make a dials object for the number of PCA components in a model:

# previously
pca_comps <- num_comp

# now
pca_comps <- num_comp()

For numeric parameters, the range of values can be set using the first argument:

library(tidymodels)
## ── Attaching packages ──────────────────────────────────────── tidymodels 0.0.2 ──
## ✔ broom     0.5.2       ✔ purrr     0.3.2  
## ✔ dials     0.0.3       ✔ recipes   0.1.7  
## ✔ dplyr     0.8.3       ✔ rsample   0.0.5  
## ✔ ggplot2   3.2.1       ✔ tibble    2.1.3  
## ✔ infer     0.4.0.1     ✔ yardstick 0.0.4  
## ✔ parsnip   0.0.3.1
## ── Conflicts ─────────────────────────────────────────── tidymodels_conflicts() ──
## ✖ purrr::discard()  masks scales::discard()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ ggplot2::margin() masks dials::margin()
## ✖ dials::offset()   masks stats::offset()
## ✖ recipes::step()   masks stats::step()
num_comp()
## # Components  (quantitative)
## Range: [1, ?]
num_comp(range = c(2, 10))
## # Components  (quantitative)
## Range: [2, 10]
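
A parameter's range can also be inspected or changed after the object has been created. The following is a minimal sketch using the dials helpers range_get(), range_set(), and value_seq(), assuming they behave as described in the package documentation. The object names comps and candidates are only illustrative.

```r
library(dials)

# Start from the default object, whose upper bound is still unknown.
comps <- num_comp()

# range_set() returns a modified copy; the original object is not changed in place.
comps <- range_set(comps, c(2L, 10L))

# range_get() returns a list with `lower` and `upper` elements.
range_get(comps)

# value_seq() generates candidate values spread across the current range.
candidates <- value_seq(comps, n = 5)
candidates
```

Because range_set() returns a new object, the result has to be assigned for the change to persist.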

Sets of tuning parameters can be created and managed:

boosting_set <- param_set(list(trees(), splits = tree_depth(), min_n()))
boosting_set
## Collection of 3 parameters for tuning
## 
##      id parameter type object class
##   trees          trees    nparam[+]
##  splits     tree_depth    nparam[+]
##   min_n          min_n    nparam[+]
# modifying the parameter range:
boosting_set %>% update(trees = trees(c(100, 1000)))
## Collection of 3 parameters for tuning
## 
##      id parameter type object class
##   trees          trees    nparam[+]
##  splits     tree_depth    nparam[+]
##   min_n          min_n    nparam[+]

Note that the tree depth parameter has a user-defined identification variable (splits). This can come in handy when there are multiple tuning parameters of the same type. For example, suppose two variables (x1 and x2) were modeled using splines. The flexibility of each could be represented in a parameter set:

splines <- param_set(list(x1_df = deg_free(), x2_df = deg_free()))
splines
## Collection of 2 parameters for tuning
## 
##     id parameter type object class
##  x1_df       deg_free    nparam[+]
##  x2_df       deg_free    nparam[+]

This version of dials also contains two functions for creating space-filling designs, a technique from statistical experimental design theory. The two functions are grid_max_entropy() and grid_latin_hypercube().

svm_set <- param_set(list(rbf_sigma(), cost()))
set.seed(463)
me_grid <- grid_max_entropy(svm_set, size = 20) %>% mutate(type = "max entropy")
ls_grid <- grid_latin_hypercube(svm_set, size = 20) %>% mutate(type = "latin hypercube")
rn_grid <- grid_random(svm_set, size = 20) %>% mutate(type = "random")

bind_rows(me_grid, ls_grid, rn_grid) %>% 
  ggplot(aes(x = cost, y = rbf_sigma)) + 
  geom_point() + 
  facet_wrap( ~ type) +
  scale_x_log10() + 
  scale_y_log10()  + 
  coord_fixed(ratio = 1/4)
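
To give some intuition for what "space-filling" means, here is a hand-rolled Latin hypercube sample on the unit square, written in base R. It is only a sketch of the general idea: the helper latin_hypercube() is hypothetical and is not how dials builds its grids. Each axis is cut into n equal-width bins, and the design places exactly one point in every bin along every axis, which a purely random sample does not guarantee.

```r
# Illustration only: a simple Latin hypercube on the unit hypercube.
# `latin_hypercube()` is a hypothetical helper, not part of dials.
latin_hypercube <- function(n, d) {
  # For each dimension, shuffle the n bins and jitter each point within its bin.
  sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
}

set.seed(463)
n <- 20
lhs <- latin_hypercube(n, 2)

# Bin index (1..n) of every point on each axis, sorted within each column.
bins <- apply(lhs, 2, function(col) sort(ceiling(col * n)))

# TRUE when every bin on every axis holds exactly one point.
one_per_bin <- all(bins == cbind(seq_len(n), seq_len(n)))
one_per_bin
```

The unit-scale points could then be mapped onto each parameter's range (for example on a log scale), which is the sort of coverage the latin hypercube panel in the plot is meant to show.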

dials will be central to the upcoming framework for optimizing tuning parameters, so there is much more to come regarding this package.
