# bench 1.0.1


bench is now available on CRAN!

The goal of bench is to benchmark code by tracking execution time, memory allocations, and garbage collections.

```r
install.packages("bench")
```

### Usage

Benchmarks can be run with bench::mark(), which takes one or more expressions to benchmark against each other.

```r
library(bench)
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
```

bench::mark() will throw an error if the results are not equivalent, so you don’t accidentally benchmark non-equivalent code. In the example below, the second expression uses a threshold of 499 instead of 500:

```r
bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  subset(dat, x > 500)
)
#> Error: Each result must equal the first result:
#>   dat[dat$x > 500, ] does not equal dat[which(dat$x > 499), ]
```
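The consistency check is conceptually simple. The base-R sketch below mimics it under the assumption that results are compared with all.equal(), which is how bench documents its check argument; the labels bracket, which, and subset are illustrative names only, and this is not bench's actual code. If you really do want to compare non-equivalent code, bench::mark() also accepts check = FALSE.

```r
# Base-R sketch of a "does every result equal the first?" check, in the
# spirit of what bench::mark() does by default (not bench's own code).
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

outputs <- list(
  bracket = dat[dat$x > 500, ],
  which   = dat[which(dat$x > 499), ],  # different threshold, as in the example above
  subset  = subset(dat, x > 500)
)

# all.equal() returns TRUE or a character vector describing the differences,
# so isTRUE() turns it into a single logical per result.
matches_first <- vapply(
  outputs,
  function(res) isTRUE(all.equal(outputs[[1]], res)),
  logical(1)
)
matches_first
```

The 499 threshold can only add rows, never remove them, which is why that result fails the comparison while the bracket and subset() versions agree.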

Results are easy to interpret, with human-readable units in a rectangular data frame.

```r
bnch <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500)
)
bnch
#> # A tibble: 3 x 10
#>   expression                     min     mean   median      max itr/sec mem_alloc  n_gc n_itr total_time
#>   <chr>                     <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl> <int>   <bch:tm>
#> 1 dat[dat$x > 500, ]           300µs    347µs    321µs   1.26ms     2884.     416KB    55   949      329ms
#> 2 dat[which(dat$x > 500), ]    230µs    281µs    259µs   1.12ms     3563.     357KB    52  1156      324ms
#> 3 subset(dat, x > 500)         374µs    461µs    420µs   1.52ms     2169.     548KB    43   803      370ms
```
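Behind the human-readable formatting, the time columns are ordinary numeric vectors of seconds with an S3 class that controls printing. A minimal sketch, assuming the as_bench_time() constructor exported by bench; the two values are simply the min column of the first two rows above.

```r
library(bench)

# Times are stored as seconds; the "bench_time" class only changes how they print.
t <- as_bench_time(c(0.0003, 0.00023))
t              # printed with human-readable units
as.numeric(t)  # the underlying values, in seconds
```

Because the underlying values stay numeric, you can sort, filter, or do arithmetic on these columns after converting them with as.numeric().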

By default, the summary uses absolute measures; however, relative results can be obtained by using relative = TRUE in your call to bench::mark() or by calling summary(relative = TRUE) on the results.

```r
summary(bnch, relative = TRUE)
#> # A tibble: 3 x 10
#>   expression                  min  mean median   max itr/sec mem_alloc  n_gc n_itr total_time
#>   <chr>                     <dbl> <dbl>  <dbl> <dbl>     <dbl>     <dbl> <dbl> <dbl>      <dbl>
#> 1 dat[dat$x > 500, ]         1.30  1.24   1.24  1.13      1.33      1.16  1.28  1.18       1.01
#> 2 dat[which(dat$x > 500), ]  1     1      1     1         1.64      1     1.21  1.44       1
#> 3 subset(dat, x > 500)       1.63  1.64   1.62  1.36      1         1.53  1     1          1.14
```
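The relative table is consistent with dividing every column by its smallest value: the fastest expression gets 1 for the time columns, and the slowest gets 1 for itr/sec. The base-R sketch below reproduces two columns by hand from the rounded absolute values printed earlier; it illustrates the arithmetic and is not bench's own code.

```r
# Absolute values copied from the earlier summary (median in µs, itr/sec).
median_us <- c(bracket = 321, which = 259, subset = 420)
itr_sec   <- c(bracket = 2884, which = 3563, subset = 2169)

# Relative summary: divide each column by its minimum.
round(median_us / min(median_us), 2)  # bracket 1.24, which 1.00, subset 1.62
round(itr_sec / min(itr_sec), 2)      # bracket 1.33, which 1.64, subset 1.00
```

For itr/sec, larger is better, so a relative value of 1.64 means 1.64 times as many iterations per second as the slowest expression.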

bench::press() is used to run benchmarks against a grid of parameters. Provide setup and benchmarking code as a single unnamed argument, then define sets of values as named arguments. The full combination of values will be expanded, and the benchmarks are then pressed together in the result. This allows you to benchmark a set of expressions across a wide variety of input sizes, perform replications, and carry out other useful tasks.

```r
set.seed(42)

create_df <- function(rows, cols) {
  as.data.frame(setNames(
    replicate(cols, runif(rows, 1, 1000), simplify = FALSE),
    rep_len(c("x", letters), cols)))
}

results <- bench::press(
  rows = c(10000, 100000),
  cols = c(10, 100),
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_iterations = 100,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)
results
#> # A tibble: 12 x 12
#>    expression   rows  cols      min     mean   median      max itr/sec mem_alloc  n_gc n_itr total_time
#>    <chr>       <dbl> <dbl> <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl> <int>   <bch:tm>
#>  1 bracket     10000    10    830µs   1.06ms 987.08µs   2.29ms    940.      1.17MB    18   304   323.47ms
#>  2 which       10000    10 447.96µs 652.94µs 564.73µs    1.6ms   1532.    827.04KB    21   551   359.77ms
#>  3 subset      10000    10 906.91µs   1.15ms   1.04ms   2.27ms    866.      1.28MB    21   320   369.44ms
#>  4 bracket    100000    10  14.96ms  17.34ms  17.39ms  19.95ms     57.7    11.54MB    46    54   936.47ms
#>  5 which      100000    10   9.09ms  11.24ms  11.04ms  15.25ms     89.0     7.91MB    32    68   764.24ms
#>  6 subset     100000    10  14.76ms  16.86ms  16.07ms  20.74ms     59.3    12.68MB    46    54   910.46ms
#>  7 bracket     10000   100   7.19ms   9.16ms   8.76ms     13ms    109.      9.71MB    34    66   604.84ms
#>  8 which       10000   100   2.74ms   4.17ms   3.98ms   8.17ms    240.      5.91MB    19    81   338.03ms
#>  9 subset      10000   100   7.19ms   9.63ms   9.46ms  12.54ms    104.      9.84MB    35    65   626.03ms
#> 10 bracket    100000   100 100.19ms  111.1ms 111.08ms 121.63ms      9.00   97.47MB    83    21      2.33s
#> 11 which      100000   100  54.19ms  59.62ms  59.36ms  65.77ms     16.8    59.51MB    36    64      3.82s
#> 12 subset     100000   100 103.36ms 113.58ms 111.83ms    134ms      8.80   98.62MB    84    16      1.82s
```
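The 12 rows come from expanding the named arguments into every combination of values and running the benchmark code once per combination. The same grid can be built in base R with expand.grid(); this only illustrates the expansion and is not necessarily how press() does it internally.

```r
# Every combination of the named arguments passed to bench::press() above.
grid <- expand.grid(rows = c(10000, 100000), cols = c(10, 100))
grid

# 4 parameter combinations x 3 expressions = the 12 rows in `results`
nrow(grid) * 3
```

The first column varies fastest, which matches the row order of results above.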

### Plotting

ggplot2::autoplot() can be used to generate an informative default plot. This plot is colored by GC level (0, 1, or 2) and faceted by parameters (if any). By default it generates a beeswarm plot; however, you can also specify other plot types (jitter, ridge, boxplot, violin). See ?autoplot.bench_mark for full details. This gives you a nice overview of the runs and allows you to gauge the effects of garbage collection on the results.

```r
ggplot2::autoplot(results)
```
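Choosing a different plot type only requires the type argument. A minimal, self-contained sketch using a smaller benchmark; it assumes bench and ggplot2 are installed, along with any suggested packages the plotting method needs. The violin type is used here because, as far as we know, the default beeswarm type additionally depends on the ggbeeswarm package.

```r
library(bench)
library(ggplot2)

set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

bnch <- bench::mark(
  bracket = dat[dat$x > 500, ],
  which   = dat[which(dat$x > 500), ],
  subset  = subset(dat, x > 500)
)

# type can be "beeswarm" (the default), "jitter", "ridge", "boxplot" or "violin"
p <- autoplot(bnch, type = "violin")
p
```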

You can also produce fully custom plots by un-nesting the results and working with the data directly. In this case we are exploring how the amount of memory allocated by each expression interacts with the time taken to run.

```r
library(tidyverse)
results %>%
  unnest() %>%
  filter(gc == "none") %>%
  ggplot(aes(x = mem_alloc, y = time, color = expression)) +
  geom_point() +
  scale_color_brewer(type = "qual", palette = 3) +
  geom_smooth(method = "lm", se = FALSE, colour = "grey50")
```

### Compared to existing methods

Compared to other methods such as system.time(), rbenchmark, tictoc, or microbenchmark, we feel bench has a number of benefits:

• Uses the highest precision APIs available for each operating system (often nanosecond-level).
• Tracks memory allocations for each expression.
• Tracks the number and type of R garbage collections per run.
• Verifies equality of expression results by default, to avoid accidentally benchmarking non-equivalent code.
• Uses adaptive stopping by default, running each expression for a set amount of time rather than for a specific number of iterations.
• Runs expressions in batches and calculates summary statistics after filtering out iterations with garbage collections. This allows you to isolate the performance and effects of garbage collection on running time (for more details see Neal 2014).
• Allows benchmarking across a grid of input values with bench::press().
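To make the adaptive stopping rule concrete, here is a toy base-R timer that keeps running a function until a minimum amount of wall-clock time has passed, subject to minimum and maximum iteration counts. The argument names mirror bench::mark()'s min_time, min_iterations, and max_iterations, but time_adaptively() is a hypothetical helper, not bench's implementation. It also uses proc.time(), whose resolution is far coarser than the high-precision timers bench relies on.

```r
# Hypothetical helper: time f() repeatedly until at least `min_time` seconds
# of wall-clock time have passed, running at least `min_iterations` and at
# most `max_iterations` times. Returns the per-iteration times in seconds.
time_adaptively <- function(f, min_time = 0.5, min_iterations = 1,
                            max_iterations = 10000) {
  elapsed <- function() proc.time()[["elapsed"]]
  times <- numeric(max_iterations)  # preallocate, trimmed before returning
  n <- 0
  started <- elapsed()
  while (n < max_iterations) {
    t0 <- elapsed()
    f()
    n <- n + 1
    times[n] <- elapsed() - t0
    # Stop once both the iteration floor and the time budget are satisfied
    if (n >= min_iterations && elapsed() - started >= min_time) break
  }
  times[seq_len(n)]
}

x <- runif(1e5)
times <- time_adaptively(function() sqrt(x), min_time = 0.2)
length(times)  # number of iterations actually run
```

The point of the design is that fast expressions automatically get many iterations and slow ones get few, so the total time spent benchmarking stays roughly bounded.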

When the development version of bench was introduced, a few people expressed concern over the number of dependencies in the package. I will attempt to explain why these dependencies exist and why the true load may be smaller than you might think.

While bench currently has 19 dependencies, only 8 of these are hard dependencies; that is, they are needed to install the package. Of these 8 hard dependencies, 3 (methods, stats, utils) are base packages installed with R. Of the 5 remaining packages, 3 have no additional dependencies (glue, profmem, rlang). The two remaining packages (tibble and pillar) are used to provide nice printing of the times and memory sizes, and support for list columns to store the timings, garbage collections, and allocations. These are major features of the bench package, and it would not work without these dependencies.

The remaining 11 packages are soft dependencies, used either for testing or for optional functionality, most notably plotting. They will not be installed unless explicitly requested.

The microbenchmark package is a good alternative for those looking for a package with only base dependencies.

### Feedback wanted!

We hope bench is a useful tool for benchmarking short expressions of code. Please open GitHub issues for any feature requests or bugs.