waldo 0.3.0

We’re delighted to announce the release of waldo 0.3.0. waldo is designed to find and concisely describe the difference between a pair of R objects. It was designed primarily to improve failure messages for testthat::expect_equal(), but it turns out to be useful in a number of other situations.

You can install it from CRAN with:

install.packages("waldo")

This blog post highlights the two biggest changes in this release: a new display format for data frame differences, and new tools for package developers to control the details of comparison. You can see a full list of changes in the release notes.

library(waldo)

Data frame differences

waldo 0.2.0 treated data frames in the same way as lists, which worked fine if a column changed, but wasn’t terribly informative if a row changed. In 0.3.0, data frames get a new row-oriented comparison:

df1 <- data.frame(x = c(1, 2, 3, 4, 5), y = c("a", "b", "c", "d", "e"))
df2 <- data.frame(x = c(1, 2, 3, 10, 4, 5), y = c("a", "b", "c", "X", "d", "e"))
compare(df1, df2)
#> `attr(old, 'row.names')[3:5]`: 3 4 5  
#> `attr(new, 'row.names')[3:6]`: 3 4 5 6
#> 
#> old vs new
#>             x y
#>   old[1, ]  1 a
#>   old[2, ]  2 b
#>   old[3, ]  3 c
#> + new[4, ] 10 X
#>   old[4, ]  4 d
#>   old[5, ]  5 e
#> 
#> `old$x`: 1 2 3    4 5
#> `new$x`: 1 2 3 10 4 5
#> 
#> `old$y`: "a" "b" "c"     "d" "e"
#> `new$y`: "a" "b" "c" "X" "d" "e"

You’ll notice that you still get the column comparison as well. This is important because the row-oriented comparison relies on the printed representation of the data frames, and there are cases where data frames look the same but are actually different. The most important case is probably strings vs factors:

(df1 <- data.frame(x = c("a", "b", "c"), stringsAsFactors = TRUE))
#>   x
#> 1 a
#> 2 b
#> 3 c
(df2 <- data.frame(x = c("a", "b", "c"), stringsAsFactors = FALSE))
#>   x
#> 1 a
#> 2 b
#> 3 c
compare(df1, df2)
#> `old$x` is an S3 object of class <factor>, an integer vector
#> `new$x` is a character vector ('a', 'b', 'c')

Control of comparison

When developing new data structures, you often need to be able to control the details of waldo’s comparisons. For example, take the xml2 package, which uses the libxml C library to parse and process XML. When you print XML that’s been parsed with xml2 it looks like a string:

library(xml2)
x1 <- xml2::read_xml("<a>1</a>")
x1
#> {xml_document}
#> <a>

But behind the scenes, it’s actually two pointers to C data structures:

str(x1)
#> List of 2
#>  $ node:<externalptr> 
#>  $ doc :<externalptr> 
#>  - attr(*, "class")= chr [1:2] "xml_document" "xml_node"

This means that a naïve comparison isn’t very useful:

x2 <- xml2::read_xml("<a>2</a>")
compare(unclass(x1), unclass(x2))
#> `old$node` is <pointer: 0x7fbc824876b0>
#> `new$node` is <pointer: 0x7fbc52557dd0>
#> 
#> `old$doc` is <pointer: 0x7fbc82487600>
#> `new$doc` is <pointer: 0x7fbc52544cc0>

To resolve this problem, waldo provides the compare_proxy() generic. This is called on every S3 object prior to comparison so you can transform your objects into equivalent data structures that waldo can more easily compare. For example, waldo includes a built-in compare_proxy.xml_node() method that converts the C data structures back to strings:

compare(x1, x2)
#> lines(as.character(old)) vs lines(as.character(new))
#>   "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
#> - "<a>1</a>"
#> + "<a>2</a>"
#>   ""

(You could imagine converting the XML structure to a tree data structure in R to get even more informative comparisons, but I didn’t take the time to do so.)

compare_proxy() has existed for some time, but waldo 0.3.0 generalised it so that, as well as returning the modified object, it also returns a modified “path” that describes how the object has been transformed:

waldo:::compare_proxy.xml_node
#> function (x, path) 
#> {
#>     list(object = as.character(x), path = paste0("as.character(", 
#>         path, ")"))
#> }
#> <bytecode: 0x7fbc633408d8>
#> <environment: namespace:waldo>

This means that when comparison fails, you get a clear path to the root cause.
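To make the shape of a proxy method concrete, here is a minimal sketch for a made-up class. Everything named stamped, new_stamped() and value_of() is hypothetical and exists only for this example; the only parts taken from waldo are the compare_proxy(x, path) signature and the list(object =, path =) return value shown above. The explicit registerS3method() call is one way to register the method from a script; a package would register it through its own namespace instead.

```r
library(waldo)

# Hypothetical class: a value plus a timestamp that should not affect equality
new_stamped <- function(value, stamp) {
  structure(list(value = value, stamp = stamp), class = "stamped")
}

# Proxy method: compare only the value, and record the transformation in the
# path so that any reported difference points back to how it was derived
compare_proxy.stamped <- function(x, path) {
  list(object = x$value, path = paste0("value_of(", path, ")"))
}

# Register the method against waldo's generic (script-level registration)
registerS3method(
  "compare_proxy", "stamped", compare_proxy.stamped,
  envir = asNamespace("waldo")
)

s1 <- new_stamped(1:3, stamp = 1)
s2 <- new_stamped(1:3, stamp = 2)             # same value, different stamp
s3 <- new_stamped(c(1L, 2L, 4L), stamp = 3)   # different value

same <- compare(s1, s2)        # stamps are dropped by the proxy
different <- compare(s1, s3)   # values differ, reported under the new path
different
```

Because the proxy discards the stamp, only differences in the value are expected to surface, and they should be labelled with the value_of(...) path rather than the raw list structure.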

Creating a new S3 method is reasonably heavy (and requires a little gymnastics in your package to correctly register without taking a hard dependency on waldo), so, thanks to Duncan Murdoch, waldo 0.3.0 gains a new way of controlling comparisons: the waldo_opts attribute. This attribute is a list with the same names as the arguments to compare(), where the values are used to override the default values of compare(). This is a powerful tool because you can inject these attributes at any level of the object hierarchy, no matter how deep.

For example, take these two lists which contain the same data but in different order:

x1 <- list(a = 1, b = 2)
x2 <- list(b = 2, a = 1)

Usually waldo will report these to be different:

compare(x1, x2)
#> `names(old)`: "a" "b"
#> `names(new)`: "b" "a"

With the new list_as_map argument (also thanks to an idea from Duncan Murdoch), you can request that the list be compared purely as mappings between names and values:

compare(x1, x2, list_as_map = TRUE)
#> ✔ No differences

This is great if you want this comparison to happen at the top level of the object, but what if the difference is buried deep within a list of lists, and you only want list_as_map to affect one small part of the object? Well, now you can add the waldo_opts attribute:

attr(x1, "waldo_opts") <- list(list_as_map = TRUE)
compare(list(x1), list(x2))
#> ✔ No differences
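To see that the attribute only affects the part of the object it is attached to, here is a small sketch with two sibling components. The names cfg, steps, obj1 and obj2 are made up for illustration, and the sketch assumes (as in the example above) that setting waldo_opts on the old object alone is enough:

```r
library(waldo)

# Two components that each hold the same data in a different order
cfg1 <- list(a = 1, b = 2)
cfg2 <- list(b = 2, a = 1)

# Only the cfg component opts in to order-insensitive comparison
attr(cfg1, "waldo_opts") <- list(list_as_map = TRUE)

obj1 <- list(cfg = cfg1, steps = list(x = 1, y = 2))
obj2 <- list(cfg = cfg2, steps = list(y = 2, x = 1))

# cfg is compared as a map; steps is a sibling without the attribute,
# so its reordering should still be reported
partial <- compare(obj1, obj2)
partial

# With steps in the same order on both sides, nothing should be left to report
obj3 <- list(cfg = cfg2, steps = list(x = 1, y = 2))
clean <- compare(obj1, obj3)
clean
```

This keeps the relaxed comparison local to the component where order genuinely doesn't matter, instead of loosening the comparison for the whole object.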

Acknowledgements

Thanks to all 14 folks who contributed to this release by filing issues, discussing ideas, and creating pull requests: @adamhsparks, @batpigandme, @bhogan-mitre, @Bisaloo, @brodieG, @dmurdoch, @ericnewkirk, @hadley, @krlmlr, @mgirlich, @michaelquinn32, @mpettis, @paleolimbot, and @tmwdr.