googlesheets4 1.0.0

  gargle, googlesheets4

  Jenny Bryan

We’re over the moon to announce the release of googlesheets4, version 1.0.0 (https://googlesheets4.tidyverse.org).

googlesheets4 is a package to work with Google Sheets from R. It wraps v4 of the Sheets API. googlesheets4 is focused on spreadsheet-y tasks that require a notion of worksheets, cells, and ranges, while the companion package googledrive handles more general file operations, such as renaming, sharing, or moving.

Install googlesheets4 from CRAN like so:

install.packages("googlesheets4")

Then attach it for use via:

The release of version 1.0.0 means googlesheets4 has graduated from being experimental to being stable, in terms of the tidyverse lifecycle stages. 1

Since we last blogged about it, googlesheets4 has seen some “quality of life” improvements, but no earthshaking changes. The overall interface is more polished, we automatically retry requests that yield the dreaded 429 RESOURCE_EXHAUSTED, and there’s better handling of some empty cell edge cases. We’re also bumping the required version of the gargle package (https://gargle.r-lib.org), which handles everything around auth.

You can see a full list of changes in the release notes.

Auth updates

If you are generally fairly passive about googlesheets4 auth, then you should just sit back and let things happen organically during usage. If you’ve used googlesheets4 before, you can expect to see some messages about cleaning and relocating the token cache when you first use v1.0.0. You can also expect to re-authenticate yourself with Google and re-authorize the “Tidyverse API Packages” to work with your files. This is all due to changes in gargle.

If your usage requires you to be more proactive about auth, read the blog post for gargle’s recent v1.2.0 release. A key point is that we have rolled the built-in OAuth client, which is why those relying on it will need to re-auth.

If the rolling of the tidyverse OAuth client is highly disruptive to your workflow, consider this a wake-up call that you should be using your own OAuth client or, quite possibly, an entirely different method of auth. Our credential rolling will have no impact on users who use their own OAuth client or service account tokens.

If you often use googlesheets4 together with googledrive, remember that the article Using googlesheets4 with googledrive gives some tips for how to coordinate auth.

Retries

The Sheets API has usage limits that are low enough even regular folks bump up against them occasionally:

  • 100 requests per 100 seconds per user
  • 500 requests per 100 seconds per project

When you hit one of these limits, your request fails with the error code 429 RESOURCE_EXHAUSTED. Thanks to a change in gargle, we now automatically retry such a request and, if it looks like you’ve exhausted your user quota, that first wait is > 100 seconds (!!).

If this happens to you fairly often, you should contemplate whether your code relies on some self-defeating pattern, like hitting the Sheets API repeatedly in a tight loop. Here’s an example of bad vs. good googlesheets4 code:

gapminder <- gs4_example("gapminder")
sp <- sheet_properties(gapminder)
(n <- sp$grid_rows[sp$name == "Africa"])
#> [1] 625

# this is BAD IDEA = reading individual cells in a loop
for (i in seq_len(n)) {
  gapminder %>%
    range_read(sheet = "Africa", range = sprintf("C%i", i))
}
#> ✓ Reading from "gapminder".
#> ✓ Range ''Africa'!C1'.
#> ...
#> x Request failed [429]. Retry 1 happens in 100.8 seconds ...
#> ✓ Range ''Africa'!C28'.
#> ...
#> x Request failed [429]. Retry 1 happens in 100.4 seconds ...
#> x Request failed [429]. Retry 2 happens in 8.8 seconds ...
#> ...

# this is a GOOD IDEA = reading all cells at once
gapminder %>%
  range_read(sheet = "Africa", range = "C:C") %>%
  head()
#> ✓ Reading from "gapminder".
#> ✓ Range ''Africa'!C:C'.
#> # A tibble: 6 x 1
#>    year
#>   <dbl>
#> 1  1952
#> 2  1957
#> 3  1962
#> 4  1967
#> 5  1972
#> 6  1977

Lesson: work in bulk as much as possible, as opposed to making lots of little piecemeal requests.

What about the per project limit? If you are auth’ed as a regular user, using the built-in OAuth client, this project refers to the Tidyverse API Packages. Yes, that means you’re sharing project-level quota with all your fellow useRs who also auth’ed this way!

If you hit per project limits regularly and it upsets your workflow, the universe is telling you that it’s time to configure your own OAuth client or use an entirely different method of auth, such as a service account token. Read more in the gargle article How to get your own API credentials.

Casual googlesheets4 users can go with the flow and accept that the occasional request will need an automatic retry. If you want to learn more, the functionality comes from gargle::request_retry(). In the development version of gargle, we retry for an even larger set of error codes (408, 429, 500, 502, and 503). If that appeals to you and you’re willing to install a pre-release package, we’d love to hear how this works for you.

User interface

The googlesheets4 user interface (UI) has gotten more stylish, thanks to the cli package (https://cli.r-lib.org). This applies to informational messages, as well as to errors.

The semantic UI encouraged by cli is a great fit for googlesheets4, where it’s useful to apply distinct inline styles for (spread)Sheet name, (work)sheet name, and cell range. Also bullets! Here is a fictional message that shows off some of these lovely things:

#> Hey, we're doing exciting things in the chicken_sheet!
#>  Did you know it has a worksheet called chicken_scratch?
#>  Uh-oh, in range A1:D4, some data appears to be encoded in the cell color :(

In a more technical vein, all errors thrown by googlesheets4 now route through rlang::abort(). This has various benefits, including better access to the backtrace and error data. All errors thrown by googlesheets4 now bear the class "googlesheets4_error".

googlesheets4_quiet is a new option to suppress informational messages from googlesheets4. local_gs4_quiet() and with_gs4_quiet() are withr-style convenience helpers for setting googlesheets4_quiet = TRUE.

Empty cells

It’s (not so) funny how much effort goes into handling the absence of data and googlesheets4 is no exception. We’ve made a few improvements to how we read or write nothing.

The na argument of read_sheet() has become more capable and more consistent with readr. Specifically, na = character() (or the general lack of "" among the na strings) results in cells with no data appearing as the empty string "" within a character vector, as opposed to NA.

dat <- data.frame(
  x = c("one", "", "three")
)
ss <- gs4_create("blog-post-blanks", sheets = dat)
#> ✓ Creating new Sheet: "blog-post-blanks".

read_sheet(ss)
#> ✓ Reading from "blog-post-blanks".
#> ✓ Range 'dat'.
#> # A tibble: 3 x 1
#>   x    
#>   <chr>
#> 1 one  
#> 2 <NA> 
#> 3 three

read_sheet(ss, na = character())
#> ✓ Reading from "blog-post-blanks".
#> ✓ Range 'dat'.
#> # A tibble: 3 x 1
#>   x      
#>   <chr>  
#> 1 "one"  
#> 2 ""     
#> 3 "three"

Explicit NULLs are also now written properly, i.e. as an empty cell. This can come up with list-columns. List-columns can easily creep in when working with Sheets edited by humans, who tend to create mixed columns by accident, i.e. mixing numbers and text.

Acknowledgements

We’d like to thank everyone who has helped to shape googlesheets4, since the release of v0.2.0, through their contributions in issues and pull requests:

@aaronmams, @ajjitn, @akgold, @AvinaHunjan, @batpigandme, @bwganblack, @cmichaud92, @comicalequation, @CorradoLanera, @cpilat97, @daattali, @davidski, @devu123, @dgmdevelopment, @douglascm, @dulearnaux, @ericcrandall, @ericpgreen, @featherduino, @felixetorres, @ferguskeatinge, @ghost, @Helena-D, @jasonboots, @jcheng5, @jennybc, @JimboMahoney, @johnbde, @jpawlata, @lpevzner, @marcusbrito, @MateusMaiaDS, @mathlete76, @mattle24, @mikegunn, @milamyslov, @MonkmanMH, @nicholailidow, @nikosbosse, @nilescbn, @OmarGonD, @paulvern, @pschloss, @py9mrg, @rhamo, @rhgof, @robitalec, @RussBowdrey, @sacrevert, @sanjmeh, @SebastianJHM, @timothoms, @TMax66, @tomcardoso, @uhhiitsphilia, and @YlanAllouche.


  1. The deprecated sheets_*() functions have been removed, as promised in the warning they have been throwing for over a year. No functionality was lost: this is just the result of the function (re-)naming scheme adopted in googlesheets4 >= 0.2.0. This internal documentation has a table that maps deprecated functions to their current counterparts. ↩︎