googledrive v1.0.0

  tidyverse, googledrive

  Jenny Bryan

Introduction

We’re jazzed to announce the release of googledrive v1.0.0 (https://googledrive.tidyverse.org).

googledrive wraps the Drive REST API v3. The most common file operations are implemented in high-level functions designed for ease of use. You can find, list, create, trash, delete, rename, move, copy, browse, download, share and publish Drive files, including those on Team Drives.

Install googledrive with:

install.packages("googledrive")

The release of version 1.0.0 marks two events:

  • The overall design of googledrive has survived ~2 years on CRAN, with very little need for change. The interface and feature set are fairly stable. googledrive facilitated around 7 million requests to the Drive API in the past month. But also …
  • There are changes in the auth interface that are not backwards compatible.

There is also new functionality that makes it less likely you’ll create multiple files with the same name, without actually meaning to.

Auth from gargle

googledrive’s auth functionality now comes from the gargle package, which provides general R infrastructure for working with Google APIs. We’ve just blogged about gargle’s initial release, so check out that post for more details.

We’re adopting gargle for auth in several other packages, such as bigrquery (>= v1.2.0), gmailr (>= v1.0.0 coming soon to CRAN), and googlesheets4 (currently GitHub-only, successor of googlesheets). This makes new token flows available in these packages, such as Application Default Credentials, and makes auth less idiosyncratic.

Auth changes the typical user will notice

If you’ve always let googledrive guide you through auth, here is the one change you will notice:

OAuth2 tokens are now cached at the user level, by default, instead of in .httr-oauth in the current project. We will ask if it’s OK to create a new folder to hold your OAuth tokens. We recommend that you delete any vestigial .httr-oauth files lying around your googledrive projects and re-authorize googledrive, i.e. get a new token, stored in the new way.

The new strategy makes it harder to accidentally push your tokens to the cloud, easier to use multiple Google identities, and easier to share tokens across projects and packages.

Overall, googledrive has gotten more careful about getting your permission to use a cached token. See the gargle vignette Non-interactive auth to learn how to prevent attempts to interact with you, especially the section “I just want my .Rmd to render”.
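As a rough sketch of the simplest approach that vignette describes, you can name the Google identity up front. This assumes a token for that identity was cached during an earlier interactive session; the address below is a placeholder, not a real account.

```r
# Placeholder address: replace it with the Google identity you authorized
# interactively. With this option set, gargle knows which cached token it is
# allowed to use and does not need to ask you to pick an account.
options(gargle_oauth_email = "jenny@example.com")

# Alternative: name the identity explicitly in the auth call instead.
# googledrive::drive_auth(email = "jenny@example.com")
```

Setting the option near the top of an .Rmd, before the first Drive request, is one way to keep rendering from stalling on an auth prompt.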

googledrive also uses a new OAuth “app”, owned by a verified Google Cloud Project entitled “Tidyverse API Packages”, which is the project name you will see on the OAuth consent screen. See our new Privacy Policy for details.

More advanced users, who call drive_auth() directly or who configure auth settings such as their own OAuth app or API key, should see the changelog for details.

Preventing name clashes

Google Drive doesn’t impose a 1-to-1 relationship between files and filepaths, the way your local file system does. Therefore, when working via the Drive API (instead of in the browser), it’s fairly easy to create multiple Drive files with the same name or filepath, without actually meaning to. This is perfectly valid on Drive, which identifies files by ID, but can be confusing and undesirable for humans. Very few people actually want several files with the same name at the same filepath.

googledrive v1.0.0 offers some new ways to prevent writing more than one file to the same filepath.

All functions that create a new item or rename/move an existing item have gained an overwrite argument:

  • drive_create() (new in v1.0.0)
  • drive_cp()
  • drive_mkdir()
  • drive_mv()
  • drive_rename()
  • drive_upload()

The overwrite argument takes one of three values:

  • overwrite = NA (the default) corresponds to the existing behaviour, which does not consider pre-existing files at all.
  • overwrite = TRUE moves a pre-existing file at the target filepath to the trash before creating the new item. If 2 or more files are found, an error is thrown, because it’s not clear which one(s) to trash.
  • overwrite = FALSE means the new item will only be created if there is no pre-existing file at that filepath.

Existence checks based on filepath (or name) can be expensive. That cost, together with backwards compatibility, is why the default is overwrite = NA.
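Here is a small sketch of how the three settings might play out with drive_upload(). The wrapper overwrite_demo() and the file name "report.txt" are made up for illustration. The sketch assumes googledrive is installed and authorized, and that no file named "report.txt" exists on your Drive beforehand.

```r
# Hypothetical demo function: call it in a session where googledrive is
# installed and authorized. Nothing touches Drive until it is called.
overwrite_demo <- function(name = "report.txt") {
  # A throwaway local file to send to Drive.
  local_file <- tempfile(fileext = ".txt")
  writeLines("first draft", local_file)

  # overwrite = NA (the default): no existence check is made.
  # Calling this twice would leave two files with the same name.
  googledrive::drive_upload(local_file, name = name)

  # overwrite = FALSE: a file with this name now exists, so this call errors.
  # try() reports the error and lets the demo carry on.
  try(googledrive::drive_upload(local_file, name = name, overwrite = FALSE))

  # overwrite = TRUE: the one pre-existing file is moved to the trash,
  # then the new file is created.
  googledrive::drive_upload(local_file, name = name, overwrite = TRUE)
}
```

Running overwrite_demo() should leave exactly one "report.txt" in your Drive, plus one in the trash.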

drive_put() is a new convenience wrapper that figures out whether to call drive_upload() or drive_update().

Sometimes you have a file you will repeatedly send to Drive, i.e. the first time you run an analysis, you create the file and, when you re-run it, you update the file. Previously this was hard to express with googledrive.

drive_put() is useful here and refers to the HTTP verb PUT: create the thing if it doesn’t exist or, if it does, replace its contents. A good explanation of PUT is RESTful API Design — PUT vs PATCH.

In pseudo-code, here’s the basic idea of drive_put():

target_filepath <- <determined from arguments `path`, `name`, and `media`>
hits <- <get all Drive files at target_filepath>
if (no hits) {
  drive_upload(media, path, name, type, ..., verbose)
} else if (exactly 1 hit) {
  drive_update(hit, media, ..., verbose)
} else {
  ERROR
}
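To make the pseudo-code more concrete, here is one way the same decision could be written in runnable R. It is a simplified sketch, not the package’s actual implementation: put_sketch() and its find, upload and update arguments are hypothetical names, and it looks files up by name only, rather than resolving `path`, `name`, and `media` the way drive_put() does.

```r
# Hypothetical, simplified stand-in for drive_put().
# `find`, `upload`, and `update` default to real googledrive functions, but
# they can be swapped for stubs to try the logic without touching Drive.
put_sketch <- function(media,
                       name = basename(media),
                       find = googledrive::drive_get,
                       upload = googledrive::drive_upload,
                       update = googledrive::drive_update) {
  hits <- find(name)   # all Drive files found under this name
  n <- nrow(hits)

  if (n == 0) {
    # Nothing there yet: create the file.
    upload(media, name = name)
  } else if (n == 1) {
    # Exactly one match: replace its contents.
    update(hits, media = media)
  } else {
    # Ambiguous target: refuse to guess which file to update.
    stop("Found ", n, " files named '", name, "'; not clear which to update.")
  }
}
```

In real use, prefer drive_put() itself; the sketch is only meant to show the create-or-update branching.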

Shared workflows

The shared use of gargle allows us to create centralized articles for several workflows that can be tricky for useRs.

Thanks!

Thank you to the 41 people who contributed issues, code, and comments to this release:

@abeburnett, @admahood, @alexpghayes, @arendsee, @batpigandme, @benmarwick, @Chanajit, @cowlumbus, @ctlamb, @DavidGarciaEstaun, @dgplaco, @Diego-MX, @dsdaveh, @eeenilsson, @efh0888, @giocomai, @grabear, @hwsamuel, @ianmcook, @jarodmeng, @jennybc, @jimhester, @lohancock, @lotard, @LucyMcGowan, @lukaskawerau, @MariaMetriplica, @Martin-Jung, @medewitt, @njudd, @philmikejones, @prokulski, @RNA-Ninja, @romunov, @sanjmeh, @Serenthia, @shawzhifei, @stapial, @svenhalvorson, @tarunparmar, and @tsmith64