httr2 1.0.0

  httr, httr2

  Hadley Wickham

We’re delighted to announce the release of httr2[1] 1.0.0. httr2 is the second generation of httr: it helps you generate HTTP requests and process the responses, designed with an eye towards modern web APIs and potentially putting your code in a package.

You can install it from CRAN with:

install.packages("httr2")

httr2 has been under development for the last two years, but this is the first time we’ve blogged about it because we’ve been waiting until the user interface felt stable. It now does, and we’re ready to encourage you to use httr2 whenever you need to talk to a web server. Most importantly, httr2 is now a “real” package because it has a wonderful new logo, thanks to a collaborative effort involving Julie Jung, Greg Swineheart, and DALL•E 3.

The new httr2 logo is a dark blue hexagon with httr2 written in bright white at the top of the logo. Underneath the text is a vibrant magenta baseball player hitting a ball emblazoned with the letters "www".

httr2 is the successor to httr. The biggest difference is that it has an explicit request object which you can build up over multiple function calls. This makes the interface fit more naturally with the pipe, and generally makes life easier because you can iteratively build up a complex request. httr2 also builds on the 10 years of package development experience we’ve accrued since creating httr, so it should be more enjoyable to use all around. If you’re a current httr user, there’s no need to switch, as we’ll continue to maintain the package for many years to come, but if you’re starting a new project, I’d recommend that you give httr2 a shot.

If you’ve been following httr2 development for a while, you might want to jump to the release notes to see what’s new (a lot!). The most important change in this release is that Maximilian Girlich is now an httr2 author, in recognition of his many contributions to the package. This release also features improved tools for performing multiple requests (more on that below) and a bunch of bug fixes and minor improvements for OAuth.

For the rest of this blog post, I’ll assume that you’re familiar with the basics of HTTP. If you’re not, you might want to start with vignette("httr2") which introduces you to HTTP using httr2.

Making a request

httr2 is designed around the two big pieces of HTTP: requests and responses. First you’ll create a request, with a URL:

req <- request(example_url())
req
#> <httr2_request>
#> GET http://127.0.0.1:51981/
#> Body: empty

Instead of using an external website, here we’re using a test server that’s built into httr2. This ensures that this blog post, and many httr2 examples, work independently from the rest of the internet.

You can see the HTTP request that httr2 will send, without actually sending it[2], by doing a dry run:

req |> req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2
#> Accept: */*
#> Accept-Encoding: deflate, gzip

As you can see, this request object will perform a simple GET request with automatic user agent and accept headers.

To make more complex requests, you modify the request object with functions that start with req_. For example, you could make it a HEAD request, with some query parameters, and a custom user agent:

req |> 
  req_url_query(param = "value") |> 
  req_user_agent("My user agent") |> 
  req_method("HEAD") |> 
  req_dry_run()
#> HEAD /?param=value HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: My user agent
#> Accept: */*
#> Accept-Encoding: deflate, gzip
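Because each req_ function returns a modified copy rather than changing its input, you can keep a base request around and derive variants from it. A minimal sketch, assuming a placeholder URL (https://example.com) that is never contacted; it peeks at the request's method and url fields, which are internal details of the httr2_request object rather than a documented interface:

```r
library(httr2)

# Base request; nothing is sent, so the placeholder URL is never contacted
base <- request("https://example.com")

# Derive a HEAD request with a query parameter from the base request
head_req <- base |>
  req_url_query(param = "value") |>
  req_method("HEAD")

# The derived request carries the changes (internal fields, for illustration)
head_req$method
grepl("param=value", head_req$url, fixed = TRUE)

# ...while the base request is left untouched
base$url
```

This is what makes it safe to define one base request for an API and build every specific call from it.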

Or you could send some JSON in the body of the request:

req |> 
  req_body_json(list(x = 1, y = "a")) |> 
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:51981
#> User-Agent: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Content-Type: application/json
#> Content-Length: 15
#> 
#> {"x":1,"y":"a"}
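The body above is just the list serialised to JSON, with length-1 vectors written as scalars. A sketch that reproduces it with jsonlite directly, assuming (this post doesn't say so) that req_body_json() serialises with jsonlite using auto_unbox = TRUE; it also shows where the Content-Length of 15 comes from:

```r
library(jsonlite)

# Serialise the same list; auto_unbox = TRUE writes length-1 vectors as scalars
body <- toJSON(list(x = 1, y = "a"), auto_unbox = TRUE)
body
#> {"x":1,"y":"a"}

# Content-Length is the number of bytes in the encoded body
nchar(body, type = "bytes")
#> [1] 15
```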

httr2 provides a wide range of req_ functions to customise the request in common ways; if there’s something you need that httr2 doesn’t support, please file an issue!

Performing the request and handling the response

Once you have a request that you are happy with, you can send it to the server with req_perform():

req_json <- req |> req_url_path("/json")
resp <- req_json |> req_perform()

Performing a request will return a response object (or throw an error, which we’ll talk about next). You can see the basic details of the response by printing it, or you can see the raw response with resp_raw()[3]:

resp
#> <httr2_response>
#> GET http://127.0.0.1:51981/json
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (407 bytes)

resp |> resp_raw()
#> HTTP/1.1 200 OK
#> Connection: close
#> Date: Tue, 14 Nov 2023 14:41:32 GMT
#> Content-Type: application/json
#> Content-Length: 407
#> ETag: "de760e6d"
#> 
#> {
#>   "firstName": "John",
#>   "lastName": "Smith",
#>   "isAlive": true,
#>   "age": 27,
#>   "address": {
#>     "streetAddress": "21 2nd Street",
#>     "city": "New York",
#>     "state": "NY",
#>     "postalCode": "10021-3100"
#>   },
#>   "phoneNumbers": [
#>     {
#>       "type": "home",
#>       "number": "212 555-1234"
#>     },
#>     {
#>       "type": "office",
#>       "number": "646 555-4567"
#>     }
#>   ],
#>   "children": [],
#>   "spouse": null
#> }

But generally, you’ll want to use the resp_ functions to extract parts of the response for further processing. For example, you could parse the JSON body into an R data structure:

resp |> 
  resp_body_json() |> 
  str()
#> List of 8
#>  $ firstName   : chr "John"
#>  $ lastName    : chr "Smith"
#>  $ isAlive     : logi TRUE
#>  $ age         : int 27
#>  $ address     :List of 4
#>   ..$ streetAddress: chr "21 2nd Street"
#>   ..$ city         : chr "New York"
#>   ..$ state        : chr "NY"
#>   ..$ postalCode   : chr "10021-3100"
#>  $ phoneNumbers:List of 2
#>   ..$ :List of 2
#>   .. ..$ type  : chr "home"
#>   .. ..$ number: chr "212 555-1234"
#>   ..$ :List of 2
#>   .. ..$ type  : chr "office"
#>   .. ..$ number: chr "646 555-4567"
#>  $ children    : list()
#>  $ spouse      : NULL

Or get the value of a header:

resp |> resp_header("Content-Length")
#> [1] "407"
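To experiment with the resp_ functions without any server at all, you can build a response by hand with response(). A minimal sketch; the URL and the small JSON body are made-up stand-ins for the test server's /json endpoint:

```r
library(httr2)

# Build a fake JSON response; nothing is fetched over the network
json <- '{"firstName": "John", "age": 27}'
resp <- response(
  status_code = 200,
  url = "https://example.com/json",
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw(json)
)

# The same resp_ helpers work on it as on a real response
resp |> resp_status()
resp |> resp_header("Content-Type")
resp |> resp_body_json() |> str()
```

Hand-built responses like this are handy for unit-testing your own parsing code.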

Error handling

You can use resp_status() to see the returned status:

resp |> resp_status()
#> [1] 200

But this will almost always be 200, because httr2 automatically follows redirects (statuses in the 300s) and turns HTTP failures (statuses in the 400s and 500s) into R errors. The following example shows what error handling looks like using an example endpoint that returns a response with the status defined in the URL:

req |> 
  req_url_path("/status/404") |> 
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.

req |> 
  req_url_path("/status/500") |> 
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 500 Internal Server Error.
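You can see the same status-to-error conversion offline with resp_is_error() and resp_check_status() on a hand-built response. A sketch; the URL is a placeholder and the response is constructed with response() rather than fetched:

```r
library(httr2)

# A fake 404 response; nothing is fetched
resp_404 <- response(status_code = 404, url = "https://example.com/status/404")

resp_is_error(resp_404)      # TRUE for any status of 400 or above
resp_status_desc(resp_404)   # human-readable reason phrase

# resp_check_status() turns an error status into an R error; catch it here
result <- tryCatch(
  resp_check_status(resp_404),
  error = function(e) conditionMessage(e)
)
result
```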

Turning HTTP failures into R errors can make debugging hard, so httr2 provides the last_request() and last_response() helpers which you can use to figure out what went wrong:

last_request()
#> <httr2_request>
#> GET http://127.0.0.1:51981/status/500
#> Body: empty

last_response()
#> <httr2_response>
#> GET http://127.0.0.1:51981/status/500
#> Status: 500 Internal Server Error
#> Content-Type: text/plain
#> Body: None

httr2 provides two other tools to customise error handling:

  • req_error() gives you full control over what responses should be turned into R errors, and allows you to add additional information to the error message.
  • req_retry() helps deal with transient errors, where you need to wait a bit and try again. For example, many APIs are rate limited and will return a 429 status if you have made too many requests.

You can learn more about both of these functions in “Wrapping APIs”, as they are particularly important when creating an R package (or script) that wraps a web API.
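As a sketch of how the two fit together, the request below treats 429 and 503 responses as transient and retries them with a growing delay, and only turns 5xx statuses into R errors. The URL, the retry policy, and the X-Error-Message header are hypothetical choices for illustration; the predicates are checked against hand-built responses, so nothing is sent:

```r
library(httr2)

# Hypothetical policy: retry when rate limited or the service is unavailable
is_transient_status <- function(resp) {
  resp_status(resp) %in% c(429, 503)
}

# Hypothetical policy: only server errors (5xx) become R errors
is_server_error <- function(resp) {
  resp_status(resp) >= 500
}

# Add the (hypothetical) X-Error-Message header to the error message, if present
error_details <- function(resp) {
  msg <- resp_header(resp, "X-Error-Message")
  if (is.null(msg)) character() else msg
}

req <- request("https://api.example.com/data") |>  # hypothetical URL
  req_retry(
    max_tries = 5,
    is_transient = is_transient_status,
    backoff = function(i) 2^i  # wait 2^i seconds after the i-th failed attempt
  ) |>
  req_error(is_error = is_server_error, body = error_details)

# Check the predicates on fake responses
is_transient_status(response(status_code = 429))
is_server_error(response(status_code = 404))
```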

Control the request process

There are a number of other req_ functions that don’t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:

  • req_cache(), which sets up a cache so that if repeated requests return the same results, you can avoid a trip to the server.

  • req_throttle(), which automatically adds a small delay before each request so you can avoid hammering a server with many requests.

  • req_progress(), which adds a progress bar for long downloads or uploads.

  • req_cookie_preserve(), which lets you preserve cookies across requests.
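These controls compose like any other req_ function. A sketch that only builds the request (it is never performed); the API URL, the 30-requests-per-minute rate, and the temporary cache and cookie paths are illustrative assumptions:

```r
library(httr2)

# Hypothetical API; the request is only built here, never performed
req <- request("https://api.example.com/data") |>
  req_cache(path = tempfile("httr2-cache-")) |>     # cache responses on disk
  req_throttle(rate = 30 / 60) |>                   # rate is in requests per second: 30 per minute
  req_progress() |>                                 # progress bar for long downloads
  req_cookie_preserve(path = tempfile("cookies-"))  # keep cookies in a file shared across requests

req
```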

Additionally, httr2 provides rich support for authenticating with OAuth, implementing many more OAuth flows than httr. You’ve probably used OAuth a bunch without knowing what it’s called: you use it when you log in to a non-Google website using your Google account, when you give your phone access to your Twitter account, or when you log in to a streaming app on your smart TV. OAuth is a big, complex topic, and is documented in “OAuth”.

Multiple requests

httr2 includes three functions to perform multiple requests:

  • req_perform_sequential() takes a list of requests and performs them one at a time.

  • req_perform_parallel() takes a list of requests and performs them in parallel (up to 6 at a time by default). It’s similar to req_perform_sequential(), but is obviously faster, at the expense of potentially hammering a server. It also has some limitations: most importantly it can’t refresh an expired OAuth token and it doesn’t respect req_retry() or req_throttle().

  • req_perform_iterative() takes a single request and a callback function to generate the next request from the previous response. It’ll keep going until the callback function returns NULL or max_reqs requests have been performed. This is very useful for paginated APIs that only tell you the URL for the next page.
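The callback for req_perform_iterative() receives the previous response and request and returns the next request, or NULL to stop. A sketch for an API that reports the next page's URL in a JSON field called next; treating the Star Wars API's people listing this way is an assumption, and the callback is exercised on hand-built responses rather than real ones:

```r
library(httr2)

# Return the request for the next page, or NULL when there isn't one
next_page <- function(resp, req) {
  next_url <- resp_body_json(resp)[["next"]]
  if (is.null(next_url)) {
    return(NULL)
  }
  req |> req_url(next_url)
}

# Real usage would look something like this (not run here):
# resps <- req_perform_iterative(
#   request("https://swapi.dev/api/people/"),
#   next_req = next_page,
#   max_reqs = 10
# )

# Exercise the callback on fake first and last pages
json_page <- function(json) {
  response(
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw(json)
  )
}
first_page <- json_page('{"next": "https://swapi.dev/api/people/?page=2"}')
last_page <- json_page('{"next": null}')

base_req <- request("https://swapi.dev/api/people/")
next_page(first_page, base_req)  # request for page 2
next_page(last_page, base_req)   # NULL, so iteration would stop
```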

For example, imagine we wanted to download each person from the Star Wars API. The URLs have a very consistent structure so we can generate a bunch of them, then create the corresponding requests:

urls <- paste0("https://swapi.dev/api/people/", 1:10)
reqs <- lapply(urls, request)

Now I can perform those requests, collecting a list of responses:

resps <- req_perform_sequential(reqs)
#> Iterating ■■■■                              10% | ETA: 40s
#> Iterating ■■■■■■■                           20% | ETA:  3m
#> Iterating ■■■■■■■■■■                        30% | ETA:  2m
#> Iterating ■■■■■■■■■■■■■                     40% | ETA:  1m
#> Iterating ■■■■■■■■■■■■■■■■                  50% | ETA: 46s
#> Iterating ■■■■■■■■■■■■■■■■■■■               60% | ETA: 33s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■            70% | ETA: 22s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■         80% | ETA: 13s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■      90% | ETA:  6s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  100% | ETA:  0s

These responses contain their data in a JSON body:

resps |> 
  _[[1]] |> 
  resp_body_json() |> 
  str()
#> List of 16
#>  $ name      : chr "Luke Skywalker"
#>  $ height    : chr "172"
#>  $ mass      : chr "77"
#>  $ hair_color: chr "blond"
#>  $ skin_color: chr "fair"
#>  $ eye_color : chr "blue"
#>  $ birth_year: chr "19BBY"
#>  $ gender    : chr "male"
#>  $ homeworld : chr "https://swapi.dev/api/planets/1/"
#>  $ films     :List of 4
#>   ..$ : chr "https://swapi.dev/api/films/1/"
#>   ..$ : chr "https://swapi.dev/api/films/2/"
#>   ..$ : chr "https://swapi.dev/api/films/3/"
#>   ..$ : chr "https://swapi.dev/api/films/6/"
#>  $ species   : list()
#>  $ vehicles  :List of 2
#>   ..$ : chr "https://swapi.dev/api/vehicles/14/"
#>   ..$ : chr "https://swapi.dev/api/vehicles/30/"
#>  $ starships :List of 2
#>   ..$ : chr "https://swapi.dev/api/starships/12/"
#>   ..$ : chr "https://swapi.dev/api/starships/22/"
#>  $ created   : chr "2014-12-09T13:50:51.644000Z"
#>  $ edited    : chr "2014-12-20T21:17:56.891000Z"
#>  $ url       : chr "https://swapi.dev/api/people/1/"

There are lots of ways to deal with this sort of data (e.g. for loops or functional programming), but to make life easier, httr2 comes with its own helper, resps_data(). This function takes a callback that retrieves the data for each response, then concatenates all the data into a single object. In this case, we need to wrap resp_body_json() in a list, so we get one list for each person, rather than one list in total:

resps |> 
  resps_data(\(resp) list(resp_body_json(resp))) |> 
  _[1:3] |> 
  str(list.len = 10)
#> List of 3
#>  $ :List of 16
#>   ..$ name      : chr "Luke Skywalker"
#>   ..$ height    : chr "172"
#>   ..$ mass      : chr "77"
#>   ..$ hair_color: chr "blond"
#>   ..$ skin_color: chr "fair"
#>   ..$ eye_color : chr "blue"
#>   ..$ birth_year: chr "19BBY"
#>   ..$ gender    : chr "male"
#>   ..$ homeworld : chr "https://swapi.dev/api/planets/1/"
#>   ..$ films     :List of 4
#>   .. ..$ : chr "https://swapi.dev/api/films/1/"
#>   .. ..$ : chr "https://swapi.dev/api/films/2/"
#>   .. ..$ : chr "https://swapi.dev/api/films/3/"
#>   .. ..$ : chr "https://swapi.dev/api/films/6/"
#>   .. [list output truncated]
#>  $ :List of 16
#>   ..$ name      : chr "C-3PO"
#>   ..$ height    : chr "167"
#>   ..$ mass      : chr "75"
#>   ..$ hair_color: chr "n/a"
#>   ..$ skin_color: chr "gold"
#>   ..$ eye_color : chr "yellow"
#>   ..$ birth_year: chr "112BBY"
#>   ..$ gender    : chr "n/a"
#>   ..$ homeworld : chr "https://swapi.dev/api/planets/1/"
#>   ..$ films     :List of 6
#>   .. ..$ : chr "https://swapi.dev/api/films/1/"
#>   .. ..$ : chr "https://swapi.dev/api/films/2/"
#>   .. ..$ : chr "https://swapi.dev/api/films/3/"
#>   .. ..$ : chr "https://swapi.dev/api/films/4/"
#>   .. ..$ : chr "https://swapi.dev/api/films/5/"
#>   .. ..$ : chr "https://swapi.dev/api/films/6/"
#>   .. [list output truncated]
#>  $ :List of 16
#>   ..$ name      : chr "R2-D2"
#>   ..$ height    : chr "96"
#>   ..$ mass      : chr "32"
#>   ..$ hair_color: chr "n/a"
#>   ..$ skin_color: chr "white, blue"
#>   ..$ eye_color : chr "red"
#>   ..$ birth_year: chr "33BBY"
#>   ..$ gender    : chr "n/a"
#>   ..$ homeworld : chr "https://swapi.dev/api/planets/8/"
#>   ..$ films     :List of 6
#>   .. ..$ : chr "https://swapi.dev/api/films/1/"
#>   .. ..$ : chr "https://swapi.dev/api/films/2/"
#>   .. ..$ : chr "https://swapi.dev/api/films/3/"
#>   .. ..$ : chr "https://swapi.dev/api/films/4/"
#>   .. ..$ : chr "https://swapi.dev/api/films/5/"
#>   .. ..$ : chr "https://swapi.dev/api/films/6/"
#>   .. [list output truncated]
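To see why the extra list() matters, here's a sketch with two hand-built responses standing in for the API (the bodies are abbreviated stand-ins): wrapping keeps one element per response, while returning the parsed body directly splices every field of every person into one flat list.

```r
library(httr2)

# Build a fake JSON response from a string
fake_person <- function(json) {
  response(
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw(json)
  )
}
fake_resps <- list(
  fake_person('{"name": "Luke Skywalker", "height": "172"}'),
  fake_person('{"name": "C-3PO", "height": "167"}')
)

# Wrapped: one list per person
wrapped <- resps_data(fake_resps, \(resp) list(resp_body_json(resp)))
length(wrapped)

# Unwrapped: the fields of both people are concatenated together
flat <- resps_data(fake_resps, resp_body_json)
length(flat)
```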

Another option would be to convert each response into a data frame or tibble. That’s a little tricky here because the nested lists need to become list-columns[4], so we’ll sidestep that challenge by focussing on the first nine columns:

sw_data <- function(resp) {
  tibble::as_tibble(resp_body_json(resp)[1:9])
}
resps |> resps_data(sw_data)
#> # A tibble: 10 × 9
#>    name           height mass  hair_color skin_color eye_color birth_year gender
#>    <chr>          <chr>  <chr> <chr>      <chr>      <chr>     <chr>      <chr> 
#>  1 Luke Skywalker 172    77    blond      fair       blue      19BBY      male  
#>  2 C-3PO          167    75    n/a        gold       yellow    112BBY     n/a   
#>  3 R2-D2          96     32    n/a        white, bl… red       33BBY      n/a   
#>  4 Darth Vader    202    136   none       white      yellow    41.9BBY    male  
#>  5 Leia Organa    150    49    brown      light      brown     19BBY      female
#>  6 Owen Lars      178    120   brown, gr… light      blue      52BBY      male  
#>  7 Beru Whitesun… 165    75    brown      light      blue      47BBY      female
#>  8 R5-D4          97     32    n/a        white, red red       unknown    n/a   
#>  9 Biggs Darklig… 183    84    black      light      brown     24BBY      male  
#> 10 Obi-Wan Kenobi 182    77    auburn, w… fair       blue-gray 57BBY      male  
#> # ℹ 1 more variable: homeworld <chr>
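The list-column trick from footnote 4 can be sketched in base R, using vapply() and lapply() in place of purrr's map_lgl() and map(). The record below is a hand-trimmed stand-in for one parsed person, not real API output:

```r
# A trimmed-down parsed record: two scalar fields and two nested lists
json <- list(
  name = "Luke Skywalker",
  height = "172",
  films = list("https://swapi.dev/api/films/1/", "https://swapi.dev/api/films/2/"),
  species = list()
)

# Wrap each nested list in another list so every element has length 1
is_list <- vapply(json, is.list, logical(1))
json[is_list] <- lapply(json[is_list], list)

# Now each element becomes one cell of a single-row tibble
person <- tibble::as_tibble(json)
person
```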

When you’re performing large numbers of requests, it’s almost inevitable that something will go wrong. By default, all three functions will bubble up errors, causing you to lose all of the work that’s been done so far. You can, however, use the on_error argument to change what happens, either ignoring errors or returning when you hit the first error. This changes the return value: instead of a list of responses, the list might now also contain error objects. httr2 provides helpers to work with this list:

  • resps_successes() filters the list to find the successful responses. You can then pair this with resps_data() to get the data from the successful requests.
  • resps_failures() filters the list to find the failed responses. You can then pair this with resps_requests() to find the requests that generated them and figure out what went wrong.
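A sketch of the filtering step on a hand-built mixed list. The list stands in for what a call like req_perform_sequential(reqs, on_error = "continue") might return; treating "continue" as the on_error value that ignores errors is an assumption here, and the responses and the error object are constructed rather than fetched:

```r
library(httr2)

# Build a fake JSON response from a string
json_resp <- function(json) {
  response(
    headers = list(`Content-Type` = "application/json"),
    body = charToRaw(json)
  )
}

# Stand-in for a result list mixing responses and error objects
results <- list(
  json_resp('{"name": "Luke Skywalker"}'),
  simpleError("HTTP 404 Not Found."),
  json_resp('{"name": "C-3PO"}')
)

ok <- resps_successes(results)
failed <- resps_failures(results)

# Only the successful responses feed into resps_data()
ok |> resps_data(\(resp) resp_body_json(resp)$name)
```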

Acknowledgements

A big thanks to all 87 folks who have helped make httr2 possible!

@allenbaron, @asadow, @atheriel, @boshek, @casa-henrym, @cderv, @colmanhumphrey, @cstjohn810, @cwang23, @DavidRLovell, @DMerch, @dpprdan, @ECOSchulz, @edavidaja, @elipousson, @emmansh, @Enchufa2, @ErdaradunGaztea, @fangzhou-xie, @fh-mthomson, @fkohrt, @flahn, @gregleleu, @guga31bb, @gvelasq, @hadley, @hongooi73, @howardbaek, @jameslairdsmith, @JBGruber, @jchrom, @jemus42, @jennybc, @jimrothstein, @jjesusfilho, @jjfantini, @jl5000, @jonthegeek, @JosiahParry, @judith-bourque, @juliasilge, @kasperwelbers, @kelvindso, @kieran-mace, @KoderKow, @lassehjorthmadsen, @llrs, @lyndon-bird, @m-mohr, @maelle, @maxheld83, @mgirlich, @MichaelChirico, @michaelgfalk, @misea, @MislavSag, @mkoohafkan, @mmuurr, @multimeric, @nbenn, @nclsbarreto, @nealrichardson, @Nelson-Gon, @olivroy, @owenjonesuob, @paul-carteron, @pbulsink, @ramiromagno, @rplati, @rressler, @samterfa, @schnee, @sckott, @sebastian-c, @selesnow, @Shaunson26, @SokolovAnatoliy, @spotrh, @stefanedwards, @taerwin, @vanhry, @wing328, @xinzhuohkust, @yogat3ch, @yogesh-bansal, @yutannihilation, and @zacdav-db.


  1. Pronounced “hitter 2”. ↩︎

  2. Well, technically, it does send the request, just to another test server that returns the request that it received. ↩︎

  3. This is only an approximation. For example, it only shows the final response if there were redirects, and it automatically uncompresses the body if it was compressed. Nevertheless, it’s still pretty useful. ↩︎

  4. To turn these into list-columns, you need to wrap each list in another list, something like is_list <- map_lgl(json, is.list); json[is_list] <- map(json[is_list], list). This ensures that each element has length 1, the invariant for a row in a tibble. ↩︎