httr2

The goal of this document is to show you the basics of httr2. You’ll learn how to create and submit HTTP requests and work with the HTTP responses that you get back. httr2 is designed to map closely to the underlying HTTP protocol, which I’ll explain as we go along. For more details, I also recommend “An overview of HTTP” from MDN.

library(httr2)

Create a request

In httr2, you start by creating a request. If you’re familiar with httr, this is a big change: with httr you could only submit a request, immediately receiving a response. Having an explicit request object makes it easier to build up a complex request piece by piece and works well with the pipe.

Every request starts with a URL:

req <- request(example_url())
req
#> <httr2_request>
#> GET http://127.0.0.1:33429/
#> Body: empty

Here, instead of an external website, we use a test server that’s built into httr2 itself. That ensures that this vignette will work regardless of when or where you run it.
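To illustrate what building a request piece by piece looks like, here is a minimal sketch. The /json path and the Accept value are illustrative choices only. It relies on the fact that each req_*() function returns a modified copy of the request, so the starting object is left untouched:

```r
library(httr2)

# Start from the built-in test server, as above.
base <- request(example_url())

# Each step returns a new request object; `base` itself is not modified.
extended <- base |>
  req_url_path_append("json") |>
  req_headers(Accept = "application/json")

# Compare the URLs stored in the two request objects.
base$url
extended$url
```

Because nothing is sent until you explicitly perform the request, you can keep a base request around and derive several variations from it.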

We can see exactly what this request will send to the server with a dry run:

req |> req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd

The first line of the request contains three important pieces of information:

  • The HTTP method, which is a verb that tells the server what you want to do. Here it’s GET, the most common verb, indicating that we want to get a resource. Other verbs include POST, to create a new resource, PUT, to replace an existing resource, and DELETE, to delete a resource.

  • The path, which is the URL stripped of details that the server already knows, i.e. the protocol (http or https), the host (127.0.0.1), and the port (33429).

  • The version of the HTTP protocol. This is unimportant for our purposes because it’s handled at a lower level.
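One way to see which parts of a URL end up where is to split it with httr2’s url_parse(). The sketch below uses a URL shaped like the test server’s; the port number and the /json path are just examples, since the real port changes from session to session:

```r
library(httr2)

# A URL with the same shape as the test server's; the port is an example value.
parts <- url_parse("http://127.0.0.1:33429/json")

parts$scheme    # protocol: not part of the request line
parts$hostname  # host: sent in the Host header instead
parts$port      # port: also sent in the Host header
parts$path      # path: the piece that appears on the request line
```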

The following lines specify the HTTP headers, a series of name-value pairs separated by :. The headers in this request were automatically added by httr2, but you can override them or add your own with req_headers():

req |>
  req_headers(
    Name = "Hadley",
    `Shoe-Size` = "11",
    Accept = "application/json"
  ) |>
  req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept-Encoding: deflate, gzip, br, zstd
#> Name: Hadley
#> Shoe-Size: 11
#> Accept: application/json

Header names are case-insensitive, and servers will ignore headers that they don’t understand.
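For the User-Agent header specifically, httr2 also provides req_user_agent(). The following sketch overrides two of the default headers; the client name and contact address are made up for illustration and should be replaced with something that describes your own code:

```r
library(httr2)

req <- request(example_url()) |>
  # Hypothetical identifier: replace with a description of your own client.
  req_user_agent("my-client/0.1 (me@example.com)") |>
  # Replace the default `Accept: */*` header.
  req_headers(Accept = "text/plain")

# Inspect what would be sent, without contacting the server.
req |> req_dry_run()
```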

The headers finish with a blank line, which is followed by the body. The requests above (like most GET requests) don’t have a body, so let’s add one to see what happens. The req_body_*() functions provide a variety of ways to add data to the body. Here we’ll use req_body_json() to add some data encoded as JSON:

req |>
  req_body_json(list(x = 1, y = "a")) |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Type: application/json
#> Content-Length: 15
#> 
#> {"x":1,"y":"a"}

What’s changed?

  • The method has changed from GET to POST. POST is the standard method for sending data to a website, and is automatically used whenever you add a body. Use req_method() to use a different method.

  • There are two new headers: Content-Type and Content-Length. They tell the server how to interpret the body — it’s encoded as JSON and is 15 bytes long.

  • We have a body, consisting of some JSON.
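As a short sketch of overriding the automatic choice of POST, the request below sends the same JSON body with PUT instead. PUT is chosen only as an example; which method is appropriate depends on the API you are talking to:

```r
library(httr2)

req <- request(example_url()) |>
  req_body_json(list(x = 1, y = "a")) |>
  # Without this line the method would default to POST because a body is present.
  req_method("PUT")

# The explicitly chosen method is stored on the request object.
req$method
```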

Different servers want data encoded differently so httr2 provides a selection of common formats. For example, req_body_form() uses the encoding used when you submit a form from a web browser:

req |>
  req_body_form(x = "1", y = "a") |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Type: application/x-www-form-urlencoded
#> Content-Length: 7
#> 
#> x=1&y=a

And req_body_multipart() uses the multipart encoding which is particularly important when you need to send larger amounts of data or complete files:

req |>
  req_body_multipart(x = "1", y = "a") |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Length: 246
#> Content-Type: multipart/form-data; boundary=------------------------gfsMlNqnoJIw3q5YLHVyPY
#> 
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY
#> Content-Disposition: form-data; name="x"
#> 
#> 1
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY
#> Content-Disposition: form-data; name="y"
#> 
#> a
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY--
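To send a complete file as one of the parts, you can wrap its path in curl::form_file(). The sketch below is one possible way to do this; the field names (description, file) are hypothetical and would normally be dictated by the server, and a temporary file stands in for a real upload:

```r
library(httr2)

# Create a small temporary file to stand in for a real upload.
path <- tempfile(fileext = ".txt")
writeLines("hello", path)

req <- request(example_url()) |>
  req_body_multipart(
    # An ordinary text field (hypothetical name).
    description = "An example upload",
    # A file field (hypothetical name); the content type is stated explicitly.
    file = curl::form_file(path, type = "text/plain")
  )
```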

If you need to send data encoded in a different form, you can use req_body_raw() to add the data to the body and set the Content-Type header.
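For instance, a server that expects XML could be handled roughly as follows. This is only a sketch: the payload and its element names are invented, and the content type is passed through the type argument of req_body_raw():

```r
library(httr2)

# A tiny XML payload; the element names are invented for this example.
xml <- "<point><x>1</x><y>a</y></point>"

req <- request(example_url()) |>
  # `type` sets the Content-Type header that describes the body.
  req_body_raw(xml, type = "application/xml")
```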

Perform a request and fetch the response

To actually perform a request and fetch the response back from the server, call req_perform():

req <- request(example_url()) |> req_url_path("/json")
resp <- req |> req_perform()
resp
#> <httr2_response>
#> GET http://127.0.0.1:33429/json
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (407 bytes)

You can see a simulation of what httr2 actually received with resp_raw():

resp |> resp_raw()
#> HTTP/1.1 200 OK
#> Connection: close
#> Date: Tue, 29 Oct 2024 22:14:54 GMT
#> Content-Type: application/json
#> Content-Length: 407
#> ETag: "de760e6d"
#> 
#> {
#>   "firstName": "John",
#>   "lastName": "Smith",
#>   "isAlive": true,
#>   "age": 27,
#>   "address": {
#>     "streetAddress": "21 2nd Street",
#>     "city": "New York",
#>     "state": "NY",
#>     "postalCode": "10021-3100"
#>   },
#>   "phoneNumbers": [
#>     {
#>       "type": "home",
#>       "number": "212 555-1234"
#>     },
#>     {
#>       "type": "office",
#>       "number": "646 555-4567"
#>     }
#>   ],
#>   "children": [],
#>   "spouse": null
#> }

An HTTP response has a very similar structure to an HTTP request. The first line gives the version of HTTP used, and a status code that’s optionally followed by a short description. Then we have the headers, followed by a blank line, followed by a body. The majority of responses will have a body, unlike requests.

You can extract data from the response using the resp_*() functions:

  • resp_status() returns the status code and resp_status_desc() returns the description:

    resp |> resp_status()
    #> [1] 200
    resp |> resp_status_desc()
    #> [1] "OK"

  • You can extract all headers with resp_headers() or a specific header with resp_header():

    resp |> resp_headers()
    #> <httr2_headers>
    #> Connection: close
    #> Date: Tue, 29 Oct 2024 22:14:54 GMT
    #> Content-Type: application/json
    #> Content-Length: 407
    #> ETag: "de760e6d"
    resp |> resp_header("Content-Length")
    #> [1] "407"

    Header names are case-insensitive:

    resp |> resp_header("ConTEnT-LeNgTH")
    #> [1] "407"

  • You can extract the body in various forms using the resp_*() family of functions. Since this response returns JSON we can use resp_body_json():

    resp |> resp_body_json() |> str()
    #> List of 8
    #>  $ firstName   : chr "John"
    #>  $ lastName    : chr "Smith"
    #>  $ isAlive     : logi TRUE
    #>  $ age         : int 27
    #>  $ address     :List of 4
    #>   ..$ streetAddress: chr "21 2nd Street"
    #>   ..$ city         : chr "New York"
    #>   ..$ state        : chr "NY"
    #>   ..$ postalCode   : chr "10021-3100"
    #>  $ phoneNumbers:List of 2
    #>   ..$ :List of 2
    #>   .. ..$ type  : chr "home"
    #>   .. ..$ number: chr "212 555-1234"
    #>   ..$ :List of 2
    #>   .. ..$ type  : chr "office"
    #>   .. ..$ number: chr "646 555-4567"
    #>  $ children    : list()
    #>  $ spouse      : NULL
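Putting these accessors together, a typical pattern might look like the sketch below. It assumes the /json endpoint of the test server returns the document shown above, and the variable names are arbitrary:

```r
library(httr2)

resp <- request(example_url()) |>
  req_url_path("/json") |>
  req_perform()

# Check the content type before choosing how to parse the body.
if (identical(resp_content_type(resp), "application/json")) {
  person <- resp_body_json(resp)
  # Nested JSON objects become nested lists.
  city <- person$address$city
  # JSON arrays become unnamed lists, so iterate to pull out one field.
  numbers <- vapply(person$phoneNumbers, function(p) p$number, character(1))
}

# The body is also available as a single unparsed string.
raw_json <- resp_body_string(resp)
```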

Responses with status codes 4xx and 5xx are HTTP errors. httr2 automatically turns these into R errors:

request(example_url()) |>
  req_url_path("/status/404") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.

request(example_url()) |>
  req_url_path("/status/500") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 500 Internal Server Error.

This is another important difference to httr, which required that you explicitly call httr::stop_for_status() to turn HTTP errors into R errors. You can revert to the httr behaviour with req_error(req, is_error = ~ FALSE).
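If you would rather handle an error status yourself, two approaches are sketched below. Both use the test server’s /status/404 endpoint; the "not found" fallback value is arbitrary:

```r
library(httr2)

req <- request(example_url()) |>
  req_url_path("/status/404")

# Option 1: never treat an HTTP status as an R error, then inspect the status.
resp <- req |>
  req_error(is_error = function(resp) FALSE) |>
  req_perform()
resp_status(resp)

# Option 2: keep the default behaviour and catch the error condition.
# httr2 signals HTTP errors with classes such as "httr2_http_404".
result <- tryCatch(
  req_perform(req),
  httr2_http_404 = function(cnd) "not found"
)
result
```

The first option suits code that needs to branch on many different status codes; the second keeps errors loud by default and handles only the cases you name.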

Control the request process

A number of req_*() functions don’t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:

  • req_cache() sets up a cache so if repeated requests return the same results, you can avoid a trip to the server.

  • req_throttle() will automatically add a small delay before each request so you can avoid hammering a server with many requests.

  • req_retry() sets up a retry strategy so that if the request either fails or you get a transient HTTP error, it’ll automatically retry after a short delay.

For more details see their documentation, as well as examples of the usage in real APIs in vignette("wrapping-apis").
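As a rough sketch of how these three functions combine, the request below caches, throttles, and retries. The specific values (a temporary cache directory, 10 requests per minute, 3 attempts) are arbitrary and should be tuned to the API you are calling:

```r
library(httr2)

req <- request(example_url()) |>
  req_url_path("/json") |>
  # Cache responses on disk; a temporary directory is used here for illustration.
  req_cache(path = tempfile()) |>
  # Allow at most 10 requests per minute (the rate is given in requests per second).
  req_throttle(rate = 10 / 60) |>
  # Make up to 3 attempts in total if a transient error occurs.
  req_retry(max_tries = 3)

resp <- req |> req_perform()
```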