The goal of this document is to show you the basics of httr2. You’ll learn how to create and submit HTTP requests and work with the HTTP responses that you get back. httr2 is designed to map closely to the underlying HTTP protocol, which I’ll explain as we go along. For more details, I also recommend “An overview of HTTP” from MDN.
In httr2, you start by creating a request. If you’re familiar with httr, this is a big change: with httr you could only submit a request, immediately receiving a response. Having an explicit request object makes it easier to build up a complex request piece by piece and works well with the pipe.
Every request starts with a URL:
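A minimal sketch of this step, assuming it uses the same request(example_url()) call that appears later in this document:

```r
library(httr2)

# example_url() returns the address of httr2's built-in test server
# (it needs the suggested webfakes package to be installed).
req <- request(example_url())

# Printing the object shows what has been set up so far; nothing is sent yet.
req
```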
Here, instead of an external website, we use a test server that’s built into httr2 itself. That ensures that this vignette will work regardless of when or where you run it.
We can see exactly what this request will send to the server with a dry run:
req |> req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
The first line of the request contains three important pieces of information:
The HTTP method, which is a verb that tells the server what you want to do. Here it’s GET, the most common verb, indicating that we want to get a resource. Other verbs include POST, to create a new resource, PUT, to replace an existing resource, and DELETE, to delete a resource.
The path, which is the URL stripped of details that the server already knows, i.e. the protocol (http or https), the host (localhost), and the port (33429).
The version of the HTTP protocol. This is unimportant for our purposes because it’s handled at a lower level.
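To make the split between the path and the rest of the URL concrete, here is a small sketch using url_parse(), httr2's URL parser. The URL is made up for illustration and reuses the host and port from the dry run above.

```r
library(httr2)

# Illustrative URL; the host and port are copied from the dry-run output.
parts <- url_parse("http://127.0.0.1:33429/json")

parts$scheme    # the protocol
parts$hostname  # the host
parts$port      # the port
parts$path      # the part that appears on the first line of the request
```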
The following lines specify the HTTP headers, a series of name-value pairs separated by :. The headers in this request were automatically added by httr2, but you can override them or add your own with req_headers():
req |>
  req_headers(
    Name = "Hadley",
    `Shoe-Size` = "11",
    Accept = "application/json"
  ) |>
  req_dry_run()
#> GET / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept-Encoding: deflate, gzip, br, zstd
#> Name: Hadley
#> Shoe-Size: 11
#> Accept: application/json
Header names are case-insensitive, and servers will ignore headers that they don’t understand.
The headers finish with a blank line which is followed by the body. The requests above (like all GET requests) don’t have a body, so let’s add one to see what happens. The req_body_*() functions provide a variety of ways to add data to the body. Here we’ll use req_body_json() to add some data encoded as JSON:
req |>
  req_body_json(list(x = 1, y = "a")) |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Type: application/json
#> Content-Length: 15
#>
#> {"x":1,"y":"a"}
What’s changed?
The method has changed from GET to POST. POST is the standard method for sending data to a website, and is automatically used whenever you add a body. Use req_method() to use a different method.
There are two new headers: Content-Type and Content-Length. They tell the server how to interpret the body: it’s encoded as JSON and is 15 bytes long.
We have a body, consisting of some JSON.
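As a sketch of the req_method() override mentioned above, the following swaps the automatic POST for PUT. The URL is a placeholder, and reading the method field peeks at the request object's internals, which is an assumption about httr2's current representation rather than a documented interface.

```r
library(httr2)

# Placeholder URL; no request is actually sent in this example.
put_req <- request("https://example.com") |>
  req_body_json(list(x = 1, y = "a")) |>
  req_method("PUT")

# Internal field, inspected here only to confirm the override took effect.
put_req$method
```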
Different servers want data encoded differently so httr2 provides a selection of common formats. For example, req_body_form() uses the encoding used when you submit a form from a web browser:
req |>
  req_body_form(x = "1", y = "a") |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Type: application/x-www-form-urlencoded
#> Content-Length: 7
#>
#> x=1&y=a
And req_body_multipart() uses the multipart encoding which is particularly important when you need to send larger amounts of data or complete files:
req |>
  req_body_multipart(x = "1", y = "a") |>
  req_dry_run()
#> POST / HTTP/1.1
#> Host: 127.0.0.1:33429
#> User-Agent: httr2/1.0.5.9000 r-curl/6.0.0 libcurl/8.5.0
#> Accept: */*
#> Accept-Encoding: deflate, gzip, br, zstd
#> Content-Length: 246
#> Content-Type: multipart/form-data; boundary=------------------------gfsMlNqnoJIw3q5YLHVyPY
#>
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY
#> Content-Disposition: form-data; name="x"
#>
#> 1
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY
#> Content-Disposition: form-data; name="y"
#>
#> a
#> --------------------------gfsMlNqnoJIw3q5YLHVyPY--
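For the file case, a minimal sketch, assuming the file is wrapped with curl::form_file() from the curl package that httr2 builds on. The URL, the field names, and the temporary file are placeholders for illustration.

```r
library(httr2)

# Create a small throwaway file to stand in for a real upload.
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a"), path)

# Placeholder URL; the request is only built here, not sent.
upload_req <- request("https://example.com/upload") |>
  req_body_multipart(
    description = "demo upload",              # an ordinary text field
    file = curl::form_file(path, "text/csv")  # the file, with its MIME type
  )
```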
If you need to send data encoded in a different form, you can use req_body_raw() to add the data to the body and set the Content-Type header.
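As an illustration of that last option, here is a minimal sketch that attaches a small XML payload. The URL and the payload are placeholders, and the type argument is what sets the Content-Type header.

```r
library(httr2)

# Placeholder URL and payload, chosen only for illustration.
xml_req <- request("https://example.com") |>
  req_body_raw(
    "<point><x>1</x><y>a</y></point>",
    type = "application/xml"
  )
```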
To actually perform a request and fetch the response back from the server, call req_perform():
req <- request(example_url()) |> req_url_path("/json")
resp <- req |> req_perform()
resp
#> <httr2_response>
#> GET http://127.0.0.1:33429/json
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (407 bytes)
You can see a simulation of what httr2 actually received with resp_raw():
resp |> resp_raw()
#> HTTP/1.1 200 OK
#> Connection: close
#> Date: Tue, 29 Oct 2024 22:14:54 GMT
#> Content-Type: application/json
#> Content-Length: 407
#> ETag: "de760e6d"
#>
#> {
#> "firstName": "John",
#> "lastName": "Smith",
#> "isAlive": true,
#> "age": 27,
#> "address": {
#> "streetAddress": "21 2nd Street",
#> "city": "New York",
#> "state": "NY",
#> "postalCode": "10021-3100"
#> },
#> "phoneNumbers": [
#> {
#> "type": "home",
#> "number": "212 555-1234"
#> },
#> {
#> "type": "office",
#> "number": "646 555-4567"
#> }
#> ],
#> "children": [],
#> "spouse": null
#> }
An HTTP response has a very similar structure to an HTTP request. The first line gives the version of HTTP used, and a status code that’s optionally followed by a short description. Then we have the headers, followed by a blank line, followed by a body. The majority of responses will have a body, unlike requests.
You can extract data from the response using the resp_*() functions:
resp_status() returns the status code and resp_status_desc() returns the description.
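A minimal sketch of both helpers. So that it runs without a server, it builds a stand-in response by hand with response(), httr2's constructor for test responses; the values are illustrative, not output from the test server.

```r
library(httr2)

# Hand-built response standing in for one returned by req_perform().
demo_resp <- response(status_code = 200)

demo_resp |> resp_status()       # the numeric status code
demo_resp |> resp_status_desc()  # the matching short description
```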
You can extract all headers with resp_headers() or a specific header with resp_header():
resp |> resp_headers()
#> <httr2_headers>
#> Connection: close
#> Date: Tue, 29 Oct 2024 22:14:54 GMT
#> Content-Type: application/json
#> Content-Length: 407
#> ETag: "de760e6d"
resp |> resp_header("Content-Length")
#> [1] "407"
Header names are case-insensitive, so the capitalisation you pass to resp_header() doesn’t matter.
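A small sketch of that lookup behaviour, using a hand-built response() so that it runs without a server. The header value is made up for illustration.

```r
library(httr2)

# Hand-built response with a single, made-up header.
demo_resp <- response(
  status_code = 200,
  headers = list(ETag = "abc123")
)

# Both spellings find the same header.
demo_resp |> resp_header("ETag")
demo_resp |> resp_header("etag")

# A header that is absent comes back as NULL.
demo_resp |> resp_header("X-Missing")
```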
You can extract the body in various forms using the resp_body_*() family of functions. Since this response returns JSON we can use resp_body_json():
resp |> resp_body_json() |> str()
#> List of 8
#> $ firstName : chr "John"
#> $ lastName : chr "Smith"
#> $ isAlive : logi TRUE
#> $ age : int 27
#> $ address :List of 4
#> ..$ streetAddress: chr "21 2nd Street"
#> ..$ city : chr "New York"
#> ..$ state : chr "NY"
#> ..$ postalCode : chr "10021-3100"
#> $ phoneNumbers:List of 2
#> ..$ :List of 2
#> .. ..$ type : chr "home"
#> .. ..$ number: chr "212 555-1234"
#> ..$ :List of 2
#> .. ..$ type : chr "office"
#> .. ..$ number: chr "646 555-4567"
#> $ children : list()
#> $ spouse : NULL
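The parsed body is an ordinary nested R list, so standard list tools apply. A short sketch, using a hand-built response() with a cut-down, illustrative version of the JSON above so that it runs without a server:

```r
library(httr2)

# Illustrative JSON, loosely modelled on the body shown earlier.
json <- '{"firstName": "John", "address": {"city": "New York"},
          "phoneNumbers": [{"type": "home"}, {"type": "office"}]}'

demo_resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw(json)
)

body <- demo_resp |> resp_body_json()

# Reach into a nested object with $.
city <- body$address$city

# Pull one field out of every element of a JSON array.
types <- vapply(body$phoneNumbers, \(p) p$type, character(1))
```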
Responses with status codes 4xx and 5xx are HTTP errors. httr2 automatically turns these into R errors:
request(example_url()) |>
  req_url_path("/status/404") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 404 Not Found.
request(example_url()) |>
  req_url_path("/status/500") |>
  req_perform()
#> Error in `req_perform()`:
#> ! HTTP 500 Internal Server Error.
This is another important difference to httr, which required that you explicitly call httr::stop_for_status() to turn HTTP errors into R errors. You can revert to the httr behaviour with req_error(req, is_error = ~ FALSE).
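If you would rather handle a failure than switch the error off, one option is tryCatch(). The sketch below runs offline: it assumes with_mocked_responses() and response() can stand in for the server, and that 404 errors carry the condition class httr2_http_404. The URL is a placeholder.

```r
library(httr2)

# Placeholder URL; the mock below means the network is never touched.
missing_req <- request("https://example.com/missing")

# Every request performed inside this call receives a canned 404.
outcome <- with_mocked_responses(
  function(req) response(status_code = 404),
  tryCatch(
    missing_req |> req_perform(),
    httr2_http_404 = function(cnd) "not found"
  )
)
outcome

# With the check switched off, the same 404 comes back as a normal response.
quiet <- with_mocked_responses(
  function(req) response(status_code = 404),
  missing_req |>
    req_error(is_error = \(resp) FALSE) |>
    req_perform()
)
quiet |> resp_status()
```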
A number of req_ functions don’t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:
req_cache() sets up a cache so if repeated requests return the same results, you can avoid a trip to the server.
req_throttle() will automatically add a small delay before each request so you can avoid hammering a server with many requests.
req_retry() sets up a retry strategy so that if the request either fails or you get a transient HTTP error, it’ll automatically retry after a short delay.
For more details see their documentation, as well as examples of the usage in real APIs in vignette("wrapping-apis").
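These functions compose with the pipe like any other req_ function. A minimal sketch, assuming a placeholder URL and illustrative limits; it only records the policies on the request and does not contact any server.

```r
library(httr2)

# Placeholder URL; the specific limits are illustrative, not recommendations.
polite_req <- request("https://example.com/api") |>
  req_cache(path = tempfile("httr2-cache-")) |>  # keep cacheable responses on disk
  req_throttle(rate = 30 / 60) |>                # about 30 requests per minute
  req_retry(max_tries = 3)                       # give up after three attempts
```

The policies take effect when the request is eventually passed to req_perform().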