---
title: "Benchmarks"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Benchmarks}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, eval = FALSE, include = FALSE}
# to run the timings in this readme, set the following env variable:
Sys.setenv(ODBC_EVAL_BENCHMARKS = "true")
```
```{r, include = FALSE}
eval_timings <- as.logical(Sys.getenv("ODBC_EVAL_BENCHMARKS", "false"))
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = eval_timings
)
```
The odbc package is often much faster than the existing
[RODBC](https://cran.r-project.org/package=RODBC) and DBI compatible
[RODBCDBI](https://cran.r-project.org/package=RODBCDBI) packages. We'll benchmark writing and reading data from the [nycflights13](https://github.com/tidyverse/nycflights13) package using the three packages.
```{r, include = FALSE}
# The CRAN version of RODBC uses iODBC, so to use unixODBC we need to
# recompile it from source, specifying the odbc manager explicitly:
#
# install.packages("RODBC", type = "source", INSTALL_opts="--configure-args='--with-odbc-manager=odbc'")
#
# see `vignette("develop")` for more details.
```
```{r setup}
library(odbc)
library(RODBC)
library(RODBCDBI)
library(DBI)
library(nycflights13)
```
## Setup
RODBC and RODBCDBI trip up on `flights`' tibble subclass, so we'll convert it to a data frame before passing to each package. Also, the RODBCDBI package does not support writing timestamps, so we remove that column as well.
```{r}
flights$time_hour <- NULL
flights <- as.data.frame(flights)
```
Now, let's configure three driver connections to a Microsoft SQL Server database, one for each package we'll be benchmarking.
```{r, echo = FALSE}
options("connectionObserver" = NULL)
```
```{r}
odbc <- dbConnect(odbc::odbc(), dsn = "MicrosoftSQLServer", uid = "SA", pwd = "BoopBop123!")
rodbc <- RODBC::odbcConnect(dsn = "MicrosoftSQLServer", uid = "SA", pwd = "BoopBop123!")
rodbcdbi <- dbConnect(RODBCDBI::ODBC(), dsn = "MicrosoftSQLServer", user = "SA", password = "BoopBop123!")
```
The above code requires a `dsn` `MicrosoftSQLServer` to have been previously configured; see `vignettes("setup")` to learn more.
## Writing
We'll first benchmark writing the flights dataset, which contains ~300,000 rows and 18 columns, to a database.
The code for odbc and RODBCDBI looks quite similar, as they both use the DBI front-end:
```{r, include = FALSE}
if ("flights" %in% dbListTables(odbc)) {
dbExecute(odbc, "drop table flights")
}
```
```{r}
odbc_write <- system.time(dbWriteTable(odbc, "flights", flights))
odbc_write
```
```{r, include = !eval_timings}
#> user system elapsed
#> 0.883 0.176 8.108
```
```{r, include = FALSE}
if ("flights" %in% dbListTables(odbc)) {
dbExecute(odbc, "drop table flights")
}
```
```{r}
rodbcdbi_write <- system.time(dbWriteTable(rodbcdbi, "flights", flights))
rodbcdbi_write
```
```{r, include = !eval_timings}
#> user system elapsed
#> 8.287 11.107 257.841
```
```{r, include = FALSE}
if ("flights" %in% dbListTables(odbc)) {
dbExecute(odbc, "drop table flights")
}
```
The timings for RODBC and RODBCDBI are quite similar, as they both utilize the RODBC back-end:
```{r}
rodbc_write <- system.time(sqlSave(rodbc, flights, "flights"))
rodbc_write
```
```{r, include = !eval_timings}
#> user system elapsed
#> 8.266 11.023 235.825
```
```{r, include = FALSE}
if ("flights" %in% dbListTables(odbc)) {
dbExecute(odbc, "drop table flights")
}
```
## Reading
```{r, include = FALSE}
if (!"flights" %in% dbListTables(odbc)) {
dbWriteTable(odbc, "flights", flights)
}
```
Again, the syntax for odbc and RODBCDBI is identical, while the timings for RODBCDBI and RODBC are quite similar.
```{r}
odbc_read <- system.time(result <- dbReadTable(odbc, "flights"))
rodbcdbi_read <- system.time(result <- dbReadTable(rodbcdbi, "flights"))
rodbc_read <- system.time(result <- sqlFetch(rodbc, "flights"))
```
```{r}
odbc_read
```
```{r, include = !eval_timings}
#> user system elapsed
#> 0.515 0.024 0.557
```
```{r}
rodbcdbi_read
```
```{r, include = !eval_timings}
#> user system elapsed
#> 1.308 0.035 1.356
```
```{r}
rodbc_read
```
```{r, include = !eval_timings}
#> user system elapsed
#> 1.291 0.033 1.343
```