| Title: | DBI Package for the DuckDB Database Management System |
|---|---|
| Description: | The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector. |
| Authors: | Hannes Mühleisen [aut] (ORCID: <https://orcid.org/0000-0001-8552-0029>), Mark Raasveldt [aut] (ORCID: <https://orcid.org/0000-0001-5005-6844>), Kirill Müller [cre] (ORCID: <https://orcid.org/0000-0002-1416-3412>), Stichting DuckDB Foundation [cph], Apache Software Foundation [cph], PostgreSQL Global Development Group [cph], The Regents of the University of California [cph], Cameron Desrochers [cph], Victor Zverovich [cph], RAD Game Tools [cph], Valve Software [cph], Rich Geldreich [cph], Tenacious Software LLC [cph], The RE2 Authors [cph], Google Inc. [cph], Facebook Inc. [cph], Steven G. Johnson [cph], Jiahao Chen [cph], Tony Kelman [cph], Jonas Fonseca [cph], Lukas Fittl [cph], Salvatore Sanfilippo [cph], Art.sy, Inc. [cph], Oran Agra [cph], Redis Labs, Inc. [cph], Melissa O'Neill [cph], PCG Project contributors [cph] |
| Maintainer: | Kirill Müller <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.5.4.9007 |
| Built: | 2026-07-03 03:03:04 UTC |
| Source: | https://github.com/duckdb/duckdb-r |
This is a SQL backend for dbplyr tailored to take into account DuckDB's possibilities. This mainly follows the backend for PostgreSQL, but contains more mapped functions.
tbl_file() is an experimental variant of dplyr::tbl() to directly access files on disk.
It is safer than dplyr::tbl() because there is no risk of misinterpreting the request,
and paths with special characters are supported.
tbl_function() is an experimental variant of dplyr::tbl()
to create a lazy table from a table-generating function,
useful for reading nonstandard CSV files or other data sources.
It is safer than dplyr::tbl() because there is no risk of misinterpreting the query.
See https://duckdb.org/docs/data/overview for details on data importing functions.
As an alternative, use dplyr::tbl(src, dplyr::sql("SELECT ... FROM ...")) for custom SQL queries.
tbl_query() is deprecated in favor of tbl_function().
Use simulate_duckdb() with lazy_frame()
to see simulated SQL without opening a DuckDB connection.
tbl_file(src = NULL, path, ..., cache = FALSE) tbl_function(src, query, ..., cache = FALSE) tbl_query(src, query, ...) simulate_duckdb(...)tbl_file(src = NULL, path, ..., cache = FALSE) tbl_function(src, query, ..., cache = FALSE) tbl_query(src, query, ...) simulate_duckdb(...)
src |
A duckdb connection object, |
path |
Path to existing Parquet, CSV or JSON file |
... |
Any parameters to be forwarded |
cache |
Enable object cache for Parquet files |
query |
SQL code, omitting the |
library(dplyr, warn.conflicts = FALSE) con <- DBI::dbConnect(duckdb(), path = ":memory:") db <- copy_to(con, data.frame(a = 1:3, b = letters[2:4])) db %>% filter(a > 1) %>% select(b) path <- tempfile(fileext = ".csv") write.csv(data.frame(a = 1:3, b = letters[2:4])) db_csv <- tbl_file(con, path) db_csv %>% summarize(sum_a = sum(a)) db_csv_fun <- tbl_function(con, paste0("read_csv_auto('", path, "')")) db_csv %>% count() DBI::dbDisconnect(con, shutdown = TRUE)library(dplyr, warn.conflicts = FALSE) con <- DBI::dbConnect(duckdb(), path = ":memory:") db <- copy_to(con, data.frame(a = 1:3, b = letters[2:4])) db %>% filter(a > 1) %>% select(b) path <- tempfile(fileext = ".csv") write.csv(data.frame(a = 1:3, b = letters[2:4])) db_csv <- tbl_file(con, path) db_csv %>% summarize(sum_a = sum(a)) db_csv_fun <- tbl_function(con, paste0("read_csv_auto('", path, "')")) db_csv %>% count() DBI::dbDisconnect(con, shutdown = TRUE)
default_conn() returns a default, built-in connection.
default_conn()default_conn()
Currently, the connection is established with duckdb(environment_scan = TRUE)
and dbConnect(timezone_out = "", array = "matrix")
so that data frames are automatically available as tables,
timestamps are returned in the local timezone,
and DuckDB's array type is returned as an R matrix.
The details of how the connection is established are subject to change.
In particular, returning the output as a tibble or other object may be supported
in the future.
This connection is intended for interactive use. There is no way for this or other packages to comprehensively track the state of this connection, so scripts and packages should manage their own connections.
A DuckDB connection object
conn <- default_conn() sql_query("SELECT 42", conn = conn)conn <- default_conn() sql_query("SELECT 42", conn = conn)
duckdb() creates or reuses a database instance.
duckdb_shutdown() shuts down a database instance.
Return an adbcdrivermanager::adbc_driver() for use with Arrow Database
Connectivity via the adbcdrivermanager package.
dbConnect() connects to a database instance.
dbDisconnect() closes a DuckDB database connection.
The associated DuckDB database instance is shut down automatically,
it is no longer necessary to set shutdown = TRUE or to call duckdb_shutdown().
duckdb( dbdir = DBDIR_MEMORY, read_only = FALSE, bigint = "numeric", config = list(), ..., environment_scan = FALSE ) duckdb_shutdown(drv) duckdb_adbc() ## S4 method for signature 'duckdb_driver' dbConnect( drv, dbdir = DBDIR_MEMORY, ..., debug = getOption("duckdb.debug", FALSE), read_only = FALSE, timezone_out = "UTC", tz_out_convert = c("with", "force"), config = list(), bigint = "numeric", array = "none", geometry = "blob", map = "data.frame" ) ## S4 method for signature 'duckdb_connection' dbDisconnect(conn, ..., shutdown = TRUE)duckdb( dbdir = DBDIR_MEMORY, read_only = FALSE, bigint = "numeric", config = list(), ..., environment_scan = FALSE ) duckdb_shutdown(drv) duckdb_adbc() ## S4 method for signature 'duckdb_driver' dbConnect( drv, dbdir = DBDIR_MEMORY, ..., debug = getOption("duckdb.debug", FALSE), read_only = FALSE, timezone_out = "UTC", tz_out_convert = c("with", "force"), config = list(), bigint = "numeric", array = "none", geometry = "blob", map = "data.frame" ) ## S4 method for signature 'duckdb_connection' dbDisconnect(conn, ..., shutdown = TRUE)
dbdir |
Location for database files. Should be a path to an existing
directory in the file system. With the default (or |
read_only |
Set to |
bigint |
How 64-bit integers should be returned. There are two options: |
config |
Named list with DuckDB configuration flags, see https://duckdb.org/docs/configuration/overview#configuration-reference for the possible options. These flags are only applied when the database object is instantiated. Subsequent connections will silently ignore these flags. |
... |
These dots are for future extensions and must be empty. |
environment_scan |
Set to |
drv |
Object returned by |
debug |
Print additional debug information, such as queries. |
timezone_out |
The time zone returned to R, defaults to |
tz_out_convert |
How to convert timestamp columns to the timezone specified
in |
array |
How arrays should be returned. There are two options: |
geometry |
How geometry columns should be returned. There are two options: |
map |
How |
conn |
A |
shutdown |
Unused. The database instance is shut down automatically. |
The behavior of with = "force" at DST transitions depends on how R handles translation from
the underlying time representation to a human-readable format.
If the timestamp is invalid in the target timezone, the resulting value may be NA
or an adjusted time.
duckdb() returns an object of class duckdb_driver.
dbDisconnect() and duckdb_shutdown() are called for their
side effect.
An object of class "adbc_driver"
dbConnect() returns an object of class duckdb_connection.
library(adbcdrivermanager) with_adbc(db <- adbc_database_init(duckdb_adbc()), { as.data.frame(read_adbc(db, "SELECT 1 as one;")) }) drv <- duckdb() con <- dbConnect(drv) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con) duckdb_shutdown(drv) # Shorter: con <- dbConnect(duckdb()) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con, shutdown = TRUE)library(adbcdrivermanager) with_adbc(db <- adbc_database_init(duckdb_adbc()), { as.data.frame(read_adbc(db, "SELECT 1 as one;")) }) drv <- duckdb() con <- dbConnect(drv) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con) duckdb_shutdown(drv) # Shorter: con <- dbConnect(duckdb()) dbGetQuery(con, "SELECT 'Hello, world!'") dbDisconnect(con, shutdown = TRUE)
Directly reads a CSV file into DuckDB, tries to detect and create the correct schema for it. This usually is much faster than reading the data into R and writing it to DuckDB.
duckdb_read_csv( conn, name, files, ..., header = TRUE, na.strings = "", nrow.check = 500, delim = ",", quote = "\"", col.names = NULL, col.types = NULL, lower.case.names = FALSE, sep = delim, transaction = TRUE, temporary = FALSE )duckdb_read_csv( conn, name, files, ..., header = TRUE, na.strings = "", nrow.check = 500, delim = ",", quote = "\"", col.names = NULL, col.types = NULL, lower.case.names = FALSE, sep = delim, transaction = TRUE, temporary = FALSE )
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
files |
One or more CSV file names, should all have the same structure though |
... |
These dots are for future extensions and must be empty. |
header |
Whether or not the CSV files have a separate header in the first line |
na.strings |
Which strings in the CSV files should be considered to be NULL |
nrow.check |
How many rows should be read from the CSV file to figure out data types |
delim |
Which field separator should be used |
quote |
Which quote character is used for columns in the CSV file |
col.names |
Override the detected or generated column names |
col.types |
Character vector of column types in the same order as col.names, or a named character vector where names are column names and types pairs. Valid types are DuckDB data types, e.g. VARCHAR, DOUBLE, DATE, BIGINT, BOOLEAN, etc. |
lower.case.names |
Transform column names to lower case |
sep |
Alias for delim for compatibility |
transaction |
Should a transaction be used for the entire operation |
temporary |
Set to |
If the table already exists in the database, the csv is appended to it. Otherwise the table is created.
The number of rows in the resulted table, invisibly.
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) path <- tempfile(fileext = ".csv") write.csv(data, path, row.names = FALSE) duckdb_read_csv(con, "data", path) dbReadTable(con, "data") dbDisconnect(con) # Providing data types for columns path <- tempfile(fileext = ".csv") write.csv(iris, path, row.names = FALSE) con <- dbConnect(duckdb()) duckdb_read_csv(con, "iris", path, col.types = c( Sepal.Length = "DOUBLE", Sepal.Width = "DOUBLE", Petal.Length = "DOUBLE", Petal.Width = "DOUBLE", Species = "VARCHAR" ) ) dbReadTable(con, "iris") dbDisconnect(con)con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) path <- tempfile(fileext = ".csv") write.csv(data, path, row.names = FALSE) duckdb_read_csv(con, "data", path) dbReadTable(con, "data") dbDisconnect(con) # Providing data types for columns path <- tempfile(fileext = ".csv") write.csv(iris, path, row.names = FALSE) con <- dbConnect(duckdb()) duckdb_read_csv(con, "iris", path, col.types = c( Sepal.Length = "DOUBLE", Sepal.Width = "DOUBLE", Petal.Length = "DOUBLE", Petal.Width = "DOUBLE", Species = "VARCHAR" ) ) dbReadTable(con, "iris") dbDisconnect(con)
duckdb_register() registers a data frame as a virtual table (view)
in a DuckDB connection.
No data is copied.
duckdb_register(conn, name, df, overwrite = FALSE, experimental = FALSE) duckdb_unregister(conn, name)duckdb_register(conn, name, df, overwrite = FALSE, experimental = FALSE) duckdb_unregister(conn, name)
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
df |
A |
overwrite |
Should an existing registration be overwritten? |
experimental |
Enable experimental optimizations |
duckdb_unregister() unregisters a previously registered data frame.
These functions are called for their side effect.
con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) duckdb_register(con, "data", data) dbReadTable(con, "data") duckdb_unregister(con, "data") dbDisconnect(con)con <- dbConnect(duckdb()) data <- data.frame(a = 1:3, b = letters[1:3]) duckdb_register(con, "data", data) dbReadTable(con, "data") duckdb_unregister(con, "data") dbDisconnect(con)
duckdb_register_arrow() registers an Arrow data source as a virtual table (view)
in a DuckDB connection.
No data is copied.
duckdb_register_arrow(conn, name, arrow_scannable, use_async = NULL) duckdb_unregister_arrow(conn, name) duckdb_list_arrow(conn)duckdb_register_arrow(conn, name, arrow_scannable, use_async = NULL) duckdb_unregister_arrow(conn, name) duckdb_list_arrow(conn)
conn |
A DuckDB connection, created by |
name |
The name for the virtual table that is registered or unregistered |
arrow_scannable |
A scannable Arrow-object |
use_async |
Switched to the asynchronous scanner. (deprecated) |
duckdb_unregister_arrow() unregisters a previously registered data frame.
These functions are called for their side effect.
DuckDB writes several distinct kinds of data to the file system. This page catalogs every such location and documents the unified policy the duckdb R package uses to choose them, so that by default nothing is written outside the R session's temporary directory – except the extension cache, which is auto-probed into the writable package library (and falls back to the temporary directory when the library is read-only, as on CRAN).
The functions that configure these locations are documented in
duckdb_storage_config().
The base DuckDB uses to expand a leading ~ and to
derive default sub-locations such as the extension cache. DuckDB setting:
home_directory. The package does not set this: doing so would also
redirect ~ in user SQL (e.g. COPY ... TO '~/out.csv'). Each location
below is pointed at a temporary directory directly instead.
Downloaded *.duckdb_extension files (e.g.
spatial, httpfs, h3). DuckDB setting: extension_directory. A
re-usable cache: a given binary is valid only for the exact DuckDB
version and platform/ABI that downloaded it. By default the cache is the
"library" root (alongside the installed package) when it is writable,
falling back to a tempdir() sub-directory when it is not. See the
marker section for how this is detected.
Persisted credentials under stored_secrets. DuckDB
setting: secret_directory. Set explicitly to a tempdir() location by
default. Configured and migrated with duckdb_storage_config().
Out-of-core intermediates for sorts, hash
joins, and similar operations. DuckDB settings: temp_directory,
max_temp_directory_size. For an in-memory (:memory:) database DuckDB's
own default spills to .tmp in the current working directory, so the
package overrides it with a tempdir() sub-directory by default.
Written only when a path is explicitly
configured (DuckDB settings log_query_path, http_logging_output,
profiling output). They default to off, so there is no location to
default and nothing is written without the user asking. If the user turns
logging on they choose where it goes.
Chosen by the user through the
dbdir argument of duckdb(). The package does not manage these: an
on-disk database and its sidecar files live where dbdir points, and an
in-memory (:memory:) database has none.
Each managed location is resolved through the same ordered chain. The first source that yields a value wins:
an explicit value passed to duckdb() via config (e.g.
config = list(temp_directory = "..."));
the corresponding R option, e.g. getOption("duckdb.temp_directory");
the corresponding environment variable, e.g.
Sys.getenv("DUCKDB_TEMP_DIRECTORY");
a persistent location selected by a marker file (see below);
the default: a per-session sub-directory of tempdir().
The extension cache inserts one extra step before the tempdir() fallback:
if no marker has selected a root, the "library" root is probed at connect
time by writing its marker – the write doubles as the writability test, and
the marker is left in place to record the choice. If the write fails (the
library is read-only) the cache falls back to tempdir(). So the effective
default is "library when writable, else tempdir", with no persistent write
ever attempted where it would fail.
Persisting data across sessions means writing outside tempdir(); a marker
file records the user's consent to do so, once, so it need not be re-granted
on every connection and does not require editing .Rprofile or .Renviron.
Two functions write and relocate these markers – one per kind of state, so
the two can be configured independently – and a third reports the current
state. They are documented in full on duckdb_storage_config():
duckdb_extension_storage(location, ..., migrate = TRUE, conflict = "error") duckdb_secret_storage(location, ..., migrate = TRUE, conflict = "error") duckdb_storage_status()
A *_storage() call writes the marker at location (creating, relocating,
or – with "session" – removing it); duckdb_storage_status() reports
where each kind currently resolves and how it was chosen. There is no ask
argument: calling a *_storage() function is the consent.
location argumentlocation names a root, not a full path. (To point a kind at an arbitrary
directory, use the option or environment variable instead – a marker is only
ever rediscovered in one of the fixed roots below.) The recognized roots are:
"session"tempdir() – the default, and the opt-out: setting it
removes the marker and reverts that kind to a per-session location.
"user"tools::R_user_dir() – R-specific, private to this
package, surviving package upgrades.
"shared"~/.duckdb – shared with the DuckDB CLI and Python
client.
"library"(extensions only) alongside the installed duckdb
package (base::system.file()). It pairs binaries with the build's ABI
but is wiped on every re-install. This is the automatic default for
extensions when the library is writable (see the resolution policy
above): rather than require an explicit opt-in, the package probes it at
connect time and uses it unless the write fails. Not offered for stored
secrets, which always default to "session".
The marker's name and contents make clear it belongs to the R package, so a
user inspecting the directory can tell at a glance what created it. This
matters most in the "shared" root (~/.duckdb), which is also used by the
DuckDB CLI and Python client:
<root>/extensions/.duckdb-r-keep # opts in the extension cache <root>/stored_secrets/.duckdb-r-keep # opts in stored secrets
It is not empty: the package writes a single line of human-readable text describing what the file is and that it is safe to delete. Only the file's presence is significant – the contents are never read back or validated, so editing it has no effect.
Markers are per-kind and live inside each kind's sub-directory, so one root
can persist extensions but not stored secrets, or vice versa. For extensions in a
persistent root, DuckDB's v<version>/<platform>/ sub-paths keep a stale
binary from being loaded into a newer, ABI-incompatible build.
migrate = TRUE moves the already-cached files from the current location to
the new root. conflict decides what happens when a file of the same name
exists at the destination: "error" (the default) aborts and lists the
collisions without moving anything; "ours" lets the files being relocated
win (overwriting the destination); "theirs" keeps the destination files and
drops the colliding sources. Secret migration is folded into
duckdb_secret_storage().
An option or environment variable overrides any marker.
A kind's marker present in more than one root is ambiguous: when a
connection is opened the package emits a message naming the candidates and
falls back to the tempdir() default until the ambiguity is resolved.
The package never ships a marker. The only writes are by
duckdb_extension_storage() / duckdb_secret_storage(), and the
connect-time probe of the "library" root for extensions (which writes the
marker only when the directory is writable).
A marked location that is not writable falls back to the tempdir()
default rather than failing.
A marker selects the location. It is deliberately distinct from the
presence of a cached binary: an extension found under v<version>/<platform>/
governs only validity (whether a re-download is needed), never the choice
of location. This separation prevents a stale leftover binary from silently
resurrecting a store root and reintroducing an ABI mismatch.
| Kind | DuckDB setting | Option / environment variable | Default |
| Home | home_directory |
-- | left untouched (not set) |
| Extensions | extension_directory |
duckdb.extension_directory / DUCKDB_EXTENSION_DIRECTORY |
library if writable, else tempdir() |
| Stored secrets | secret_directory |
duckdb.secret_directory / DUCKDB_SECRET_DIRECTORY |
tempdir() sub-directory (set) |
| Temp/spill | temp_directory |
duckdb.temp_directory / DUCKDB_TEMP_DIRECTORY |
tempdir() sub-directory (set) |
| Logs | log_query_path |
duckdb.log_directory / DUCKDB_LOG_DIRECTORY |
disabled (off) |
"set" means duckdb() sets the value explicitly in the database config. The
home directory is left untouched so that ~ in user SQL keeps its usual
meaning.
When a connection is established and the resolved
extension cache lies inside tempdir(), the package emits an
informational message – at most once every eight hours per session,
including in unattended (non-interactive) runs. It explains that
downloaded extensions will not persist across sessions and how to opt
into a permanent location. It is shown only when the package chose the
location itself; if you set the extension directory (via config, the
option, or the environment variable) the choice is yours and the message
is suppressed.
The first time the extension cache is initialized in the package library (when its marker is written), the package says so once. The marker then persists, so this is effectively once per installation.
Pointing the extension cache at a permanent location (an option, an
environment variable, or config) both keeps the extensions and silences the
message. If you are happy with a temporary cache and only want the reminder
gone, set the location explicitly so it counts as your choice – the simplest
is a config entry on every connection:
con <- dbConnect(duckdb(config = list( extension_directory = file.path(tempdir(), "duckdb", "extensions") )))
or set it once per session with
options(duckdb.extension_directory = file.path(tempdir(), "duckdb", "extensions"))
(or the DUCKDB_EXTENSION_DIRECTORY environment variable).
duckdb_storage_config() for the functions that configure these
locations, and duckdb() for the config argument.
Choose where the duckdb R package keeps downloaded extensions and persisted secrets, by writing a small marker file that records the choice:
duckdb_extension_storage() – set or move the extension cache (default:
the package library when writable, otherwise a per-session temporary
directory).
duckdb_secret_storage() – set or move the secret store (default: a
per-session temporary directory).
duckdb_storage_status() – report where each currently resolves.
These functions move the cache and secret store to a location that survives across sessions; the same locations can also be set without a marker by overriding with options and environment variables. The full policy is documented in duckdb_storage.
duckdb_extension_storage( location = c("session", "user", "shared", "library"), ..., migrate = TRUE, conflict = "error" ) duckdb_secret_storage( location = c("session", "user", "shared"), ..., migrate = TRUE, conflict = "error" ) duckdb_storage_status()duckdb_extension_storage( location = c("session", "user", "shared", "library"), ..., migrate = TRUE, conflict = "error" ) duckdb_secret_storage( location = c("session", "user", "shared"), ..., migrate = TRUE, conflict = "error" ) duckdb_storage_status()
location |
The destination root (not a path), one of:
To use an arbitrary directory, set the option or environment variable instead (see duckdb_storage). |
... |
These dots are for future extensions and must be empty. |
migrate |
If |
conflict |
How to resolve a name collision during migration: |
duckdb_extension_storage() and duckdb_secret_storage() write (or remove)
the marker for that one kind of state, so the two can be configured
independently. duckdb_storage_status() reports where each kind currently
resolves and which tier of the resolution policy chose it. The new location
takes effect for connections opened afterwards; existing connections are
unaffected.
There is no ask argument: calling a *_storage() function is itself the
consent to write outside the temporary directory.
The *_storage() functions are called for their side effect (writing
or removing a marker, and optionally migrating files) and return the
resolved directory invisibly. duckdb_storage_status() returns a data frame
(class "duckdb_storage_status") with one row per kind of state and columns
kind, source, and directory; its print method renders a readable
summary when the result is auto-printed.
duckdb_storage for the storage policy these functions implement.
sql_query() runs an arbitrary SQL query using DBI::dbGetQuery()
and returns a data.frame with the query results.
sql_exec() runs an arbitrary SQL statement using DBI::dbExecute()
and returns the number of affected rows.
These functions are intended as an easy way to interactively run DuckDB without having to manage connections. By default, data frame objects are available as views.
Scripts and packages should manage their own connections and prefer the DBI methods for more control.
sql_query(sql, conn = default_conn()) sql_exec(sql, conn = default_conn())sql_query(sql, conn = default_conn()) sql_exec(sql, conn = default_conn())
sql |
A SQL string |
conn |
An optional connection, defaults to |
A data frame with the query result
# Queries sql_query("SELECT 42") # Statements with side effects sql_exec("CREATE TABLE test (a INTEGER, b VARCHAR)") sql_exec("INSERT INTO test VALUES (1, 'one'), (2, 'two')") sql_query("FROM test") # Data frames available as views sql_query("FROM mtcars")# Queries sql_query("SELECT 42") # Statements with side effects sql_exec("CREATE TABLE test (a INTEGER, b VARCHAR)") sql_exec("INSERT INTO test VALUES (1, 'one'), (2, 'two')") sql_query("FROM test") # Data frames available as views sql_query("FROM mtcars")