Title: | Expand 'connector' Package for 'Databricks' Tables and Volumes |
Version: | 0.1.0 |
Description: | Expands the 'connector' https://github.com/NovoNordisk-OpenSource/connector package and provides a convenient interface for accessing and interacting with 'Databricks' https://www.databricks.com volumes and tables directly from R. |
License: | Apache License (≥ 2) |
URL: | https://novonordisk-opensource.github.io/connector.databricks/, https://github.com/NovoNordisk-OpenSource/connector.databricks |
BugReports: | https://github.com/NovoNordisk-OpenSource/connector.databricks/issues |
Imports: | arrow, brickster (≥ 0.2.7), checkmate, cli, connector (≥ 1.0.0), DBI, dbplyr, dplyr, fs, hms, odbc (≥ 1.4.0), purrr, R6 (≥ 2.4.0), rlang, withr, zephyr |
Suggests: | glue, knitr, mockery (≥ 0.4.4), rmarkdown, testthat (≥ 3.2.3), tibble, whirl (≥ 0.3.0) |
VignetteBuilder: | knitr |
Config/Needs/website: | rmarkdown |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-09-01 08:28:07 UTC; vlob |
Author: | Vladimir Obucina [aut, cre], Steffen Falgreen Larsen [aut], Aksel Thomsen [aut], Cervan Girard [aut], Oliver Lundsgaard [ctb], Skander Mulder [ctb], Novo Nordisk A/S [cph] |
Maintainer: | Vladimir Obucina <vlob@novonordisk.com> |
Depends: | R (≥ 4.1.0) |
Repository: | CRAN |
Date/Publication: | 2025-09-05 12:00:02 UTC |
Connector for connecting to Databricks using DBI
Description
Extension of connector::connector_dbi, making it easier to connect to, and work with, tables in Databricks.
Details
All methods for the ConnectorDatabricksTable object work from the
catalog and schema provided when initializing the connection.
This means you only need to provide the table name when using the built-in
methods. If you want to access tables outside of the chosen schema, you can
either retrieve the connection with ConnectorDatabricksTable$conn or create
a new connector.
When creating the connection to Databricks, you need to provide the
http_path to either the Databricks cluster or the SQL warehouse you want to
connect to. Authentication to Databricks is handled by the odbc::databricks()
driver, which supports general use of personal access tokens and credentials
through Posit Workbench. See odbc::databricks() for more information on how
the connection to Databricks is established.
Super classes
connector::Connector
-> connector::ConnectorDBI
-> ConnectorDatabricksTable
Active bindings
conn
The DBI connection object of the connector
catalog
The catalog used in the connector
schema
The schema used in the connector
Methods
Public methods
Inherited methods
Method new()
Initialize the connection to Databricks
Usage
ConnectorDatabricksTable$new(http_path, catalog, schema, extra_class = NULL)
Arguments
http_path
character The path to the Databricks cluster or SQL warehouse you want to connect to
catalog
character The catalog to use
schema
character The schema to use
extra_class
character Extra class to assign to the new connector
Returns
A ConnectorDatabricksTable object
Method clone()
The objects of this class are cloneable with this method.
Usage
ConnectorDatabricksTable$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
## Not run:
# Establish connection to your cluster
con_databricks <- ConnectorDatabricksTable$new(
http_path = "path-to-cluster",
catalog = "my_catalog",
schema = "my_schema"
)
# List tables in my_schema
con_databricks$list_content()
# Read and write tables
con_databricks$write(mtcars, "my_mtcars_table")
con_databricks$read("my_mtcars_table")
# Use dplyr::tbl
con_databricks$tbl("my_mtcars_table")
# Remove table
con_databricks$remove("my_mtcars_table")
# Disconnect
con_databricks$disconnect()
## End(Not run)
Connector for databricks volume storage
Description
The ConnectorDatabricksVolume class, built on top of the connector::ConnectorFS class, is a file storage connector for accessing and manipulating files inside Databricks volumes.
Super classes
connector::Connector
-> connector::ConnectorFS
-> ConnectorDatabricksVolume
Active bindings
Methods
Public methods
Inherited methods
connector::Connector$list_content_cnt()
connector::Connector$print()
connector::Connector$read_cnt()
connector::Connector$remove_cnt()
connector::Connector$write_cnt()
connector::ConnectorFS$create_directory_cnt()
connector::ConnectorFS$download_cnt()
connector::ConnectorFS$download_directory_cnt()
connector::ConnectorFS$remove_directory_cnt()
connector::ConnectorFS$tbl_cnt()
connector::ConnectorFS$upload_cnt()
connector::ConnectorFS$upload_directory_cnt()
Method new()
Initializes the connector for Databricks volume storage.
Usage
ConnectorDatabricksVolume$new(
full_path = NULL,
catalog = NULL,
schema = NULL,
path = NULL,
extra_class = NULL,
force = FALSE,
...
)
Arguments
full_path
character Full path to the file storage in the format catalog/schema/path. If NULL, catalog, schema, and path must be provided.
catalog
character Databricks catalog
schema
character Databricks schema
path
character Path to the file storage
extra_class
character Extra class to assign to the new connector.
force
logical If TRUE, the volume will be created without asking if it does not exist.
...
Additional arguments passed to the initialize method of the superclass
Returns
A new ConnectorDatabricksVolume object
Method clone()
The objects of this class are cloneable with this method.
Usage
ConnectorDatabricksVolume$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
## Not run:
# Create Volume file storage connector
cnt <- ConnectorDatabricksVolume$new(full_path = "catalog/schema/path")
cnt
# List content
cnt$list_content_cnt()
# Write to the connector
cnt$write_cnt(iris, "iris.rds")
# Check it is there
cnt$list_content_cnt()
# Read the result back
cnt$read_cnt("iris.rds") |>
head()
## End(Not run)
Internal parameters for reuse in functions
Description
Internal parameters for reuse in functions
Arguments
overwrite |
Overwrite existing content if it exists in the connector? Default: FALSE |
verbosity_level |
Verbosity level for functions in connector. See
zephyr::verbosity_level for details. Default: "verbose" |
Details
See connector-options-databricks for more information.
Options for connector.databricks
Description
Configuration options for the connector.databricks package
overwrite
Overwrite existing content if it exists in the connector?
Default:
FALSE
Option:
connector.databricks.overwrite
Environment:
R_CONNECTOR.DATABRICKS_OVERWRITE
verbosity_level
Verbosity level for functions in connector. See zephyr::verbosity_level for details.
Default:
"verbose"
Option:
connector.databricks.verbosity_level
Environment:
R_CONNECTOR.DATABRICKS_VERBOSITY_LEVEL
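These options can be set per session; a minimal sketch using the option and environment-variable names listed above (the "quiet" level is an assumption; see zephyr::verbosity_level for the accepted values):

```r
# Set package options for the current R session
options(
  connector.databricks.overwrite = TRUE,
  connector.databricks.verbosity_level = "quiet"
)

# Or configure via environment variables, e.g. in .Renviron
Sys.setenv(R_CONNECTOR.DATABRICKS_OVERWRITE = "TRUE")
```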
Create ConnectorDatabricksTable connector
Description
Initializes the connector for table type of storage. See ConnectorDatabricksTable for details.
Usage
connector_databricks_table(http_path, catalog, schema, extra_class = NULL)
Arguments
http_path |
character The path to the Databricks cluster or SQL warehouse you want to connect to |
catalog |
character The catalog to use |
schema |
character The schema to use |
extra_class |
character Extra class to assign to the new connector |
Details
The extra_class parameter allows you to create a subclass of the
ConnectorDatabricksTable object. This can be useful if you want to create
a custom connection object for easier dispatch of new S3 methods, while still
inheriting the methods from the ConnectorDatabricksTable object.
Value
A new ConnectorDatabricksTable object
Examples
## Not run:
# Establish connection to your cluster
con_databricks <- connector_databricks_table(
http_path = "path-to-cluster",
catalog = "my_catalog",
schema = "my_schema"
)
# List tables in my_schema
con_databricks$list_content()
# Read and write tables
con_databricks$write(mtcars, "my_mtcars_table")
con_databricks$read("my_mtcars_table")
# Use dplyr::tbl
con_databricks$tbl("my_mtcars_table")
# Remove table
con_databricks$remove("my_mtcars_table")
# Disconnect
con_databricks$disconnect()
## End(Not run)
Create databricks volume connector
Description
Create a new databricks volume connector object. See ConnectorDatabricksVolume for details.
Initializes the connector for Databricks volume storage.
Usage
connector_databricks_volume(
full_path = NULL,
catalog = NULL,
schema = NULL,
path = NULL,
extra_class = NULL,
force = FALSE,
...
)
Arguments
full_path |
Full path to the file storage in the format catalog/schema/path. If NULL, catalog, schema, and path must be provided. |
catalog |
Databricks catalog |
schema |
Databricks schema |
path |
Path to the file storage |
extra_class |
Extra class to assign to the new connector. |
force |
If TRUE, the volume will be created without asking if it does not exist. |
... |
Additional arguments passed to the connector::connector |
Details
The extra_class parameter allows you to create a subclass of the
ConnectorDatabricksVolume object. This can be useful if you want to create
a custom connection object for easier dispatch of new S3 methods, while still
inheriting the methods from the ConnectorDatabricksVolume object.
Value
A new ConnectorDatabricksVolume object
Examples
## Not run:
# Connect to a file system
databricks_volume <- "catalog/schema/path"
db <- connector_databricks_volume(databricks_volume)
db
# Create subclass connection
db_subclass <- connector_databricks_volume(databricks_volume,
extra_class = "subclass"
)
db_subclass
class(db_subclass)
## End(Not run)
Create a directory
Description
Additional create directory methods for Databricks connectors implemented for
connector::create_directory_cnt():
- ConnectorDatabricksVolume: Reuses the connector::create_directory_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
create_directory_cnt(connector_object, name, open = TRUE, ...)
## S3 method for class 'ConnectorDatabricksVolume'
create_directory_cnt(connector_object, name, open = TRUE, ...)
Arguments
connector_object |
Connector The connector object to use. |
name |
character The name of the directory to create |
open |
logical Open the created directory as a new connector object |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the brickster::db_volume_dir_create method |
Value
invisible connector_object.
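A short sketch of typical use (not run; assumes access to a Databricks volume at the hypothetical path "my_catalog/my_schema/my_volume"):

```r
## Not run:
vol <- connector_databricks_volume("my_catalog/my_schema/my_volume")

# Create a sub-directory inside the volume; with open = TRUE a new
# connector scoped to the created directory is returned
raw <- create_directory_cnt(vol, "raw_data", open = TRUE)
raw$list_content_cnt()
## End(Not run)
```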
Disconnect (close) the connection of the connector
Description
Generic implementation of how to disconnect from the relevant connections; mostly relevant for DBI connectors.
- ConnectorDBI: Uses DBI::dbDisconnect() to close the DBI connection.
Usage
disconnect_cnt(connector_object, ...)
Arguments
connector_object |
Connector The connector object to use. |
... |
Additional arguments passed to the method for the individual connector. |
Value
invisible connector_object.
Download content from the connector
Description
Additional download methods for Databricks connectors implemented for
connector::download_cnt():
- ConnectorDatabricksVolume: Reuses the connector::download_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
download_cnt(connector_object, src, dest = basename(src), ...)
## S3 method for class 'ConnectorDatabricksVolume'
download_cnt(connector_object, src, dest = basename(src), ...)
Arguments
connector_object |
Connector The connector object to use. |
src |
character Name of the content to read, write, or remove. Typically the table name. |
dest |
character Path to the file to download to or upload from |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the
|
Value
invisible connector_object.
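A short sketch (not run; the volume path and file names are hypothetical):

```r
## Not run:
vol <- connector_databricks_volume("my_catalog/my_schema/my_volume")

# Download "report.csv" from the volume to a local file
download_cnt(vol, src = "report.csv", dest = "local_report.csv")
## End(Not run)
```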
Download a directory
Description
Additional download directory methods for Databricks connectors implemented for
connector::download_directory_cnt():
- ConnectorDatabricksVolume: Reuses the connector::download_directory_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
download_directory_cnt(connector_object, src, dest = basename(src), ...)
## S3 method for class 'ConnectorDatabricksVolume'
download_directory_cnt(connector_object, src, dest = basename(src), ...)
Arguments
connector_object |
Connector The connector object to use. |
src |
character The name of the directory to download from the connector |
dest |
character Path to the directory to download to |
... |
ConnectorDatabricksVolume: Additional parameters to pass to
the |
Value
invisible connector_object.
List available content from the connector
Description
Additional list content methods for Databricks connectors implemented for
connector::list_content_cnt():
- ConnectorDatabricksTable: Reuses the connector::list_content_cnt() method for ConnectorDatabricksTable, but always sets the catalog and schema as defined when initializing the connector.
- ConnectorDatabricksVolume: Reuses the connector::list_content_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
list_content_cnt(connector_object, ...)
## S3 method for class 'ConnectorDatabricksTable'
list_content_cnt(connector_object, ..., tags = NULL)
## S3 method for class 'ConnectorDatabricksVolume'
list_content_cnt(connector_object, ...)
Arguments
connector_object |
Connector The connector object to use. |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the
|
tags |
Expression to be translated to SQL using
|
Value
A character vector of content names
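A short sketch of listing content for both connector types (not run; paths are hypothetical):

```r
## Not run:
tbls <- connector_databricks_table(
  http_path = "path-to-warehouse",
  catalog = "my_catalog",
  schema = "my_schema"
)
list_content_cnt(tbls) # tables in my_catalog.my_schema

vol <- connector_databricks_volume("my_catalog/my_schema/my_volume")
list_content_cnt(vol) # files under the volume path
## End(Not run)
```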
Connector Logging Functions
Description
Additional log read methods for Databricks connectors implemented for
connector::log_read_connector():
- ConnectorDatabricksTable: Implementation of the log_read_connector function for the ConnectorDatabricksTable class.
- ConnectorDatabricksVolume: Implementation of the log_read_connector function for the ConnectorDatabricksVolume class.
Usage
## S3 method for class 'ConnectorDatabricksTable'
log_read_connector(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
log_read_connector(connector_object, name, ...)
log_read_connector(connector_object, name, ...)
Arguments
connector_object |
The connector object to log operations for. Can be any connector class (ConnectorFS, ConnectorDBI, ConnectorLogger, etc.) |
name |
Character string specifying the name or identifier of the resource being operated on (e.g., file name, table name) |
... |
Additional parameters passed to specific method implementations. May include connector-specific options or metadata. |
Details
Connector Logging Functions
The logging system is built around S3 generic functions that dispatch to specific implementations based on the connector class. Each operation is logged with contextual information including connector details, operation type, and resource names.
Value
These are primarily side-effect functions that perform logging. The actual return value depends on the specific method implementation, typically:
- log_read_connector: Result of the read operation
- log_write_connector: Invisible result of write operation
- log_remove_connector: Invisible result of remove operation
- log_list_content_connector: List of connector contents
Connector Logging Functions
Description
Additional log remove methods for Databricks connectors implemented for
connector::log_remove_connector():
- ConnectorDatabricksTable: Implementation of the log_remove_connector function for the ConnectorDatabricksTable class.
- ConnectorDatabricksVolume: Implementation of the log_remove_connector function for the ConnectorDatabricksVolume class.
Usage
## S3 method for class 'ConnectorDatabricksTable'
log_remove_connector(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
log_remove_connector(connector_object, name, ...)
log_remove_connector(connector_object, name, ...)
Arguments
connector_object |
The connector object to log operations for. Can be any connector class (ConnectorFS, ConnectorDBI, ConnectorLogger, etc.) |
name |
Character string specifying the name or identifier of the resource being operated on (e.g., file name, table name) |
... |
Additional parameters passed to specific method implementations. May include connector-specific options or metadata. |
Details
Connector Logging Functions
The logging system is built around S3 generic functions that dispatch to specific implementations based on the connector class. Each operation is logged with contextual information including connector details, operation type, and resource names.
Value
These are primarily side-effect functions that perform logging. The actual return value depends on the specific method implementation, typically:
- log_read_connector: Result of the read operation
- log_write_connector: Invisible result of write operation
- log_remove_connector: Invisible result of remove operation
- log_list_content_connector: List of connector contents
Connector Logging Functions
Description
Additional log write methods for Databricks connectors implemented for
connector::log_write_connector():
- ConnectorDatabricksTable: Implementation of the log_write_connector function for the ConnectorDatabricksTable class.
- ConnectorDatabricksVolume: Implementation of the log_write_connector function for the ConnectorDatabricksVolume class.
Usage
## S3 method for class 'ConnectorDatabricksTable'
log_write_connector(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
log_write_connector(connector_object, name, ...)
log_write_connector(connector_object, name, ...)
Arguments
connector_object |
The connector object to log operations for. Can be any connector class (ConnectorFS, ConnectorDBI, ConnectorLogger, etc.) |
name |
Character string specifying the name or identifier of the resource being operated on (e.g., file name, table name) |
... |
Additional parameters passed to specific method implementations. May include connector-specific options or metadata. |
Details
Connector Logging Functions
The logging system is built around S3 generic functions that dispatch to specific implementations based on the connector class. Each operation is logged with contextual information including connector details, operation type, and resource names.
Value
These are primarily side-effect functions that perform logging. The actual return value depends on the specific method implementation, typically:
- log_read_connector: Result of the read operation
- log_write_connector: Invisible result of write operation
- log_remove_connector: Invisible result of remove operation
- log_list_content_connector: List of connector contents
Read content from the connector
Description
Additional read methods for Databricks connectors implemented for
connector::read_cnt():
- ConnectorDatabricksTable: Reuses the connector::read_cnt() method for ConnectorDatabricksTable, but always sets the catalog and schema as defined when initializing the connector.
- ConnectorDatabricksVolume: Reuses the connector::read_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
read_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksTable'
read_cnt(connector_object, name, ..., timepoint = NULL, version = NULL)
## S3 method for class 'ConnectorDatabricksVolume'
read_cnt(connector_object, name, ...)
Arguments
connector_object |
Connector The connector object to use. |
name |
character Name of the content to read, write, or remove. Typically the table name. |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the
|
timepoint |
Timepoint in Delta time travel syntax format. |
version |
Table version generated by the operation. |
Value
R object with the content. For rectangular data a data.frame.
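A sketch of reading a table, including Delta time travel (not run; the timepoint value is an assumption based on Delta Lake's time travel syntax):

```r
## Not run:
con <- connector_databricks_table(
  http_path = "path-to-warehouse",
  catalog = "my_catalog",
  schema = "my_schema"
)

# Current state of the table
read_cnt(con, "my_mtcars_table")

# The table as of a specific version or timepoint (Delta time travel)
read_cnt(con, "my_mtcars_table", version = 1)
read_cnt(con, "my_mtcars_table", timepoint = "2025-01-01")
## End(Not run)
```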
Remove content from the connector
Description
Additional remove methods for Databricks connectors implemented for
connector::remove_cnt():
- ConnectorDatabricksTable: Reuses the connector::remove_cnt() method for ConnectorDatabricksTable, but always sets the catalog and schema as defined when initializing the connector.
- ConnectorDatabricksVolume: Reuses the connector::remove_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
remove_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksTable'
remove_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
remove_cnt(connector_object, name, ...)
Arguments
connector_object |
Connector The connector object to use. |
name |
character Name of the content to read, write, or remove. Typically the table name. |
... |
ConnectorDatabricksTable: Additional parameters to pass to the
|
Value
invisible connector_object.
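A short sketch (not run; connection details are hypothetical):

```r
## Not run:
con <- connector_databricks_table(
  http_path = "path-to-warehouse",
  catalog = "my_catalog",
  schema = "my_schema"
)

# Drop the table from my_catalog.my_schema
remove_cnt(con, "my_mtcars_table")
## End(Not run)
```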
Remove a directory
Description
Additional remove directory methods for Databricks connectors implemented for
connector::remove_directory_cnt():
- ConnectorDatabricksVolume: Reuses the connector::remove_directory_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
remove_directory_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
remove_directory_cnt(connector_object, name, ...)
Arguments
connector_object |
Connector The connector object to use. |
name |
character The name of the directory to remove |
... |
ConnectorDatabricksVolume: Additional parameters to pass to
the |
Value
invisible connector_object.
Use dplyr verbs to interact with the remote database table
Description
Additional tbl methods for Databricks connectors implemented for
connector::tbl_cnt():
- ConnectorDatabricksTable: Reuses the connector::tbl_cnt() method for ConnectorDatabricksTable, but always sets the catalog and schema as defined when initializing the connector.
- ConnectorDatabricksVolume: Reuses the connector::tbl_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector. Uses read_cnt() to allow interoperability between Volumes and Tables.
Usage
tbl_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksTable'
tbl_cnt(connector_object, name, ...)
## S3 method for class 'ConnectorDatabricksVolume'
tbl_cnt(connector_object, name, ...)
Arguments
connector_object |
Connector The connector object to use. |
name |
character Name of the content to read, write, or remove. Typically the table name. |
... |
Additional arguments passed to the method for the individual connector. |
Value
A dplyr::tbl object.
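Because tbl_cnt() returns a dplyr::tbl, dplyr verbs are translated to SQL and executed in Databricks; only collect() brings data into R. A sketch (not run; connection details are hypothetical):

```r
## Not run:
library(dplyr)

con <- connector_databricks_table(
  http_path = "path-to-warehouse",
  catalog = "my_catalog",
  schema = "my_schema"
)

tbl_cnt(con, "my_mtcars_table") |>
  filter(cyl == 6) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) |>
  collect() # materialize the result in R
## End(Not run)
```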
Upload content to the connector
Description
Additional upload methods for Databricks connectors implemented for
connector::upload_cnt():
- ConnectorDatabricksVolume: Reuses the connector::upload_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
upload_cnt(
connector_object,
src,
dest = basename(src),
overwrite = zephyr::get_option("overwrite", "connector"),
...
)
## S3 method for class 'ConnectorDatabricksVolume'
upload_cnt(
connector_object,
src,
dest = basename(src),
overwrite = zephyr::get_option("overwrite", "connector.databricks"),
...
)
Arguments
connector_object |
Connector The connector object to use. |
src |
character Path to the file to download to or upload from |
dest |
character Name of the content to read, write, or remove. Typically the table name. |
overwrite |
Overwrites existing content if it exists in the connector. |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the
|
Value
invisible connector_object.
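A short sketch (not run; the volume path and file names are hypothetical):

```r
## Not run:
vol <- connector_databricks_volume("my_catalog/my_schema/my_volume")

# Upload a local file into the volume, replacing any existing copy
upload_cnt(vol, src = "local_report.csv", dest = "report.csv", overwrite = TRUE)
## End(Not run)
```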
Upload a directory
Description
Additional upload directory methods for Databricks connectors implemented for
connector::upload_directory_cnt():
- ConnectorDatabricksVolume: Reuses the connector::upload_directory_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
upload_directory_cnt(
connector_object,
src,
dest,
overwrite = zephyr::get_option("overwrite", "connector"),
open = FALSE,
...
)
## S3 method for class 'ConnectorDatabricksVolume'
upload_directory_cnt(
connector_object,
src,
dest = basename(src),
overwrite = zephyr::get_option("overwrite", "connector"),
open = FALSE,
...
)
Arguments
connector_object |
Connector The connector object to use. |
src |
character Path to the directory to upload |
dest |
character The name of the new directory to place the content in |
overwrite |
Overwrite existing content if it exists in the connector?
See connector-options for details. Default can be set globally with
|
open |
logical Open the directory as a new connector object. |
... |
ConnectorDatabricksVolume: Additional parameters to pass to
the |
Value
invisible connector_object.
Write content to the connector
Description
Additional write methods for Databricks connectors implemented for
connector::write_cnt():
- ConnectorDatabricksTable: Reuses the connector::write_cnt() method for ConnectorDatabricksTable, but always sets the catalog and schema as defined when initializing the connector. Creates a temporary volume to write the object as a parquet file and then converts it to a table.
- ConnectorDatabricksVolume: Reuses the connector::write_cnt() method for ConnectorDatabricksVolume, but always sets the catalog, schema and path as defined when initializing the connector.
Usage
write_cnt(
connector_object,
x,
name,
overwrite = zephyr::get_option("overwrite", "connector"),
...
)
## S3 method for class 'ConnectorDatabricksTable'
write_cnt(
connector_object,
x,
name,
overwrite = zephyr::get_option("overwrite", "connector.databricks"),
...,
method = "volume",
tags = NULL
)
## S3 method for class 'ConnectorDatabricksVolume'
write_cnt(
connector_object,
x,
name,
overwrite = zephyr::get_option("overwrite", "connector.databricks"),
...
)
Arguments
connector_object |
Connector The connector object to use. |
x |
The object to write to the connection |
name |
character Name of the content to read, write, or remove. Typically the table name. |
overwrite |
Overwrite existing content if it exists in the connector. |
... |
ConnectorDatabricksVolume: Additional parameters to pass to the
|
method |
|
tags |
|
Value
invisible connector_object.
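A sketch of writing a data.frame as a table (not run; connection details are hypothetical). With method = "volume", the object is staged as a parquet file in a temporary volume before being converted to a table, as described above:

```r
## Not run:
con <- connector_databricks_table(
  http_path = "path-to-warehouse",
  catalog = "my_catalog",
  schema = "my_schema"
)

# Stage mtcars as parquet in a temporary volume, then create the table
write_cnt(con, mtcars, "my_mtcars_table", overwrite = TRUE, method = "volume")
## End(Not run)
```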