Title: | A Metadata-Driven Framework for Streamlining Database Joins |
Version: | 0.1.0 |
Description: | Simplifies and automates the process of exploring and merging data from relational databases. This package allows users to discover table relationships, create a map of all possible joins, and generate executable plans to merge data based on a structured metadata framework. |
URL: | https://github.com/akshat09867/DBmaps |
BugReports: | https://github.com/akshat09867/DBmaps/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0), DiagrammeR |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
Imports: | data.table |
Depends: | R (≥ 3.5) |
LazyData: | true |
NeedsCompilation: | no |
Packaged: | 2025-09-03 06:25:17 UTC; akshat |
Author: | Akshat Maurya [aut, cre], David Shilane [aut] |
Maintainer: | Akshat Maurya <codingmaster902@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-09-08 19:20:01 UTC |
Add a Table's Metadata to a Registry
Description
A generic function to add new table metadata to a registry object.
Usage
add_table(registry, table_metadata)
Arguments
registry |
The registry object to which metadata will be added. |
table_metadata |
A data.table object created by |
Value
The updated registry object.
Create a Plan for Aggregating and Merging Tables
Description
This function acts as a "planner." It takes a user's request for a final dataset, finds a path using a join map, and creates a structured plan (or "recipe") of the necessary steps.
Usage
create_join_plan(
base_table,
selections,
metadata_dt,
join_map = NULL,
tables_dis = NULL
)
Arguments
base_table |
A character string specifying the main table. |
selections |
A named list specifying the columns or aggregations to include. |
metadata_dt |
The master metadata data.table. |
join_map |
An optional "Join Map" data.table produced by |
tables_dis |
An optional named list of data.tables used for data‑driven (inferred) join discovery. If |
Value
A list object representing the "join plan."
Examples
# --- 1. Define Metadata (Prerequisite) ---
customers_meta <- table_info(
table_name = "customers",
source_identifier = "customers.csv",
identifier_columns = "customer_id",
key_outcome_specs = list(
list(OutcomeName = "CustomerCount", ValueExpression = 1, AggregationMethods = list(
list(AggregatedName = "CountByRegion", AggregationFunction = "sum",
GroupingVariables = "region")
))
)
)
transactions_meta <- table_info(
"transactions", "t.csv", "tx_id",
key_outcome_specs = list(list(OutcomeName = "Revenue", ValueExpression = quote(r),
AggregationMethods = list(list(AggregatedName = "RevenueByCustomer",
AggregationFunction = "sum", GroupingVariables = "customer_id"))))
)
master_metadata <- data.table::rbindlist(list(customers_meta, transactions_meta))
# --- 2. Define the Desired Output ---
user_selections <- list(
customers = "region",
transactions = "RevenueByCustomer"
)
# --- 3. Create the Join Plan WITHOUT providing the join_map ---
# The function will now generate it automatically.
join_plan <- create_join_plan(
base_table = "customers",
selections = user_selections,
metadata_dt = master_metadata
)
# --- 4. Inspect the Plan ---
str(join_plan)
Create a Metadata Registry
Description
Initializes an empty data.table with a custom class "MetadataRegistry" to store and manage metadata definitions.
Usage
create_metadata_registry()
Value
An empty data.table with the class "MetadataRegistry".
Sample Customer Data
Description
A sample dataset containing demographic information for customers included with the DBmaps package.
Usage
customers
Format
A data.table with 5 variables:
- customer_id
A unique identifier for each customer.
- age
The age of the customer in years.
- gender
The gender of the customer.
- income
The income level of the customer.
- region
The geographical region where the customer resides.
Source
Generated for package examples.
Execute a Join Plan
Description
Takes a plan generated by create_join_plan()
and executes it sequentially
to produce a final, merged data.table.
Usage
execute_join_plan(join_plan, data_list)
Arguments
join_plan |
A data.table created by |
data_list |
A named list of the source data.tables. |
Value
A final, merged data.table.
Generate data.table Aggregation Code from Metadata
Description
Reads metadata from a master data.table and generates executable data.table code strings for performing aggregations.
Usage
generate_aggregation_code(table_name_filter, metadata_dt)
Arguments
table_name_filter |
Character string, the name of the table for which to generate aggregation code. |
metadata_dt |
A data.table containing the master metadata, created by
calling |
Value
A named character vector where each element is a runnable
data.table
code string, and the names correspond to the grouping variables.
Examples
# First, create some metadata
customers_info <- table_info(
table_name = "customers",
source_identifier = "customers.csv",
identifier_columns = "customer_id",
key_outcome_specs = list(
list(OutcomeName = "CustomerCount", ValueExpression = 1, AggregationMethods = list(
list(AggregatedName = "CountByRegion", AggregationFunction = "sum",
GroupingVariables = "region")
))
))
transactions_info <- table_info(
table_name = "transactions",
source_identifier = "transactions.csv",
identifier_columns = "transaction_id",
key_outcome_specs = list(
list(OutcomeName = "Revenue", ValueExpression = quote(amount), AggregationMethods = list(
list(AggregatedName = "RevenueByCustomer", AggregationFunction = "sum",
GroupingVariables = "customer_id"),
list(AggregatedName = "RevenueByProduct", AggregationFunction = "sum",
GroupingVariables = "product_id")
)),
list(OutcomeName = "Transactions", ValueExpression = 1, AggregationMethods = list(
list(AggregatedName = "TransactionsByCustomer", AggregationFunction = "sum",
GroupingVariables = "customer_id")
))
))
master_metadata <- data.table::rbindlist(list(customers_info, transactions_info))
# Now, generate the code for the "transactions" table
generated_code <- generate_aggregation_code("transactions", master_metadata)
print(generated_code)
# To demonstrate execution:
# 1. Create the sample data
transactions <- data.table::data.table(
transaction_id = c("T001", "T002", "T003"),
customer_id = c("C001", "C002", "C001"),
product_id = c("P001", "P002", "P001"),
amount = c(10.0, 20.0, 15.0)
)
# 2. Parse and evaluate the first generated statement
revenue_by_customer_code <- generated_code["customer_id"]
cat("Executing code:\n", revenue_by_customer_code)
revenue_by_customer_dt <- eval(parse(text = revenue_by_customer_code))
print(revenue_by_customer_dt)
Discover Potential Join Paths from Metadata and Data
Description
Analyzes metadata for explicit joins and optionally scans data to infer additional joins. Handles single- and multi-variable join keys.
Usage
map_join_paths(metadata_dt, data_list = NULL)
Arguments
metadata_dt |
A data.table containing the master metadata. |
data_list |
A named list of data.tables (names match |
Value
A data.table
representing the "Join Map" with columns:
table_from
, table_to
, key_from
, key_to
Plot a Join Plan as a Flowchart
Description
Takes a plan generated by create_join_plan()
and creates a flowchart
visualizing the sequence of aggregations and merges.
Usage
plot_join_plan(join_plan)
Arguments
join_plan |
A |
Value
A DiagrammeR
graph object that can be printed to the RStudio
Viewer pane.
Sample Product Data
Description
A sample dataset containing product information included with the DBmaps package.
Usage
products
Format
A data.table with 3 variables:
- product_id
A unique identifier for each product.
- category
The category to which the product belongs.
- original_price
The original price of the product.
Source
Generated for package examples.
Define Metadata for a Data Table in a Tidy data.table
Description
Takes descriptive information about a table and returns a tidy data.table.
Usage
table_info(
table_name,
source_identifier,
identifier_columns,
key_outcome_specs
)
Arguments
table_name |
Character string, the conceptual name of the table. |
source_identifier |
Character string, the file name or DB table identifier. |
identifier_columns |
Character vector, names of column(s) acting as primary key(s). |
key_outcome_specs |
A list of 'OutcomeSpec' lists. |
Value
A tidy data.table with the table's metadata. The identifier_columns
and
grouping_variables
columns are list-columns.
Examples
transactions_info <- table_info(
table_name = "transactions",
source_identifier = "transactions.csv",
identifier_columns = c("customer_id", "product_id", "time"),
key_outcome_specs = list(
list(
OutcomeName = "Revenue",
ValueExpression = quote(price * quantity),
AggregationMethods = list(
list(AggregatedName = "RevenueByCustomer", AggregationFunction = "sum",
GroupingVariables = "customer_id"),
list(AggregatedName = "RevenueByProduct", AggregationFunction = "sum",
GroupingVariables = "product_id")
)
)
)
)
# Note the structure of the list-columns
print(transactions_info)
str(transactions_info[, .(identifier_columns, grouping_variable)])
Sample Transaction Data
Description
A sample dataset of transaction events, linking customers and products. This is a typical "fact" table in a relational schema.
Usage
transactions
Format
A data.table with 5 variables:
- customer_id
Identifier for the customer making the transaction.
- product_id
Identifier for the product being purchased.
- time
The timestamp of the transaction (POSIXct format).
- quantity
The number of units of the product purchased.
- price
The price per unit at the time of transaction.
Source
Generated for package examples.
Sample Product View Data
Description
A sample dataset of product view events, linking customers and products. This is a smaller, sampled version of a potentially very large event log.
Usage
views
Format
A data.table with 3 variables:
- customer_id
Identifier for the customer viewing the product.
- product_id
Identifier for the product being viewed.
- time
The timestamp of the view event (POSIXct format).
Source
Generated for package examples.