| Type: | Package |
| Title: | Visualizing Decomposition of Differences in Rate Metrics |
| Version: | 0.3.0 |
| Description: | Provides tools for decomposing differences in rate metrics between two groups into contributions from individual subgroups and visualizing them as a "Theseus Plot". Inspired by the story of the Ship of Theseus, the method replaces subgroup data from one group with that of another step by step, recalculating the overall metric at each stage to quantify subgroup contributions. A Theseus Plot combines the stepwise progression of a waterfall plot with the comparative bars of a bar chart, offering an intuitive way to understand subgroup-level effects. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/hoxo-m/TheseusPlot, https://hoxo-m.github.io/TheseusPlot/ |
| BugReports: | https://github.com/hoxo-m/TheseusPlot/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | dplyr, forcats, ggplot2, memoise, R6, rlang, stats, stringr, tibble, tidyr, waterfalls (≥ 1.1.4) |
| Suggests: | knitr, nycflights13, rmarkdown, testthat (≥ 3.0.0) |
| Encoding: | UTF-8 |
| Config/testthat/edition: | 3 |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-14 22:05:48 UTC; akagi |
| Author: | Koji Makiyama [aut, cre, cph], Kazuyuki Sano [ctb, wdc], Shinichi Takayanagi [med], Daisuke Ichikawa [exp], LY Corporation Analytics Solution Enhancement Team [spn] |
| Maintainer: | Koji Makiyama <hoxo.smile@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-14 23:40:13 UTC |
An R6 Class for Generating Theseus Plots
Description
The ShipOfTheseus class decomposes the difference in outcome rates
between two datasets. For a selected column, it computes subgroup
contributions, summarizes the results in tables, and visualizes them as
waterfall-style Theseus Plots.
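The stepwise replacement idea can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `stepwise_contributions()` and the toy data are hypothetical, and the sketch assumes a single fixed replacement order (alphabetical), whereas the package may handle ordering differently.

```r
# Toy data (made up): subgroup column `g` and binary outcome `y`.
data1 <- data.frame(
  g = rep(c("a", "b"), times = c(4, 6)),
  y = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0)
)
data2 <- data.frame(
  g = rep(c("a", "b"), times = c(6, 4)),
  y = c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0)
)

# Hypothetical helper: replace one subgroup at a time and record
# how much the overall rate moves at each step.
stepwise_contributions <- function(data1, data2, column, outcome = "y") {
  subgroups <- sort(union(data1[[column]], data2[[column]]))
  current <- data1
  rate <- mean(current[[outcome]])
  contrib <- numeric(length(subgroups))
  names(contrib) <- subgroups
  for (s in subgroups) {
    # Swap the rows of subgroup `s` for the corresponding rows of data2.
    current <- rbind(
      current[current[[column]] != s, , drop = FALSE],
      data2[data2[[column]] == s, , drop = FALSE]
    )
    new_rate <- mean(current[[outcome]])
    contrib[[s]] <- new_rate - rate
    rate <- new_rate
  }
  contrib
}

contrib <- stepwise_contributions(data1, data2, "g")
print(contrib)

# Once every subgroup has been replaced, `current` equals data2, so the
# contributions add up to the overall difference in rates.
stopifnot(isTRUE(all.equal(sum(contrib), mean(data2$y) - mean(data1$y))))
```

Because each step starts from the result of the previous one, the individual contributions in this sketch depend on the replacement order, while their sum does not.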
Methods
Public methods
ShipOfTheseus$new()
The constructor of the ShipOfTheseus class.
Usage
ShipOfTheseus$new(data1, data2, outcome, labels, xlab, ylab, digits, text_size)
Arguments
| data1 | data frame representing the first group (e.g., the baseline data). |
| data2 | data frame representing the second group (e.g., the comparison data). |
| outcome | string specifying the outcome variable used to compute the rate metric (default is "y"). Typically, this is a binary indicator (e.g., 0/1) that is aggregated to form rates. |
| labels | character vector of length 2 giving the labels for the two groups. The first corresponds to data1, the second to data2. Default is c("Baseline", "Comparison"). |
| xlab | string specifying the x-axis label for plots. If NULL (default), no label is displayed. |
| ylab | string specifying the y-axis label for plots. If NULL (default), no label is displayed. |
| digits | integer indicating the number of decimal places to use for displaying numeric values (default is 1). |
| text_size | numeric value specifying the relative size of text elements in plots (default is 1.0). |
Returns
A ShipOfTheseus object, which can be used with
plot() to create Theseus plots.
ShipOfTheseus$table()
Generate a contribution table for a given column.
Usage
ShipOfTheseus$table(column_name, n = Inf, continuous = continuous_config())
Arguments
| column_name | string. The name of the column to analyze. |
| n | integer. Maximum number of top contributing subgroups to display. If the number of subgroups exceeds n, the remaining subgroups are aggregated. |
| continuous | list. A configuration list for handling continuous variables (e.g., specifying number of bins or custom breaks). |
Returns
A tibble summarizing subgroup contributions to the difference between the two groups, including counts, total outcomes, and rates for each subgroup.
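The effect of limiting the table to the top n subgroups can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: the helper `lump_contributions()`, the ranking by absolute contribution, the "Other" label, and the contribution values are hypothetical and are not taken from the package.

```r
# Hypothetical helper: keep the n largest contributions (by absolute value)
# and sum the remaining ones into a single aggregated entry.
lump_contributions <- function(contrib, n, other_label = "Other") {
  if (length(contrib) <= n) {
    return(contrib)  # nothing to aggregate (this also covers n = Inf)
  }
  ord <- order(abs(contrib), decreasing = TRUE)
  keep <- seq_along(ord) <= n
  top <- contrib[ord[keep]]
  rest <- contrib[ord[!keep]]
  c(top, setNames(sum(rest), other_label))
}

# Made-up contributions for five subgroups.
contrib <- c(AA = 0.8, DL = -0.5, UA = 0.1, B6 = -0.05, WN = 0.02)

lumped <- lump_contributions(contrib, n = 2)
print(lumped)

# Aggregation changes the level of detail, not the total.
stopifnot(isTRUE(all.equal(sum(lumped), sum(contrib))))
```

With `n = Inf`, the default of `$table()`, a helper like this would return every subgroup unchanged.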
ShipOfTheseus$plot()
Generate a Theseus plot for a specified column.
Usage
ShipOfTheseus$plot(column_name, n = 10L, main_item = NULL, bar_max_value = NULL, levels = NULL, continuous = continuous_config())
Arguments
| column_name | The name of the column to visualize. |
| n | integer. Maximum number of top contributing subgroups to display. Remaining subgroups are aggregated if necessary. |
| main_item | string. The subgroup used as the reference for scaling the bar heights. |
| bar_max_value | numeric. Maximum value for scaling the contribution bars. |
| levels | character vector specifying the display order of subgroups. |
| continuous | list. Configuration for handling continuous variables (e.g., number of bins or custom breaks). |
Returns
A ggplot object representing the Theseus Plot for the specified column.
ShipOfTheseus$plot_flip()
Generate a flipped Theseus plot for a specified column.
Usage
ShipOfTheseus$plot_flip(column_name, n = 10L, main_item = NULL, bar_max_value = NULL, levels = NULL, continuous = continuous_config())
Arguments
| column_name | The name of the column to visualize. |
| n | integer. Maximum number of top contributing subgroups to display. Remaining subgroups are aggregated if necessary. |
| main_item | string. The subgroup used as the reference for scaling the bar heights. |
| bar_max_value | numeric. Maximum value for scaling the contribution bars. |
| levels | character vector specifying the display order of subgroups. |
| continuous | list. Configuration for handling continuous variables (e.g., number of bins or custom breaks). |
Returns
A ggplot object representing the Theseus Plot for the specified column.
ShipOfTheseus$clone()
The objects of this class are cloneable with this method.
Usage
ShipOfTheseus$clone(deep = FALSE)
Arguments
| deep | Whether to make a deep clone. |
Continuous Variable Configuration for Theseus Plot
Description
The continuous_config() function creates a configuration object for
handling continuous variables in Theseus plots. It controls how continuous
data is binned into discrete categories for contribution calculations and
visualization.
Usage
continuous_config(
  n = 10L,
  pretty = TRUE,
  split = c("count", "width", "rate"),
  breaks = NULL
)
Arguments
| n | integer. Number of bins to create for a continuous variable. |
| pretty | logical. If TRUE, use pretty breaks for bin edges. |
| split | string. Method for binning continuous variables. Options are "count", "width", and "rate". |
| breaks | numeric vector specifying custom break points. |
Value
A list containing binning parameters (n, pretty,
split, breaks) to be used in plotting or contribution
calculations for continuous variables.
Examples
library(TheseusPlot)
continuous_config(n = 5, pretty = FALSE, split = "rate")
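To make the binning options more concrete, the following base R sketch shows two common strategies plus rounded ("pretty") bin edges. It is only an illustration under assumptions: it supposes that split = "width" refers to equal-width bins and split = "count" to equal-count (quantile) bins, which the text above does not spell out, and it does not attempt to illustrate split = "rate". None of this code is taken from the package.

```r
set.seed(1)
x <- rexp(200)  # a skewed continuous variable (simulated)
n_bins <- 4

# Equal-width bins: split the observed range into intervals of equal width.
width_breaks <- seq(min(x), max(x), length.out = n_bins + 1)
width_bins <- cut(x, breaks = width_breaks, include.lowest = TRUE)

# Equal-count bins: use quantiles so each bin holds about the same
# number of observations.
count_breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
count_bins <- cut(x, breaks = count_breaks, include.lowest = TRUE)

# "Pretty" breaks: rounded bin edges covering the range of x; the
# resulting number of bins is only approximately n_bins.
pretty_breaks <- pretty(x, n = n_bins)
pretty_bins <- cut(x, breaks = pretty_breaks, include.lowest = TRUE)

print(table(width_bins))
print(table(count_bins))
print(table(pretty_bins))
```

On skewed data like this, equal-width bins tend to leave the upper bins nearly empty, whereas equal-count bins keep subgroup sizes comparable.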
Creates a Ship Object for Generating Theseus Plots
Description
Creates a ship object, which serves as a container for data and methods to generate Theseus plots for decomposing differences in rate metrics.
Usage
create_ship(
  data1,
  data2,
  y = "y",
  labels = c("Baseline", "Comparison"),
  xlab = NULL,
  ylab = NULL,
  digits = 1L,
  text_size = 1
)
Arguments
| data1 | data frame representing the first group (e.g., the baseline data). |
| data2 | data frame representing the second group (e.g., the comparison data). |
| y | column name specifying the outcome variable used to compute the rate metric (default is "y"). |
| labels | character vector of length 2 giving the labels for the two groups. The first corresponds to data1, the second to data2. Default is c("Baseline", "Comparison"). |
| xlab | string specifying the x-axis label for plots. If NULL (default), no label is displayed. |
| ylab | string specifying the y-axis label for plots. If NULL (default), no label is displayed. |
| digits | integer indicating the number of decimal places to use for displaying numeric values (default is 1). |
| text_size | numeric value specifying the relative size of text elements in plots (default is 1). |
Value
A ShipOfTheseus object, which can be used with plot()
to create Theseus plots.
Examples
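library(dplyr)
library(TheseusPlot)

# Keep flights with a known arrival delay and flag the on-time ones
data <- nycflights13::flights |>
  filter(!is.na(arr_delay)) |>
  mutate(on_time = arr_delay <= 0)

# Compare January (baseline) with February (comparison)
data1 <- data |> filter(month == 1)
data2 <- data |> filter(month == 2)

# Build the ship object with on_time as the outcome
create_ship(data1, data2, y = on_time)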