This vignette demonstrates how to use causaldef for safe policy learning — making treatment decisions with quantified guarantees even when unobserved confounding exists.
The key insight is the policy regret transfer bound:
\[\text{Regret}_{do}(\pi) \leq \text{Regret}_{obs}(\pi) + M \cdot \delta\]
where:

- \(\text{Regret}_{do}(\pi)\) — regret under the true interventional distribution
- \(\text{Regret}_{obs}(\pi)\) — regret observed in the data
- \(M\) — utility range (maximum minus minimum possible outcome)
- \(\delta\) — Le Cam deficiency (quantifies confounding)
policy_regret_bound() reports two complementary quantities: the transfer penalty \(M\delta\), the additive inflation applied to observed regret, and the minimax safety floor \((M/2)\delta\), the irreducible worst-case regret.
If \(\delta>0\), no algorithm can guarantee zero worst-case regret without stronger assumptions or randomized data.
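The bound is simple arithmetic once \(\delta\) is in hand. A minimal base-R sketch with illustrative numbers (not causaldef output):

```r
# Plain-arithmetic illustration of the transfer bound:
# Regret_do(pi) <= Regret_obs(pi) + M * delta
M <- 100            # utility range: outcomes lie in [0, 100]
delta <- 0.02       # hypothetical Le Cam deficiency
regret_obs <- 3     # regret measured in observational data

regret_do_bound <- regret_obs + M * delta  # worst-case interventional regret
regret_do_bound
#> [1] 5
```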
library(causaldef)
set.seed(123)
# Simulate a treatment decision problem
n <- 1000
# Covariates
age <- runif(n, 30, 70)
severity <- rbeta(n, 2, 5) * 10
# Confounded treatment assignment (sicker patients get treatment)
U <- rnorm(n) # Unmeasured health status
ps_true <- plogis(-1 + 0.02 * age + 0.1 * severity + 0.5 * U)
A <- rbinom(n, 1, ps_true)
# Outcome: recovery score (0-100)
# True effect is heterogeneous
tau_true <- 10 + 0.2 * (age - 50) # Older patients benefit more
Y <- 50 + tau_true * A - 0.3 * severity + 5 * U + rnorm(n, sd = 5)
# Clip to valid range
Y <- pmin(100, pmax(0, Y))
df <- data.frame(
age = age,
severity = severity,
A = A,
Y = Y
)

spec <- causal_spec(
data = df,
treatment = "A",
outcome = "Y",
covariates = c("age", "severity")
)
#> ✔ Created causal specification: n=1000, 2 covariate(s)
# Estimate deficiency with multiple methods
def_results <- estimate_deficiency(
spec,
methods = c("unadjusted", "iptw", "aipw"),
n_boot = 100
)
#> ℹ Estimating deficiency: unadjusted
#> ℹ Estimating deficiency: iptw
#> ℹ Estimating deficiency: aipw
print(def_results)
#>
#> -- Deficiency Proxy Estimates (PS-TV) ------
#>
#> Method Delta SE CI Quality
#> unadjusted 0.054 0.0139 [0.0454, 0.0965] Caution (Yellow)
#> iptw 0.011 0.0042 [0.0075, 0.023] Excellent (Green)
#> aipw 0.011 0.0042 [0.0073, 0.0223] Excellent (Green)
#> Note: delta is a propensity-score TV proxy (overlap/balance diagnostic).
#>
#> Best method: iptw (delta = 0.011)

# Define utility range (outcome is 0-100)
utility_range <- c(0, 100)
# Suppose our policy achieves 5% observed regret
obs_regret <- 5
# Compute bound
bounds <- policy_regret_bound(
deficiency = def_results,
utility_range = utility_range,
obs_regret = obs_regret
)
#> Warning: Multiple fitted methods are available but `method` was not specified.
#> ℹ Using the smallest available delta across methods is optimistic after model
#> selection.
#> ℹ For a pre-specified decision bound, call `policy_regret_bound()` with `method
#> = '<chosen method>'`.
#> ℹ Transfer penalty: 1.0973 (delta = 0.011)
print(bounds)
#>
#> -- Policy Regret Bounds -------------------------------------------------
#>
#> * Deficiency delta: 0.011
#> * Delta mode: point
#> * Delta method: iptw
#> * Delta selection: minimum across fitted methods
#> * Utility range: [0, 100]
#> * Transfer penalty: 1.0973 (additive regret upper bound)
#> * Minimax floor: 0.5486 (worst-case lower bound)
#>
#> * Observed regret: 5
#> * Interventional bound: 6.0973
#>
#> Note: this is a plug-in bound using a deficiency proxy rather than an identified exact deficiency.
#> Note: minimum-across-methods selection is optimistic after model selection.
#>
#> Interpretation: Transfer penalty is 1.1% of utility range given delta.

cat("=== Policy Deployment Decision ===\n\n")
#> === Policy Deployment Decision ===
delta_best <- min(def_results$estimates)
M <- diff(utility_range)
transfer_penalty <- M * delta_best
minimax_floor <- 0.5 * M * delta_best
cat(sprintf("Best achievable deficiency: %.3f\n", delta_best))
#> Best achievable deficiency: 0.011
cat(sprintf("Transfer penalty (M*delta): %.1f points\n", transfer_penalty))
#> Transfer penalty (M*delta): 1.1 points
cat(sprintf("Minimax safety floor (M/2*delta): %.1f points\n", minimax_floor))
#> Minimax safety floor (M/2*delta): 0.5 points
cat(sprintf("Observed regret: %.1f points\n", obs_regret))
#> Observed regret: 5.0 points
if (!is.null(bounds$regret_bound)) {
cat(sprintf("Worst-case regret: %.1f points\n", bounds$regret_bound))
}
#> Worst-case regret: 6.1 points
cat("\n")
# Decision thresholds
if (delta_best < 0.05) {
cat("✓ EXCELLENT: Deficiency < 5%. High confidence in policy.\n")
} else if (delta_best < 0.10) {
cat("⚠ MODERATE: Deficiency 5-10%. Proceed with monitoring.\n")
} else {
cat("✗ CAUTION: Deficiency > 10%. Consider RCT before deployment.\n")
}
#> ✓ EXCELLENT: Deficiency < 5%. High confidence in policy.

What if there’s additional unmeasured confounding?
# Map the confounding frontier
frontier <- confounding_frontier(
spec,
alpha_range = c(-2, 2),
gamma_range = c(-2, 2),
grid_size = 30
)
#> ℹ Computing benchmarks for observed covariates...
#> ✔ Computed confounding frontier: 30x30 grid
# Find the safe region
safe_region <- subset(frontier$grid, delta < 0.1)
cat(sprintf(
"Safe operating region covers %.1f%% of confounding space\n",
100 * nrow(safe_region) / nrow(frontier$grid)
))
#> Safe operating region covers 100.0% of confounding space

If you have the grf package installed, you can use causal forests for heterogeneous treatment effect estimation with deficiency bounds:
# Estimate deficiency using causal forests
if (requireNamespace("grf", quietly = TRUE)) {
def_grf <- estimate_deficiency(
spec,
methods = c("aipw", "grf"),
n_boot = 50
)
print(def_grf)
# Get individual treatment effect predictions
kernel_grf <- def_grf$kernel$grf
if (!is.null(kernel_grf$tau_hat)) {
cat("\nHeterogeneous Effects Detected:\n")
cat(sprintf("ATE from forest: %.2f\n", kernel_grf$ate))
cat(sprintf("CATE range: [%.2f, %.2f]\n",
min(kernel_grf$tau_hat),
max(kernel_grf$tau_hat)))
}
}

| Check | Threshold | Action if Failed |
|---|---|---|
| \(\delta < 0.05\) | Excellent | Deploy with confidence |
| \(\delta \in [0.05, 0.10]\) | Moderate | Deploy with active monitoring |
| \(\delta > 0.10\) | Concerning | Consider pilot RCT |
| NC diagnostic falsified | Any | Do not deploy without more data |
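The checklist above can be encoded as a small helper. Note that `deploy_action()` is a hypothetical convenience function, not part of causaldef; it simply wraps the thresholds from the table:

```r
# Hypothetical helper mapping an estimated deficiency (and the
# negative-control diagnostic) to the deployment actions in the table.
deploy_action <- function(delta, nc_falsified = FALSE) {
  if (nc_falsified) return("Do not deploy without more data")
  if (delta < 0.05) {
    "Deploy with confidence"
  } else if (delta <= 0.10) {
    "Deploy with active monitoring"
  } else {
    "Consider pilot RCT"
  }
}

deploy_action(0.011)
#> [1] "Deploy with confidence"
deploy_action(0.08)
#> [1] "Deploy with active monitoring"
```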
# Example: Re-estimate deficiency on new data
new_data <- ... # Your production data
new_spec <- causal_spec(
new_data,
treatment = "A",
outcome = "Y",
covariates = c("age", "severity")
)
# Quick check
def_monitor <- estimate_deficiency(
new_spec,
methods = "iptw",
n_boot = 50
)
# Alert if deficiency increased
if (def_monitor$estimates["iptw"] > 1.5 * delta_best) {
warning("Distribution shift detected! Deficiency increased.")
}

For any policy \(\pi\) and bounded utility function \(u \in [0, M]\):
\[\mathbb{E}_{P^{do}}\left[\max_a u(a, X) - u(\pi(X), X)\right] \leq \mathbb{E}_{P^{obs}}\left[\max_a u(a, X) - u(\pi(X), X)\right] + M\delta\]
Proof sketch: The deficiency \(\delta\) bounds the total variation distance between the (simulated) observational and target interventional laws. Since utility is bounded by \(M\), the maximum discrepancy in expected utility is at most \(M\) times the total variation gap.
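The core inequality in the sketch can be sanity-checked numerically: for any function bounded in \([0, M]\), the gap in expectations between two laws is at most \(M\) times their total variation distance. A toy discrete example in base R (the distributions and utilities here are arbitrary, chosen only for illustration):

```r
# Two toy laws on three states (stand-ins for P_obs and P_do)
p <- c(0.50, 0.30, 0.20)
q <- c(0.45, 0.25, 0.30)
tv <- 0.5 * sum(abs(p - q))  # total variation distance

M <- 100
set.seed(1)
u <- runif(3, 0, M)          # an arbitrary utility bounded in [0, M]

gap <- abs(sum(p * u) - sum(q * u))  # discrepancy in expected utility
gap <= M * tv                        # always TRUE, for any such u
#> [1] TRUE
```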
Traditional ML focuses on:

- Prediction error: how well does my model predict \(Y\)?
- Generalization: does performance hold on new data?
But for causal policy learning, we need:

- Interventional validity: does my policy work when deployed?
- Confounding robustness: how much could unmeasured bias hurt me?
The safety floor answers these questions with formal guarantees.
| Concept | Definition | Function |
|---|---|---|
| Transfer penalty | \(M\delta\) — additive regret inflation term | $transfer_penalty |
| Minimax safety floor | \((M/2)\delta\) — irreducible worst-case regret | $minimax_floor |
| Regret bound | observed regret + transfer penalty | $regret_bound |
| Deficiency | Information gap between the observational and interventional regimes | estimate_deficiency() |
| Confounding Frontier | Deficiency as function of \((\alpha, \gamma)\) | confounding_frontier() |
Use these tools to make safe, accountable decisions from observational data.
Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison. DOI: 10.5281/zenodo.18367347. See thm:policy_regret (Policy Regret Transfer) and thm:safety_floor (Minimax Safety Floor).
Athey, S., & Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1), 133-161.
Kallus, N. (2020). Confounding-robust policy evaluation in infinite-horizon reinforcement learning. NeurIPS.