---
title: "Asymmetric Smoothed-Association Matrices"
output:
  rmarkdown::html_vignette:
    toc: true
    fig_width: 7
    fig_height: 7
vignette: >
  %\VignetteIndexEntry{Asymmetric Smoothed-Association Matrices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.alt    = paste(
    "Asymmetric smoothed-association matrix produced by",
    "janusplot(); diagonal cells hold variable labels,",
    "off-diagonal cells show the fitted mgcv::gam spline",
    "with a 95% confidence envelope, raw-data scatter,",
    "and per-cell annotations for n, EDF, and smooth",
    "significance glyph."
  ),
  fig.width  = 7,
  fig.height = 7,
  dpi        = 110,
  eval       = TRUE
)
library(ggplot2)
set.seed(2026L)
```

## Why Pearson is not enough

A Pearson correlation matrix gives one scalar per pair of variables.
Two pieces of information are discarded in that collapse:

1. The *shape* of the association — linear, monotone non-linear,
   U-shaped, or irregular.
2. The *direction* — whether $y$ is a smooth function of $x$
   differs in general from whether $x$ is a smooth function of $y$,
   because leverage and noise are directional.

`janusplot()` recovers both of these and renders them visually for every
pair in a matrix layout, using proper `mgcv::gam()` fits (not loess) so
that EDF, F-tests, and random effects are available.

## Quick start

```{r quickstart}
library(janusplot)

# Four numeric columns from mtcars (32 rows: small but illustrative)
janusplot(mtcars[, c("mpg", "hp", "wt", "qsec")])
```

Each off-diagonal cell shows:

- raw scatter (light grey),
- the fitted spline (blue line) and 95% CI ribbon,
- EDF (effective degrees of freedom) in the bottom-right,
- *n* used in the bottom-left,
- a significance glyph in the top-right (`***` / `**` / `*` / `·`).

The cell fill is keyed to EDF: darker = more non-linear.

## Non-linear detection

A synthetic quadratic + sinusoidal example. The matrix makes it
obvious which variables are genuinely non-linearly related to which.

```{r nonlinear}
n <- 300
x1 <- runif(n, -3, 3)
x2 <- x1^2 + rnorm(n, sd = 0.6)   # quadratic on x1
x3 <- sin(x1) + rnorm(n, sd = 0.4) # sinusoidal on x1
x4 <- rnorm(n)                     # independent
d  <- data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4)

janusplot(d)
```

EDF for `x2 ~ s(x1)` and `x3 ~ s(x1)` should clearly exceed 1; the
cell fills reflect that. Cells involving `x4` should be close to EDF =
1 (linear / flat).
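
This is easy to cross-check with `mgcv` directly. A minimal sketch, using
plain REML fits here (janusplot's own fitting defaults may differ in
detail):

```{r edf-check}
library(mgcv)

# EDF of the smooths underlying two of the cells above
summary(gam(x2 ~ s(x1), data = d, method = "REML"))$edf  # well above 1
summary(gam(x4 ~ s(x1), data = d, method = "REML"))$edf  # close to 1
```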

## Asymmetry — a heteroscedastic example

When the noise scale depends on a predictor, the two directional smooths
diverge: $y \sim s(x)$ recovers the mean relationship; $x \sim s(y)$ is
distorted by the variance asymmetry.

```{r asym}
n <- 400
x <- runif(n, 0, 5)
y <- 0.5 * x + rnorm(n, sd = 0.3 + 0.4 * x)   # variance grows with x
d <- data.frame(x = x, y = y, z = rnorm(n))

janusplot(d)
```

The `A = ...` label per cell reports the asymmetry index
$A_{ij} = \lvert \mathrm{EDF}_{y \mid x} - \mathrm{EDF}_{x \mid y} \rvert /
(\mathrm{EDF}_{y \mid x} + \mathrm{EDF}_{x \mid y}) \in [0, 1]$,
shown by default in the bottom-left corner alongside `EDF = ...`.
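
For a single pair, the index can be reproduced from two directional
`mgcv` fits. A minimal sketch (plain REML fits, not janusplot's
internals):

```{r asym-check}
library(mgcv)

edf_yx <- summary(gam(y ~ s(x), data = d, method = "REML"))$edf  # EDF of y | x
edf_xy <- summary(gam(x ~ s(y), data = d, method = "REML"))$edf  # EDF of x | y
abs(edf_yx - edf_xy) / (edf_yx + edf_xy)                         # A, in [0, 1]
```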

## Partial smooths (controlling for covariates)

Pass a one-sided formula to `adjust` to include fixed covariates
and/or random effects in every pairwise GAM.

```{r adjust, eval = requireNamespace("palmerpenguins", quietly = TRUE)}
library(palmerpenguins)
pp <- na.omit(penguins)

# Without covariate
janusplot(pp[, c("bill_length_mm", "bill_depth_mm",
                 "flipper_length_mm", "body_mass_g")])

# With species as a fixed effect — resolves Simpson's-paradox geometry
janusplot(pp, vars = c("bill_length_mm", "bill_depth_mm",
                       "flipper_length_mm", "body_mass_g"),
          adjust = ~ species)
```

## Changing the palette

The cell fill encodes the EDF (or deviance-explained) of the smooth and
is accompanied by a shared colourbar legend. Choose a palette with
`palette = `.

```{r palette-viridis, fig.height = 5, fig.width = 6}
d <- data.frame(
  x1 = runif(200, -3, 3),
  x2 = rnorm(200),
  x3 = rnorm(200)
)
d$x2 <- d$x1^2 + rnorm(200, sd = 0.8)  # non-linear

janusplot(d, palette = "viridis")  # default, colourblind-safe
```

```{r palette-brewer, fig.height = 5, fig.width = 6}
janusplot(d, palette = "RdYlBu")   # diverging, colourblind-safe
```

```{r palette-turbo, fig.height = 5, fig.width = 6}
janusplot(d, palette = "turbo")    # high-contrast, NOT colourblind-safe
```

**Colourblind-safe choices:**

- *Sequential:* `viridis` (default), `magma`, `inferno`, `plasma`,
  `cividis`, `mako`, `rocket`, `YlOrRd`, `YlGnBu`, `Blues`, `Greens`.
- *Diverging:* `RdYlBu`, `RdBu`, `PuOr`.

**High-contrast but not colourblind-safe:** `turbo`, `Spectral`.

## Handling missing data

```{r missing}
# airquality has genuine NAs in Ozone and Solar.R
janusplot(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
          na_action = "pairwise")
```

`na_action = "pairwise"` uses all rows for which *that* pair is
complete; `"complete"` restricts to rows complete across every
variable (matching listwise deletion).

## Scaling up — `order = "hclust"`

When the number of variables is large, reorder the axes by hierarchical clustering on
|correlation|:

```{r hclust, eval = requireNamespace("MASS", quietly = TRUE)}
data(Boston, package = "MASS")
janusplot(Boston[, c("medv", "lstat", "rm", "age",
                     "indus", "nox", "dis")],
          order = "hclust")
```

## Programmatic access — `janusplot_data()`

Returns raw GAM fits and per-cell metrics without constructing a
ggplot — useful for custom rendering or downstream analysis.

```{r data}
# Re-create the heteroscedastic example
n <- 400
het <- data.frame(
  x = runif(n, 0, 5),
  y = NA_real_
)
het$y <- 0.5 * het$x + rnorm(n, sd = 0.3 + 0.4 * het$x)

out <- janusplot_data(het, vars = c("x", "y"))
out$pairs[[1L]]$edf_yx
out$pairs[[1L]]$edf_xy
out$pairs[[1L]]$asymmetry_index
```

## Shape metrics explained

Every fitted smooth is summarised by two continuous indices and two
discrete counts. These drive the 24-category classifier and appear
as columns in `janusplot(..., with_data = TRUE)$data` and as fields
on each entry of `janusplot_data()$pairs`.
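
On the programmatic side, the same quantities are exposed on the
`janusplot_data()` output. Assuming the list fields reuse the metric
names verbatim (an assumption on our part; check `str(out$pairs[[1L]])`
against your installed version), the `out` object from the previous
section gives:

```{r shape-fields, eval = FALSE}
# Field names assumed to match the metric names defined below
out$pairs[[1L]]$monotonicity_index
out$pairs[[1L]]$convexity_index
out$pairs[[1L]]$n_turning_points
out$pairs[[1L]]$n_inflections
out$pairs[[1L]]$shape_category
```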

Let `f(x)` be the fitted smooth on a dense grid of equally-spaced points
across the predictor range (100 points by default, 200 under the
derivative displays; see *Limitations*), with `f'` and `f''` the numerical
first and second derivatives. Let `w(x)` be the empirical density of
the predictor on the same grid, normalised to `sum(w) = 1`.

- **`monotonicity_index`** (paper symbol `M`):

    `M = sum(w * f') / sum(w * |f'|)  in  [-1, 1]`

    `+1` means strictly increasing, `-1` strictly decreasing, `0` a
    non-monotone curve (bowl, dome, wave).

- **`convexity_index`** (paper symbol `C`):

    `C = sum(w * f'') / sum(w * |f''|)  in  [-1, 1]`

    `+1` means globally convex (bowl-up), `-1` globally concave
    (bowl-down), `0` inflection-dominated (S-curve, sine, flat).

Both indices are density-weighted so they describe the smooth
*where the data actually live*, not extrapolated tails, and are
scale-invariant: replacing `y` with `a * y + b` leaves them unchanged.

- **`n_turning_points`** — count of interior extrema (sign changes
  of `f'`), robust to noise via lobe-mass weighting.
- **`n_inflections`** — count of interior curvature flips (sign
  changes of `f''`), same robust counting.

Together the pair `(n_turning_points, n_inflections)` drives the
primary `shape_category` dispatch; `(monotonicity_index,
convexity_index)` disambiguate within the monotone `(0, 0)` and
single-extremum `(1, 0)` cells. The full taxonomy with 2-letter
codes, archetypes, and thumbnail curves is available from
`janusplot_shape_hierarchy()` and is rendered as the standing legend
below every `janusplot()` call.
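
A minimal sketch of how the two indices and the naive counts fall out of
a fitted smooth, computed with `mgcv` and base R rather than janusplot's
internals (which additionally apply the lobe-mass-weighted robust
counting mentioned above):

```{r shape-metrics-sketch}
library(mgcv)

# Quadratic example: M should be near 0 (non-monotone), C near +1 (convex)
x   <- runif(500, -3, 3)
y   <- x^2 + rnorm(500, sd = 0.5)
fit <- gam(y ~ s(x), method = "REML")

xg <- seq(min(x), max(x), length.out = 200)           # evaluation grid
f  <- predict(fit, newdata = data.frame(x = xg))
f1 <- diff(f)  / diff(xg)                             # numerical f'
f2 <- diff(f1) / diff(xg)[-1]                         # numerical f''
w  <- density(x, from = min(x), to = max(x), n = 200)$y
w  <- w / sum(w)                                      # predictor-density weights

M <- sum(w[-1] * f1)     / sum(w[-1] * abs(f1))       # monotonicity index
C <- sum(w[-(1:2)] * f2) / sum(w[-(1:2)] * abs(f2))   # convexity index
n_turn <- sum(diff(sign(f1)) != 0)                    # naive turning-point count
n_infl <- sum(diff(sign(f2)) != 0)                    # naive inflection count
round(c(M = M, C = C, turning = n_turn, inflections = n_infl), 2)
```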

Tune the thresholds applied to these indices via
`janusplot_shape_cutoffs()`. See the
[shape-recognition-sensitivity](shape-recognition-sensitivity.html)
vignette for how faithfully the classifier recovers ground-truth
shapes across sample-size and noise regimes.

## Derivative views: theoretical justification and applied use

Each matrix renders one quantity. `display = "fit"` (default) shows
the fitted smooth; `display = "d1"` shows \eqn{\hat f'(x)};
`display = "d2"` shows \eqn{\hat f''(x)}. A top-of-matrix title
names the mode, so side-by-side calls compare unambiguously.
Orders beyond two are not exposed — see *Noise amplification*
below. Derivative CI rendering is **off by default**; opt in with
`derivative_ci = "pointwise"` or `"simultaneous"`.

```{r derivs-fit, fig.height = 4.5}
set.seed(2026L)
n  <- 300L
xs <- runif(n, -pi, pi)
df <- data.frame(
  x  = xs,
  y1 = xs + sin(3 * xs) + rnorm(n, sd = 0.15),
  y2 = 0.5 * xs^2       + rnorm(n, sd = 0.8)
)
janusplot(df, display = "fit", show_shape_legend = FALSE)
```

```{r derivs-d1, fig.height = 4.5}
janusplot(df, display = "d1", show_shape_legend = FALSE)
```

```{r derivs-d2, fig.height = 4.5}
janusplot(df, display = "d2", show_shape_legend = FALSE)
```

Turning on simultaneous bands applies the Monte Carlo critical
multiplier of Simpson (2018) in a single call:

```{r derivs-sim, fig.height = 4.5}
janusplot(df, display = "d1",
          derivative_ci = "simultaneous",
          derivative_ci_nsim = 2000L,
          show_shape_legend = FALSE)
```

### What derivatives reveal that the fit hides

The fitted smooth $\hat f(x) = \mathbb{E}[y\mid x]$ is a level
description. Its derivatives are different statistical objects with
their own interpretations:

- $\hat f'(x)$ — the **local rate of change** of $y$ in $x$. Zero
  crossings localise the turning points of $\hat f$; the sign of
  $\hat f'$ gives the direction of monotonicity; the magnitude gives
  the sensitivity at the operating point $x$. In control engineering
  this is literally the process gain $K(x) = \partial y / \partial u$
  that gain-scheduled controllers are built around (Rugh & Shamma,
  2000; Leith & Leithead, 2000). In causal analysis of a continuous
  treatment it is the derivative of the dose--response curve
  $\mu'(t) = \partial \mathbb{E}[Y(t)] / \partial t$, which Zhang &
  Chen (2025) argue is often *the* treatment-effect object of
  interest, not the curve itself.
- $\hat f''(x)$ — the **local curvature**. Zero crossings localise
  the inflection points of $\hat f$; a persistently positive
  second derivative flags accelerating growth, persistently negative
  flags saturation (diminishing returns). $\hat f''$ is the input to
  the convexity index $C$ defined earlier in this vignette, so the
  derivative panel exposes the *local* signal behind that scalar
  summary.

The asymmetric matrix layout sharpens this. `janusplot()` fits both
$\hat f_{y\mid x}(x)$ and $\hat f_{x\mid y}(y)$, so derivative panels
on the two triangles answer genuinely different questions: the upper
triangle is "how steeply does $y$ respond to a nudge in $x$ at this
operating point" (forward gain); the lower triangle is "how steeply
must $x$ change to induce a unit change in $y$" (inverse
sensitivity). For an asymmetric process these do not transpose into
each other, and the directional asymmetry is a diagnostic the
symmetric correlation matrix cannot expose (Janzing & Schölkopf,
2010).

### Estimation — the LP matrix

Let $X_p = X_p(\mathbf{x}_g)$ denote the design (linear predictor)
matrix of the fitted GAM evaluated on the plotting grid
$\mathbf{x}_g$, obtained from
`predict(gam_fit, newdata = ..., type = "lpmatrix")` (Wood, 2017,
§7.2.4). With penalised posterior mean $\hat{\boldsymbol\beta} =
\mathtt{coef(gam\_fit)}$ and posterior covariance $V_p =
\mathtt{gam\_fit\$Vp}$, we construct a finite-difference operator
$D^{(k)}$ on the *rows* of $X_p$ (central differences in the interior,
second-order forward / backward stencils at the endpoints) and read
off

$$
\hat f^{(k)}(x_i) = \bigl[D^{(k)} X_p \hat{\boldsymbol\beta}\bigr]_i,
\qquad
\widehat{\mathrm{Var}}\bigl(\hat f^{(k)}(x_i)\bigr) = \bigl[D^{(k)} X_p\, V_p\,
X_p^{\!\top} \bigl(D^{(k)}\bigr)^{\!\top}\bigr]_{ii}.
$$

Pointwise $95\%$ intervals are $\hat f^{(k)}(x_i) \pm 1.96$ times the square root of that variance.
This is the standard Wood (2017) construction, and is what
`gratia::derivatives()` implements in its default mode (Simpson,
2014; Simpson, 2018). Columns of $X_p$ corresponding to `adjust`
terms held at typical values are constant down the grid, so their
finite differences are zero and they drop out of both $\hat f^{(k)}$
and its variance; the derivative in the panel is therefore the
derivative of the *partial smooth actually shown in the fit panel*,
as expected.
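
A minimal sketch of this construction for the `y1 ~ s(x)` pair from the
derivative examples above, using `mgcv` directly rather than janusplot's
internals, and plain forward differences rather than the interior-central
and boundary stencils described above:

```{r lpmatrix-sketch}
library(mgcv)

fit  <- gam(y1 ~ s(x), data = df, method = "REML")
grid <- data.frame(x = seq(min(df$x), max(df$x), length.out = 200))
Xp   <- predict(fit, newdata = grid, type = "lpmatrix")

h   <- diff(grid$x[1:2])                      # grid spacing
D1  <- diff(diag(nrow(Xp))) / h               # first-difference operator on grid rows
Xd  <- D1 %*% Xp                              # D^(1) X_p
d1  <- drop(Xd %*% coef(fit))                 # f'(x) on the (interior) grid
se1 <- sqrt(rowSums((Xd %*% fit$Vp) * Xd))    # sqrt diag(D Xp Vp Xp' D')

# Pointwise 95% interval
ci_lo <- d1 - 1.96 * se1
ci_hi <- d1 + 1.96 * se1
```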

For simultaneous intervals over the full grid (a stricter question
than pointwise, and what you want for formal feature localisation),
`janusplot()` implements the Monte Carlo construction of Ruppert,
Wand & Carroll (2003, §6.5), popularised for GAMs by Simpson (2018):
draw $\tilde{\boldsymbol\beta}_b \sim
\mathcal{N}(\hat{\boldsymbol\beta}, V_p)$ for $b = 1, \ldots, B$ and
take the $(1-\alpha)$ quantile of $\max_i \bigl|\bigl[D^{(k)} X_p
(\tilde{\boldsymbol\beta}_b - \hat{\boldsymbol\beta})\bigr]_i\bigr| /
\mathtt{se}_i$ across the plotting grid as a critical multiplier
$c_\alpha$ on the pointwise SE, so the simultaneous band is
$\hat f^{(k)}(x) \pm c_\alpha\,\mathtt{se}(x)$. Opt in via
`derivative_ci = "simultaneous"` on either `janusplot()` or
`janusplot_data()`; the default is `derivative_ci = "none"`, because
derivative ribbons invite over-reading of local features and should be
a deliberate choice rather than a default. The implementation uses $B = 1000$ (see
`derivative_ci_nsim`); Simpson (2018) uses $10\,000$, which is
affordable if you need tighter quantile estimation.
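
Continuing the sketch above, the Monte Carlo multiplier can be computed
in a few lines (assuming `MASS` is available for the multivariate normal
draws):

```{r sim-band-sketch, eval = requireNamespace("MASS", quietly = TRUE)}
B     <- 1000L
betas <- MASS::mvrnorm(B, coef(fit), fit$Vp)        # draws from N(beta_hat, Vp)
dev   <- abs(Xd %*% (t(betas) - coef(fit))) / se1   # |[D Xp (beta_b - beta_hat)]_i| / se_i
crit  <- quantile(apply(dev, 2, max), 0.95)         # max over grid, 95% quantile over draws

# Simultaneous 95% band
sim_lo <- d1 - crit * se1
sim_hi <- d1 + crit * se1
```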

### Noise amplification and why we cap at $k = 2$

Finite differencing of raw data amplifies noise; penalised splines
do not eliminate that amplification, but trade it against bias via
the REML-selected smoothing parameter. `mgcv`'s default thin-plate
penalty is on $\int (f'')^2$, which penalises $\hat f''$ directly and
thereby regularises $\hat f'$, while $\hat f^{(3)}$ is bounded only by
the basis rank, not the penalty (Wood, 2017, §5.3; Eilers & Marx,
1996). In practice we find
$\hat f^{(3)}$ is dominated by noise for $n < 10^4$ at moderate $k$,
and so janusplot refuses $k \ge 3$ by design. If you have a
domain-specific reason to need a higher-order derivative, specify a
matching-order P-spline penalty explicitly (Eilers, Marx & Durbán,
2015) and extract it yourself from `janusplot_data()`.
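
For reference, a third-order difference penalty can be requested from
`mgcv` itself with a P-spline basis (`m[2]` sets the penalty order);
janusplot does not expose this, so such a fit has to be built and
differentiated by hand along the lines of the LP-matrix sketch above:

```{r pspline-sketch}
# Cubic P-spline basis with a third-order difference penalty
fit_p3 <- gam(y1 ~ s(x, bs = "ps", k = 20, m = c(2, 3)),
              data = df, method = "REML")
```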

### Applied use: gain estimation and dose--response

Two strands in which the asymmetric derivative view is not a
cosmetic add-on but the analytical primitive the practitioner
actually wants.

- **Process-gain scheduling.** In adaptive and gain-scheduled
  control, the controller is indexed by the local process gain
  $K(x) = \partial y / \partial u$ (Rugh & Shamma, 2000). For a
  steady-state input-output dataset, $\hat f_{y\mid u}'(u)$ is a
  direct data-driven estimate of $K(u)$, and its simultaneous CI
  tells the engineer whether the local gain is distinguishable from
  a reference gain over an operating envelope. The inverse panel
  $\hat f_{u\mid y}'(y)$ is the feedforward-linearisation
  sensitivity; a large divergence between the two panels flags that
  a naive inverse controller will under-perform (Korda & Mezić,
  2018). The matrix view makes a fleet of such pairs inspectable at
  once.
- **Derivative of the dose--response curve as the causal estimand.**
  For a continuous treatment $T$ with unconfoundedness, the
  dose--response $\mu(t) = \mathbb{E}[Y(t)]$ and its derivative
  $\mu'(t)$ are both estimable, and recent work (Zhang & Chen, 2025)
  argues $\mu'(t)$ is often the more directly interpretable
  quantity — it answers "how much does the expected outcome change
  per unit shift in treatment at this dose?" This is structurally
  the same estimand as the process gain above; the
  asymmetric-matrix derivative panel delivers both forward and
  reverse-conditioned derivative curves in the same frame, which is
  a direct diagnostic for Simpson's-paradox-style conditioning
  reversals (visible, for example, in the penguins
  `bill_depth_mm` $\times$ `body_mass_g` pair once `species` is
  adjusted for).

### References cited in this section

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with
B-splines and penalties. *Statistical Science*, **11**(2), 89--121.
<https://doi.org/10.1214/ss/1038425655>

Eilers, P. H. C., Marx, B. D., & Durbán, M. (2015). Twenty years of
P-splines. *SORT*, **39**(2), 149--186.

Janzing, D., & Schölkopf, B. (2010). Causal inference using the
algorithmic Markov condition. *IEEE Transactions on Information
Theory*, **56**(10), 5168--5194.
<https://doi.org/10.1109/TIT.2010.2060095>

Korda, M., & Mezić, I. (2018). Linear predictors for nonlinear
dynamical systems: Koopman operator meets model predictive control.
*Automatica*, **93**, 149--160.
<https://doi.org/10.1016/j.automatica.2018.03.046>

Leith, D. J., & Leithead, W. E. (2000). Survey of gain-scheduling
analysis and design. *International Journal of Control*, **73**(11),
1001--1025. <https://doi.org/10.1080/002071700411304>

Rugh, W. J., & Shamma, J. S. (2000). Research on gain scheduling.
*Automatica*, **36**(10), 1401--1425.
<https://doi.org/10.1016/S0005-1098(00)00058-3>

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). *Semiparametric
Regression*. Cambridge University Press.

Simpson, G. L. (2014). Simultaneous confidence intervals for
derivatives of smooth terms in a GAM. *From the Bottom of the Heap*
(blog post).

Simpson, G. L. (2018). Modelling palaeoecological time series using
generalised additive models. *Frontiers in Ecology and Evolution*,
**6**, 149. <https://doi.org/10.3389/fevo.2018.00149>

Wood, S. N. (2017). *Generalized Additive Models: An Introduction
with R* (2nd ed.). Chapman and Hall/CRC.
<https://doi.org/10.1201/9781315370279>

Zhang, Y., & Chen, Y.-C. (2025). Doubly robust inference on causal
derivative effects for continuous treatments. arXiv preprint
[2501.06969].

## Limitations

- Pairwise view, not conditional — always complement with a proper
  multivariate model.
- EDF depends on basis dimension `k`; defaults are sensible but
  domain-specific tuning is encouraged.
- The asymmetry index should not be interpreted causally without
  strong assumptions.
- `monotonicity_index` and `convexity_index` are scale-invariant in
  `y` but sensitive to the predictor-density weighting — they
  describe the smooth on the observed support of `x`, not outside
  it.
- `display` is scalar — a single `janusplot()` call renders a
  single quantity (fit, d1, or d2). To compare fit against
  derivative, issue two or three calls; each carries its own
  matrix-level title and, when `with_data = TRUE`, its own
  `display`-tagged summary table.
- Derivative panels show **no confidence ribbon by default**
  (`derivative_ci = "none"`). Opt in explicitly: `"pointwise"` for
  marginally-valid pointwise 95% bands, `"simultaneous"` for
  Simpson (2018) Monte Carlo bands valid for feature localisation.
- Requesting `display = "d1"` or `display = "d2"` raises the default
  prediction-grid resolution from 100 to 200 points, which slightly
  shifts the numeric shape-metric values (`M`, `C`, turning and
  inflection counts) reported alongside the fit. Shapes and
  asymmetry — the primary reading of the matrix — are robust to
  this drift; `M`, `C` and the counts are secondary diagnostics.
  The precomputed `shape_sensitivity_demo` dataset was generated
  under `n_grid = 100` and is preserved as-is for reproducibility.

## Citation

```{r cite}
citation("janusplot")
```
