| Type: | Package |
| Title: | Multi-Regime Marginal Structural Cox Model for Multi-Way Treatment Switching in Oncology Clinical Trials with Survival Endpoints |
| Version: | 0.1.2 |
| Description: | Estimate the causal effect of sustained treatment strategies on overall survival in clinical trials with possible treatment crossover and switch to subsequent therapy. Simulate faithful longitudinal clinical trials data with survival endpoints and multi-way treatment switches allowing for time-dependent prognostic factors. For more on methodological background, please see: Keogh and colleagues (2021) <doi:10.1002/bimj.202000040> and Suarez and colleagues (2008) <doi:10.1016/j.jclinepi.2007.11.007>. |
| License: | MIT + file LICENSE |
| Imports: | nnet, survival |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.2.3 |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown |
| URL: | https://github.com/tonyhbc/multiswc |
| BugReports: | https://github.com/tonyhbc/multiswc/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-05-25 03:13:12 UTC; tonyhchen |
| Author: | Haobin Chen [aut, cre], Yuxuan Chen [aut], Philip He [aut] |
| Maintainer: | Haobin Chen <tony.haobin.chen@alumni.emory.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-29 10:00:02 UTC |
Fit a Multi-regime Marginal Structural Cox Model for Three-way Treatment Switching
Description
multimsm() fits a marginal structural Cox model with categorical treatment
for randomized clinical trials or trial-like longitudinal data with multi-way
treatment switching, assuming participants underwent regular longitudinal
follow-up visits. This method assumes a three-way treatment switch scenario:
control participants may crossover to experimental group and both control
and experimental participants may switch to a subsequent therapy.
multimsm() takes a longitudinal survival dataset with 3 treatment process
variables as primary inputs:
-
rand: baseline randomized treatment arm.1denotes randomized experimental treatment and0denotes randomized control/SOC. -
cross: time-varying control-to-experimental crossover indicator (1/0). -
subseq: time-varying subsequent therapy initiation indicator (1/0).
Internally, multimsm() converts this triplet of treatment trajectory into
the five-level time-varying treatment regime:
G(t) \in \{C, E, CE, CS, ES\}
where C and E denote sustained control and sustained experimental
treatment regime at t, CE denotes control-to-experimental crossover, CS
denotes on control-to-subsequent switch regime status at t, and ES denotes
on experimental-to-subsequent switch regime status at t.
The downstream estimator is a multinomial-regime stabilized-weight marginal structural Cox model with the time-varying regime as predictor:
\lambda_{T^{\overline g}}(t)=\lambda_0(t)\exp\{\beta_e\mathbb{I}({G(t)=E}) +
\beta_{ce} \mathbb{I}({G(t)=CE}) + \beta_{cs} \mathbb{I}({G(t)=CS}) +
\beta_{es} \mathbb{I}({G(t)=ES})\}
The stabilized treatment-regime weight (S-IPTW) at each post-baseline interval is the probability of the observed current regime under a stabilizer model divided by the corresponding probability under a denominator model. The cumulative product of these row-specific ratios is used as the inverse probability weight. Optional ordinary censoring weights may also be multiplied in, if informative censoring is present.
Usage
multimsm(dat_long, id = "id", tstart = "t.start", tstop = "t.stop",
event = "event", cens = NULL, rand = "rand", cross = "cross",
subseq = "subseq", base_cov = NULL, iptw_num = NULL, iptw_den,
ipcw_mod = NULL, wt_trunc = NULL, prob_bounds = c(1e-06, 1 - 1e-06),
normalize_weights = TRUE, robust = TRUE, check_inputs = TRUE,
trace_multinom = FALSE, maxit = 200)
Arguments
dat_long |
A |
id |
Character. Subject identifier variable name. |
tstart |
Character. Interval start-time variable name. |
tstop |
Character. Interval stop-time variable name. |
event |
Character. Event indicator variable name ( |
cens |
Optional character Ordinary right-censoring indicator used
only if |
rand |
Character. Baseline randomization variable name ( |
cross |
Character. Time-varying control-to-experimental crossover indicator. It may be one-time pulse at crossover time (...,0,1,0,0,...) or absorbing (...,0,1,1,1,...). It must be structurally 0 always for subjects randomized to experimental treatment (assumed no EC switch). |
subseq |
Character. Time-varying subsequent therapy initiation indicator. It may be pulse or absorbing. |
base_cov |
Optional character vector of baseline covariates to include in
the default numerator model when |
iptw_num |
Optional right-hand-side formula or character string for
the stabilized numerator multinomial regime model. If |
iptw_den |
Required right-hand-side formula or character string for
the denominator multinomial regime model. This should typically include
|
ipcw_mod |
Optional right-hand-side formula or character string for an
ordinary censoring model. If supplied, |
wt_trunc |
Optional numeric in |
prob_bounds |
Numeric length-2 vector of lower and upper probability bounds used to stabilize extreme predicted probabilities away from 0 and 1. |
normalize_weights |
Logical. If |
robust |
Logical. If |
check_inputs |
Logical. If |
trace_multinom |
Logical. Passed to |
maxit |
Integer maximum number of iterations for the multinomial models. |
Details
The treatment-switching process is assumed to follow the package's five-regime structure G(t):
G_i(t)=
\begin{cases}
C, & R_i=0,\ Crs_i(t)=0,\ Subs_i(t)=0,\\
E, & R_i=1,\ Crs_i(t)=0,\ Subs_i(t)=0,\\
CE, & R_i=0,\ Crs_i(t)=1,\ Subs_i(t)=0,\\
CS, & R_i=0,\ Crs_i(t)=0,\ Subs_i(t)=1,\\
ES, & R_i=1,\ Crs_i(t)=0,\ Subs_i(t)=1.
\end{cases}
R(t) is rand randomized arm, Crs(t) is cross the absorbing crossover
status, and Subs(t) is subseq the absorbing subsequent initiation status.
Users may supply cross and subseq with either sustained absorbing
status indicator such as (0, 0, 1, 1, 1), or as switch-initiation action
indicators such as (0, 0, 1, 0, 0) - both acceptable.
The current method allows only one switch type per subject's follow up. Thus,
control-arm subjects may remain in C, transition to CE, or transition to CS;
experimental-arm subjects may remain in E or transition to ES.
The outcome model uses generated regime as the exposure. By default
C is the reference regime, so the coefficient for regimeE estimates the
marginal sustained experimental-versus-control contrast aimed by the package's
primary no-switch estimand. The remaining coefficients describe the distinct
switched-regime contrasts averaging switch times relative to sustained control.
Value
An object of class "multimsm"; a list with components:
-
coef_tableAdata.frameof full estimated coefficient table. -
coefLog-hazard ratio estimates for treatment regimes against control. -
hrHazard ratio estimates for treatment regimes against control. -
hr_ciWald 95% confidence intervals for the hazard ratios. -
fitThe fitted weighted Cox model fromsurvival::coxph(). -
p.valP values of regime effect estimates. -
dat_longAugmented long-format data containing the derived variables, including switch indicators, and final estimated weights (.w_use/.w_use_trunc). -
regime_modelsA list containing the fitted numerator and denominator mulinomial regime models. -
censor_modelThe fitted censoring model whenipcw_modis supplied; othrwiseNULL. -
diagnosticsA list containing untruncated final-weight quantiles, final switch-regime counts/proportions, cohort size, and truncation bounds. -
callThe matched function call.
References
Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550-560.
Suarez D, Haro JM, Novick D, et al. Marginal structural models for multiple treatment comparisons: an application to antipsychotic treatment for schizophrenia. Journal of Clinical Epidemiology. 2008;61(6):525-530.
Examples
set.seed(123)
sim_obj <- simswc(
n = 300,
n_visit = 8,
base_cov = c("L1", "L3"),
trt_prob = 0.5,
param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5),
param_tdcov2 = c(0.05, 0.2, 0.2, -0.2, 0.7),
param_sw = c(-3, 0.4, 0.4, -0.3, 0.3, 0.8),
param_haz = c(0.1, log(0.8), log(1.1), log(1.1),
log(1.4), log(1.4), log(1.3), log(1), log(0.9)),
param_cens = NULL,
param_select = c(0, 0.5, 0.25),
lagk = TRUE,
true_hr = FALSE
)
# Convert to the required treatment switch indicator triplet
dat <- sim_obj$dat_long
dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x)))
dat$cross <- dat$S_CE
dat$subseq <- pmax(dat$S_ES, dat$S_CS)
fit <- multimsm(
dat_long = dat, id = "id", tstart = "t.start", tstop = "t.stop",
event = "event", rand = "rand", cross = "cross", subseq = "subseq",
base_cov = c("L1", "L3"),
iptw_num = ~ regime_lag + factor(visit) + L1 + L3,
iptw_den = ~ regime_lag + factor(visit) + L1 + L3 + X + U + Alag1,
wt_trunc = 0.95
)
fit
Print a fitted multi-regime marginal structural Cox model
Description
Print a fitted multi-regime marginal structural Cox model
Usage
## S3 method for class 'multimsm'
print(x, digits = 3, ...)
Arguments
x |
An object returned by |
digits |
Number of decimal places for coefficient and weight summaries. |
... |
Currently unused. |
Value
Invisibly returns x.
Simulate longitudinal survival data with three-way treatment switching
Description
simswc() simulates a two-arm randomized clinical-trial-like longitudinal
survival dataset with a three-way post-randomization treatment swich
process. It is intended to generate data for evaluating and demonstrating
multi-regime marginal structural Cox model multimsm().
Follow-up is represented on a discrete visit grid
k = 0, 1, \ldots, K - 1, where K = n_visit. Each subject is
randomized at baseline to experimental treatment A_0 = 1 or control
treatment A_0 = 0. During follow-up, subjects may experience at most
one absorbing treatment switch:
-
ES: randomized experimental\rightarrowsubsequent therapy; -
CE: randomized control\rightarrowexperimental crossover; -
CS: randomized control\rightarrowsubsequent therapy.
The simulator generates baseline covariates, two time-varying confounders
(X and U), a generic switch process V, switch-type indicators
S_ES, S_CE, and S_CS, optional random censoring, and a survival outcome
from a piecewise exponential hazard model. The output includes both
subject-level wide-format data and counting-process long-format data suitable
for Cox modeling and for the package's multi-regime MSM workflow.
Usage
simswc(n, n_visit, base_cov = paste0("L", 1:6), trt_prob = 0.5,
param_tdcov, param_tdcov2, param_sw, param_haz, param_cens = NULL,
study_end = n_visit, lagk = FALSE, param_select = NULL,
p_pick_CE = 0.5, true_hr = TRUE, truth_n = 1e+05)
Arguments
n |
Integer sample size. The current implementation requires an integer greater than 49. |
n_visit |
Integer number of planned discrete visits, |
base_cov |
Character vector specifying which baseline covariates are
included in the simulator's linear predictors and used to define strata for
baseline randomization. Must be a non-empty subset of
|
trt_prob |
Numeric scalar in |
param_tdcov |
Numeric vector of length
|
param_tdcov2 |
Numeric vector of length
|
param_sw |
Numeric vector of length
|
param_haz |
Numeric vector of length
|
param_cens |
Optional numeric vector of length
|
study_end |
Numeric scalar giving the nominal administrative study-end
time. Must be between |
lagk |
Logical. If |
param_select |
Optional numeric vector of length 3 controlling the
If |
p_pick_CE |
Numeric scalar in |
true_hr |
Logical. If |
truth_n |
Non-negative integer Monte Carlo sample size used only when
|
Details
Let L_i denote the vector of selected baseline covariates named in
base_cov. The simulator first generates six possible baseline covariates:
-
L1: Binary age group indicator,L_1 \sim \mathrm{Bernoulli}(0.3). -
L2: Three-level risk group taking values0,1, and2with probabilities0.60,0.35, and0.05, respectively. -
L3: Binary baseline metastasis/progression-like indicator,L_3 \sim \mathrm{Bernoulli}(0.1). -
L4: Continuous tumor-size-like marker,L_4 \sim N(60, 35^2). -
L5: Binary biomarker indicator,L_5 \sim \mathrm{Bernoulli}(0.8). -
L6: Binary prior-treatment indicator,L_6 \sim \mathrm{Bernoulli}(0.6).
Only the variables included in base_cov are used in the treatment,
confounder, switch, censoring, and hazard linear predictors, and only these
selected baseline covariates are returned in the simulated datasets.
Baseline treatment is assigned by stratified randomization within the
interaction strata formed by base_cov; within each stratum, approximately
trt_prob of subjects are assigned to A_0 = 1. Because base_cov
defines both the model covariate vector and the randomization strata, using a
continuous variable such as L4 may create many small strata. In practical
simulation studies, it is often preferable to stratify on categorical or binary
baseline covariates, for example base_cov = c("L1", "L3").
The time-varying binary confounder X is initialized as X_0 = 0 and is
made absorbing by construction. For k = 1, \ldots, K - 1,
\Pr(\widetilde X_{ik}=1 \mid H_{i,k-1}) =
\operatorname{expit}\{\eta_0 + \eta_L^\top L_i +
\eta_A A_{i,k-1} + \eta_X X_{i,k-1}\},
and the stored value is
X_{ik} = \max(X_{i,k-1}, \widetilde X_{ik}).
The continuous time-varying confounder U is initialized as
U_0 \sim N(0,1). For k = 1, \ldots, K - 1,
U_{ik} \sim N(\mu_{ik}, 1), \quad
\mu_{ik} =
\alpha_0 + \alpha_L^\top L_i +
\alpha_A A_{i,k-1} + \alpha_U U_{i,k-1}.
The generic switch indicator V is initialized as V_0 = 0. Among
subjects who have not yet switched, the probability of a new switch at visit
k is
\Pr(V_{ik}=1 \mid V_{i,k-1}=0, H_{ik}) =
\operatorname{expit}\{\delta_0 + \delta_L^\top L_i +
\delta_A A_{i,k-1} + \delta_X X_{ik} + \delta_U U_{ik}\}.
Once a subject switches, V remains equal to 1. The switch destination is then
assigned as follows:
if
A_{i0}=1, the subject switches toES;S_ESis set to 1 from that visit onward andAis set to 0 thereafter to represent being off the original experimental-treatment indicator and on subsequent therapy;if
A_{i0}=0, the subject switches either toCEor toCS; underCE,S_CEis set to 1 andAis set to 1 thereafter; underCS,S_CSis set to 1 andAremains 0 thereafter.
If param_select = NULL, the control-arm switch destination is chosen with
fixed probability p_pick_CE for CE and 1 - p_pick_CE for CS. If
param_select is supplied, the probability of CE among newly switching
control-arm subjects is
\Pr(\mathrm{CE}_{ik}=1 \mid \mathrm{new\ switch}, A_{i0}=0) =
\operatorname{expit}\{\xi_0 + \xi_X X_{ik} + \xi_U U_{ik}\}.
The event time is generated from a piecewise exponential model. For interval
[k, k+1), where k = 0,\ldots,K-1, the internal column index is
m = k + 1, and the interval-specific hazard is
\lambda_{ik} =
\{\lambda_0 \exp(0.1m)\}
\exp\{\beta_A A_{ik} + \beta_L^\top L_i +
\beta_X X_{ik} + \beta_U U_{ik} +
\beta_{ES} S^{ES}_{ik} +
\beta_{CE} S^{CE}_{ik} +
\beta_{CS} S^{CS}_{ik}\}.
Conditional on the interval covariates, an exponential waiting time with rate
\lambda_{ik} is drawn. If the waiting time is less than 1, the event is
placed inside that interval at exact continuous time k + T^\ast;
otherwise simulation proceeds to the next interval.
If param_cens is supplied, random right censoring is generated in discrete
time. At the end of interval k, among subjects not yet randomly
censored,
\Pr(C_{i,k+1}=1 \mid C_{ik}=0, H_{ik}) =
\operatorname{expit}\{\gamma_0 + \gamma_L^\top L_i +
\gamma_A A_{ik} + \gamma_X X_{ik} + \gamma_U U_{ik}\}.
If param_cens = NULL, no additional random censoring is generated. In all
cases, the simulator also applies an administrative censoring time
C.a = study_end - T_e, where T_e is sampled uniformly from the integers
0:floor(n_visit / 3). The observed time is the minimum of event time,
random censoring time, and administrative censoring time.
The long-format output uses start-stop counting-process intervals. The
terminal event time is kept as a continuous time when the event occurs inside
an interval. If lagk = TRUE, the long-format data also include
Alag1-Alag3, Xlag1-Xlag3, Ulag1-Ulag3, and one-step leads
Xnext1 and Unext1.
For the revised triplet-input multimsm() interface, the simulation output
can be converted by defining rand as the subject-level baseline treatment
A.0, cross = S_CE, and subseq = pmax(S_ES, S_CS).
Value
A list with components:
-
dat: Subject-level wide-format data. It containsid, selected baseline covariates, visit-indexed matrices expanded into columns forX,U,A,V,S_ES,S_CE, andS_CS, observed event/censoring information (T.obs,D.obs), observed switch timeT.w, administrative censoring timeC.a, and observed switch typetype_sw(1 = ES,2 = CE,3 = CS,NA = no observed switch before exit). -
dat_long: Counting-process long-format data derived fromdat, with one row per subject-interval while the subject is under observation. It includest.start,t.stop,event,C, selected baseline covariates,A,V,X,U, andS_ES/S_CE/S_CS. Iflagk = TRUE, it also includes the lag/lead variables described above. -
C: Annbyn_visit + 1matrix containing the cumulative censoring process after combining random and administrative censoring. -
stats: Named summary statistics:init_trt, the proportion randomized to experimental treatment;switched, the proportion with an observed switch before exit;prop_ES,prop_CE, andprop_CS, the proportions of switch types among switchers; andprop_event, the observed event proportion. -
true_sdiff: Iftrue_hr = TRUEandtruth_n > 0, a list containing an approximate sustained-regime benchmark, including a fittedsurvival::survfitobject, time gridt, survival curvessurv0andsurv1, an approximate Cox hazard ratiocHR, survival differencesurv_diff, and the simulated benchmark datasetsdat_A0anddat_A1. OtherwiseNULL.
References
Keogh RH, Seaman SR, Gran JM, Vansteelandt S. Simulating longitudinal data from marginal structural models using the additive hazard model. Biometrical Journal. 2021;63:1526-1541.
Examples
set.seed(123)
sim_obj <- simswc(
n = 200,
n_visit = 8,
base_cov = c("L1", "L3"),
trt_prob = 0.5,
param_tdcov = c(-1.5, 0.3, 0.3, -0.2, 0.5),
param_tdcov2 = c( 0.05, 0.2, 0.2, -0.2, 0.7),
param_sw = c(-3.0, 0.4, 0.4, -0.3, 0.3, 0.8),
param_haz = c(0.1, log(0.8), log(1.1), log(1.1),
log(1.4), log(1.4), log(1.3), log(1), log(0.9)),
param_cens = NULL,
param_select = c(0, 0.5, 0.25),
lagk = TRUE,
true_hr = FALSE
)
head(sim_obj$dat_long)
sim_obj$stats
## Convert to the triplet interface used by multimsm().
dat <- sim_obj$dat_long
dat$rand <- stats::ave(dat$A, dat$id, FUN = function(x) rep(x[1], length(x)))
dat$cross <- dat$S_CE
dat$subseq <- pmax(dat$S_ES, dat$S_CS)
table(dat$rand, dat$cross, dat$subseq)
Time coarsening and follow-up augmentation for start-stop survival data
Description
tcoarsen() harmonizes irregular longitudinal survival data in start-stop
form onto a user-specified discrete time grid. It maps observed update times
to grid landmarks using either floor or ceiling coarsening, rebuilds start-stop
intervals, and augments each individual's follow-up record by carrying the
last known covariate and treatment values forward to empty grid intervals.
The function is intended as a transparent preprocessing utility for real-world clinical trial datasets with sparse or irregular follow-up times.The grid width and coarsening direction are substantive analysis choices and should usually be prespecified and examined in sensitivity analysis.
For use before multimsm(), include the triplet variables such as rand,
cross, and subseq in covs, and set absorb_vars = c("cross", "subseq")
when crossover and subsequent therapy indicators should be interpreted as
absorbing current-status variables. The output remains an ordinary data frame
that can be passed directly to downstream modeling functions.
Usage
tcoarsen(data, id, start, stop, event, covs = NULL, bin_width,
dir_coarsen = c("floor", "ceiling"), origin = NULL, absorb_vars = NULL,
keep_terminal_time = TRUE, gap_action = c("stop", "warn", "ignore"),
add_visit = TRUE, visit_name = "visit", diagnostics = TRUE,
verbose = TRUE)
Arguments
data |
A |
id |
A character string for the subject identifier variable name. |
start |
A character string naming the interval start-time variable. |
stop |
A character string for the interval stop-time variable name. |
event |
A character string for the terminal event indicator.
The event variable must be coded |
covs |
Optional character vector of treatment or covariate variables
to document as carried-forward predictors. Both baseline-only and
time-varying variables are allowed. The function retains all original data
columns, but |
bin_width |
A positive numeric value giving the grid width in the
same time units as |
dir_coarsen |
Coarsening direction, either |
origin |
Optional numeric grid origin. If |
absorb_vars |
Optional character vector of variables to convert to
absorbing status using within-subject cumulative maxima after sorting by
time. These variables must be coded |
keep_terminal_time |
Logical. If |
gap_action |
How to handle gaps between adjacent within-subject
start-stop intervals before coarsening. One of |
add_visit |
Logical. If |
visit_name |
A single character string giving the name of the visit-index
variable to add when |
diagnostics |
Logical. If |
verbose |
Logical. If |
Details
Suppose an individual has observed rows indexed by j, representing intervals
[start_ij, stop_ij) over which the row values are assumed to apply. Given
grid origin origin and bin width bin_width, an observed update time s is
mapped to
-
origin + floor((s - origin) / bin_width) * bin_widthwhendir_coarsen = "floor"; -
origin + ceiling((s - origin) / bin_width) * bin_widthwhendir_coarsen = "ceiling".
After snapping update times, tcoarsen() inserts one row for each missing grid
interval during which the subject remains under observation. Values of all
carried variables are filled by last-known-value-carried-forward (LKCF).
Terminal event/censoring times are preserved exactly by default, so an event
at day 173 can remain stop = 173 even if the coarsening grid is monthly.
If multiple observed rows map to the same coarsened update time, the last observed row within that coarsened bin is kept. This rule is simple and reproducible, but it necessarily discards within-bin ordering information.
Value
A tcoarsen object with two components. The first component is a data.frame
called dat_coarsen the time-coarsened input data with updated interval times
and time-varying covariates under coarsened time grid; the second component is
diagnostics for diagnostics settings and basic diagnostics.
References
Guerra SF, Schnitzer ME, Forget A, Blais L. Impact of discretization of the timeline for longitudinal causal inference methods. Statistics in Medicine. 2020 Sept;39(27):0277-6715.
See Also
Examples
if (requireNamespace("survival", quietly = TRUE)) {
data("heart", package = "survival")
heart_30 <- tcoarsen(
data = heart,
id = "id",
start = "start",
stop = "stop",
event = "event",
covs = c("transplant", "age", "surgery"),
bin_width = 30,
dir_coarsen = "floor",
origin = 0,
absorb_vars = "transplant",
verbose = FALSE
)
utils::head(heart_30$dat_coarsen, 10)
}