gcomputation: an R Package for Estimating Marginal Effects Using G-Computation
================
The R package `gcomputation` provides functions that use G-computation (GC) to estimate marginal effects. It covers binary, time-to-event, continuous and count outcomes, comparing two exposure groups. The package implements GC with various working models or algorithms, referred to as Q-models.
`gc_binary`, `gc_times`, `gc_continuous` and `gc_count` are the main functions. They implement GC to estimate marginal effects for binary, time-to-event, continuous and count outcomes, respectively, using a variety of modeling strategies.
The package supports several methods to construct the Q-model:

* `"all"`: Uses a standard logistic or Cox model, incorporating all variables provided in the formula.
* `"lasso"`: Implements L1-regularized regression, which can perform predictor selection. It uses the glmnet package.
* `"ridge"`: Applies L2-regularized regression, also using the glmnet package. This is equivalent to Elastic Net with an alpha value of 0.
* `"elasticnet"`: Combines L1 and L2 regularization, also using the glmnet package. The alpha parameter controls the mix between L1 and L2 and ranges from 0 to 1.
* `"aic"`: Performs forward selection regression based on the Akaike Information Criterion (AIC), using `stepAIC`.
* `"bic"`: Performs forward selection regression based on the Bayesian Information Criterion (BIC), also using `stepAIC` with `k=log(nrow(data))`.
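The AIC and BIC options described above can be illustrated with `stepAIC` from the MASS package. The following is a minimal sketch on simulated data, not the package's internal code: the variable names (`group`, `x1`, `x2`, `x3`, `y`) and the choice to always keep the exposure in the model are assumptions made for illustration.

```r
library(MASS)  # provides stepAIC()

set.seed(1)
n <- 500
dat <- data.frame(
  group = rbinom(n, 1, 0.5),  # exposure
  x1 = rnorm(n),              # true predictor of the outcome
  x2 = rnorm(n),              # noise
  x3 = rnorm(n)               # noise
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$group + 1.0 * dat$x1))

# Start from a model containing only the exposure and search forward
null_fit <- glm(y ~ group, family = binomial, data = dat)
search_scope <- list(lower = ~ group, upper = ~ group + x1 + x2 + x3)

# AIC: default penalty k = 2 per parameter
fit_aic <- stepAIC(null_fit, scope = search_scope,
                   direction = "forward", trace = FALSE)

# BIC: same search with penalty k = log(n) per parameter
fit_bic <- stepAIC(null_fit, scope = search_scope,
                   direction = "forward", k = log(nrow(dat)), trace = FALSE)

formula(fit_aic)
formula(fit_bic)
```

Because the BIC penalty grows with the sample size, it tends to retain fewer predictors than AIC on the same data.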
The package offers estimation of three types of marginal effects:

* “ATE” (Average Treatment Effect on the entire population): The marginal effect if the entire sample were treated versus entirely untreated.
* “ATT” (Average Treatment Effect on the Treated): The marginal effect if the treated patients (group = 1) would have been untreated.
* “ATU” (Average Treatment Effect on the Untreated): The marginal effect if the untreated patients (group = 0) would have been treated.
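To make the three estimands concrete, here is a minimal base R sketch of G-computation with a logistic Q-model on simulated data. It is an illustration under stated assumptions, not the package's implementation: the helper `gc_contrast`, the simulated variables, and the use of the risk difference as the contrast are all hypothetical choices.

```r
set.seed(42)
n <- 2000
dat <- data.frame(age = rnorm(n, 60, 10), sex = rbinom(n, 1, 0.5))
# Exposure depends on age, so the two groups differ in their covariates
dat$group <- rbinom(n, 1, plogis(-0.5 + 0.03 * (dat$age - 60)))
dat$y <- rbinom(n, 1, plogis(-1 + 0.7 * dat$group +
                               0.04 * (dat$age - 60) + 0.3 * dat$sex))

# Q-model: outcome regression on exposure, covariates and their interactions
q_fit <- glm(y ~ group * (age + sex), family = binomial, data = dat)

# Average the predicted risks over `target`, setting everyone to
# exposed (group = 1) and then to unexposed (group = 0)
gc_contrast <- function(fit, target) {
  p1 <- mean(predict(fit, newdata = transform(target, group = 1),
                     type = "response"))
  p0 <- mean(predict(fit, newdata = transform(target, group = 0),
                     type = "response"))
  c(p0 = p0, p1 = p1, risk_difference = p1 - p0)
}

# The estimands differ only in the population that is averaged over
effects <- rbind(
  ATE = gc_contrast(q_fit, dat),                   # whole sample
  ATT = gc_contrast(q_fit, dat[dat$group == 1, ]), # treated only
  ATU = gc_contrast(q_fit, dat[dat$group == 0, ])  # untreated only
)
round(effects, 3)
```

In this sketch the ATE risk difference is the average of the ATT and ATU risk differences, weighted by the proportions of treated and untreated subjects.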
S3 methods are included for objects generated by the `gc_logistic` and `gc_survival` functions, allowing for:

* `print`: To print a summary of the results.
* `summary`: To provide a more detailed summary of the prognostic capacities, including confidence intervals.
* `plot`: To visualize the results through calibration plots or effect-specific plots (proportion for logistic, survival curve for survival).
Other Exported Functions and Data:

* `transport`: Applies an already fitted GC model (an object of class `gcbinary`, `gctimes`, `gccontinuous` or `gccount`) to a new data set (`newdata`) to estimate marginal effects in a new population.
* Datasets: Includes `dataPROPHYVAP` (simulated randomized clinical trial data) and `dataCOHORT` (simulated observational cohort data).
* Multiple Imputations (MI-BOOT): Support for the MI-BOOT approach is added using the `boot.mi` argument, integrating the mice package for handling missing data.
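The idea behind transporting a fitted model can be sketched in base R: the Q-model is fitted once on the source data, and its predictions are then averaged over the covariates of a different population. This is a hypothetical illustration on simulated data; `make_pop` and `marginal_risks` are made-up helpers and do not reflect how `transport` is implemented.

```r
set.seed(7)

# Simulate a population with a given mean age (illustrative only)
make_pop <- function(n, mean_age) {
  d <- data.frame(age = rnorm(n, mean_age, 8))
  d$group <- rbinom(n, 1, 0.5)
  d$y <- rbinom(n, 1, plogis(-1 + 0.6 * d$group + 0.05 * (d$age - 60)))
  d
}

source_pop <- make_pop(1500, mean_age = 65)  # used to fit the Q-model
target_pop <- make_pop(800, mean_age = 45)   # only its covariates are used

# Q-model fitted on the source population only
q_fit <- glm(y ~ group * age, family = binomial, data = source_pop)

# Marginal predicted risks in `pop` under each exposure level
marginal_risks <- function(fit, pop) {
  c(untreated = mean(predict(fit, newdata = transform(pop, group = 0),
                             type = "response")),
    treated   = mean(predict(fit, newdata = transform(pop, group = 1),
                             type = "response")))
}

risks <- rbind(source = marginal_risks(q_fit, source_pop),
               target = marginal_risks(q_fit, target_pop))
round(risks, 3)
```

The marginal risks change between the two rows because the covariate distribution differs, even though the fitted Q-model is the same.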
The package also supports bootstrapping for confidence interval estimation, with options for “bcv” (default) or “boot” types and a default of 500 bootstrap resamples. Users can also control whether tuning parameters are estimated within each bootstrap iteration or on the total population.
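As a rough illustration of how bootstrap resamples lead to confidence intervals, the sketch below refits a logistic Q-model on each resample and builds a normal-approximation interval and a percentile interval. These are common constructions that loosely correspond to the `"norm"` and `"perc"` options of `summary`; the exact computations in the package may differ, and the helper `gc_ate` and the simulated data are assumptions.

```r
set.seed(2024)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$group <- rbinom(n, 1, plogis(0.5 * dat$x))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$group + 0.6 * dat$x))

# G-computation risk difference (ATE) for one data set
gc_ate <- function(d) {
  fit <- glm(y ~ group * x, family = binomial, data = d)
  p1 <- mean(predict(fit, newdata = transform(d, group = 1),
                     type = "response"))
  p0 <- mean(predict(fit, newdata = transform(d, group = 0),
                     type = "response"))
  p1 - p0
}

estimate <- gc_ate(dat)

# Ordinary nonparametric bootstrap: resample rows with replacement
B <- 200  # kept small for speed; the package default is 500
boot_est <- replicate(B, gc_ate(dat[sample(nrow(dat), replace = TRUE), ]))

# Normal approximation using the bootstrap standard error
ci_norm <- estimate + c(-1, 1) * qnorm(0.975) * sd(boot_est)
# Percentile interval from the empirical bootstrap distribution
ci_perc <- unname(quantile(boot_est, c(0.025, 0.975)))

rbind(norm = ci_norm, perc = ci_perc)
```

Refitting the model inside each resample, as done here, is the analogue of estimating the Q-model within each bootstrap iteration rather than once on the full sample.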
For a binary outcome:
```r
library(gcomputation)

data("dataPROPHYVAP")
.f <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES))

# 1. Standard execution
# boot.tune=TRUE estimates tuning parameters inside each bootstrap iteration.
gc_bin <- gc_binary(formula=.f, model="ridge", data=dataPROPHYVAP, group="GROUP",
                    cv=10, boot.type="bcv", boot.number=500, boot.tune=TRUE,
                    effect="ATE", progress=TRUE, seed=5192)
gc_bin

# Summary with asymptotic CIs ("norm")
summary(gc_bin, ci.type="norm")

# Calibration plot
plot(gc_bin)

# 2. Execution with multiple imputation
# Uses boot.mi=TRUE and m=5. boot.tune=FALSE estimates the tuning parameters
# only once, on the complete data set.
.f_mi <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES + GLASGOW + INJURY))
gc_mi <- gc_binary(formula=.f_mi, model="elasticnet", data=dataPROPHYVAP,
                   group="GROUP", cv=10, boot.type="bcv", boot.number=500,
                   boot.tune=FALSE, effect="ATE", progress=TRUE, seed=8051,
                   boot.mi=TRUE, m=5)

# Calibration curve, smoothed across the m imputations
plot(gc_mi, smooth=TRUE)

# Summary with non-parametric CIs ("perc")
summary(gc_mi, ci.type="perc")

# 3. Transportability
# Define a new dataset (e.g., a subset of younger patients, AGE<=50)
newdata_binary <- subset(dataPROPHYVAP, AGE<=50)

# Transport the fitted gc_bin model to the new dataset
gc_transport <- transport(object=gc_bin, newdata=newdata_binary,
                          boot.number=500)
summary(gc_transport, ci.type="norm")
```

For a survival outcome:
```r
library(gcomputation)
library(survival)  # provides Surv()

data(dataPROPHYVAP)
.ft <- formula(Surv(TIME_DEATH, DEATH) ~ GROUP * (AGE + BMI + GLASGOW + LEUKO))

gc_surv <- gc_times(formula=.ft, model="lasso", data=dataPROPHYVAP, group="GROUP",
                    param.tune=0.03, boot.type="bcv", boot.number=500,
                    boot.tune=FALSE, effect="ATE", pro.time=30, seed=5312)
gc_surv
summary(gc_surv, ci.type="perc")

# Predicted survival curves for the treatment groups
plot(gc_surv, method="survival", col=c("red3","blue3"))
legend("bottomleft", c("Placebo", "Ceftriaxone"), col=c("red3","blue3"), lty=1)
```

To install the version from GitHub:
```r
remotes::install_github("chupverse/gcomputation")
```

You can report any issues at this link.