| Title: | Simulated Pseudo-Individual Data Meta-Analysis with ABC-SMC |
| Version: | 0.2.0 |
| Description: | Meta-analysis via Approximate Bayesian Computation Sequential Monte Carlo (ABC-SMC) by simulating pseudo-individual data from published group-level summary statistics. Handles binary, continuous, and generic effect-size outcomes within a one-stage mixed-model framework. Supports subgroup analysis. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 3.5.0) |
| Imports: | lme4, parallel, stats, methods, Rcpp |
| Suggests: | ggplot2, testthat, knitr, rmarkdown |
| LinkingTo: | Rcpp, RcppArmadillo |
| SystemRequirements: | GNU make |
| URL: | https://github.com/HaichuanYu0703/SPIMA |
| BugReports: | https://github.com/HaichuanYu0703/SPIMA/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| NeedsCompilation: | yes |
| Config/roxygen2/version: | 8.0.0 |
| Packaged: | 2026-05-21 09:13:32 UTC; yu-ha |
| Author: | Yu Haichuan [aut, cre, cph] |
| Maintainer: | Yu Haichuan <yuhaichuan@whu.edu.cn> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-28 11:30:01 UTC |
Fit mixed-effects logistic regression on aggregate data
Description
Fits glmer(cbind(event, n-event) ~ T * mean_X + (1|study)) on the aggregate data. This is an ecological regression — coefficients are attenuated toward zero, but the interaction test is valid (conservative).
Usage
.int_fit_aggregate(data, input_spec, study_col, ...)
Value
list(beta, gamma, coefficients, converged, fit, tau, gamma_for_gen, beta_for_gen)
Generate pseudo-IPD and refit glmer for sensitivity analysis
Description
Uses gamma and beta from the aggregate model to generate pseudo-IPD, then fits an individual-level glmer.
Usage
.int_pseudo_sensitivity(data, input_spec, study_col, gamma, beta, rho, ...)
Worker function for PSOCK cluster (package-level, avoids .GlobalEnv capture)
Description
Worker function for PSOCK cluster (package-level, avoids .GlobalEnv capture)
Usage
.run_psock_worker(i, theta, sim_fn, distance_fn, obs_stats, extra_args)
Arguments
i |
Particle index (row of |
theta |
Matrix of parameter vectors, one row per particle. |
sim_fn |
Simulation function |
distance_fn |
Distance function |
obs_stats |
Observed summary statistics. |
extra_args |
List of additional arguments passed to |
Value
A scalar distance, or Inf if simulation fails.
Run multiple simulations, optionally in parallel
Description
Run multiple simulations, optionally in parallel
Usage
.run_simulations(theta, sim_fn, distance_fn, obs_stats, ctrl, ...)
Arguments
theta |
Matrix of parameter vectors (one row per particle). |
sim_fn |
Simulation function |
distance_fn |
Distance function |
obs_stats |
Observed summary statistics. |
ctrl |
An |
... |
Additional arguments passed to |
Value
Numeric vector of distances, one per particle.
Blood Pressure Continuous Outcome Data
Description
A dataset of study-level summary statistics for continuous outcomes (blood pressure) from multiple clinical trials. Contains mean, standard deviation, and sample size per arm, suitable for the continuous module.
Usage
bp_cont
Format
A data frame with columns:
- study
Study identifier.
- group
Treatment group indicator (0 = control, 1 = treatment).
- n
Sample size per arm.
- mean
Mean blood pressure.
- sd
Standard deviation of blood pressure.
Diagnose Skewness from a Five-Number Summary
Description
Compares the right-side spread (Q3-median, max-median) to the left-side spread (median-Q1, median-min) to infer the skew direction.
Usage
diagnose_skewness(q1, median, q3, min, max)
Arguments
q1 |
First quartile. |
median |
Median. |
q3 |
Third quartile. |
min |
Minimum value. |
max |
Maximum value. |
Details
This function is for **informational purposes only** and does not affect the data-generation behaviour of the package.
Value
Character string: "symmetric", "right_skewed", or
"left_skewed".
Distance Functions for ABC-SMC
Description
Each module exports a distance function that compares simulated summary statistics to the observed summary statistics.
Usage
spima_bin_distance(sim_stats, obs_stats)
spima_cont_distance(sim_stats, obs_stats)
Arguments
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
Value
A non-negative scalar distance.
Functions
-
spima_bin_distance(): Binary outcome: Euclidean distance on (possibly weighted) log-odds scale. -
spima_cont_distance(): Continuous outcome: inverse-variance weighted Euclidean distance on study-level mean differences.
Extract Posterior Treatment Effects for Interaction Visualization
Description
Internal helper that extracts model coefficients and their uncertainty
from a spima_int object, then computes the predicted treatment
effect (absolute risk difference or risk ratio) across a range of
covariate values. Uncertainty is propagated by sampling from the
multivariate normal approximation of the fixed effects.
Usage
extract_interaction_posterior(
object,
covariate = NULL,
at = NULL,
scale = c("absolute", "relative"),
ci_level = 0.95,
n_draws = 2000
)
Arguments
object |
A |
covariate |
Character; covariate name to visualise. If |
at |
Numeric vector of covariate values at which to evaluate the
treatment effect. If |
scale |
|
ci_level |
Confidence level (default 0.95). |
n_draws |
Number of multivariate-normal draws (default 2000). |
Value
A list with components:
atCovariate values (length
n_at).drawsMatrix of posterior treatment effects (
n_drawsrows xn_atcolumns).summaryData frame with columns
estimate,ci_lower,ci_upper.scale"absolute"or"relative".model_type"pseudo-IPD"or"aggregate".
Generic forest plot
Description
Draws a forest plot showing study-level effect estimates with 95% CIs and the SPI-MA pooled posterior estimate.
Usage
forest(x, ...)
## S3 method for class 'spima'
forest(
x,
log_scale = FALSE,
study_labels = NULL,
col = "grey40",
pooled_col = "#2166AC",
xlab = NULL,
...
)
spima_forest(x, ...)
Arguments
x |
A |
... |
Additional arguments passed to |
log_scale |
If |
study_labels |
Optional character vector of study labels. |
col |
Color for study-level points and CIs. |
pooled_col |
Color for the pooled diamond. |
xlab |
X-axis label (auto-detected if NULL). |
Value
A plot object; see the method documentation for details.
A ggplot object (invisibly) if ggplot2 is
available, or NULL with a plot drawn to the active device.
Generic Effect Size Data
Description
A dataset of study-level summary statistics for generic (continuous) effect sizes. Contains effect size estimates and their standard errors, suitable for the generic module.
Usage
gen_effect
Format
A data frame with columns:
- study
Study identifier.
- yi
Effect size estimate.
- sei
Standard error of the effect size.
Kidney Disease Binary Outcome Data
Description
A dataset of study-level summary statistics for binary outcomes (kidney disease) from multiple clinical trials. Contains event counts and sample sizes per arm, suitable for the binary module.
Usage
kidney_bin
Format
A data frame with columns:
- study
Study identifier.
- group
Treatment group indicator (0 = control, 1 = treatment).
- n
Sample size per arm.
- event
Number of events per arm.
Evaluate log-prior density for a parameter vector
Description
Evaluate log-prior density for a parameter vector
Usage
log_prior_density(theta, prior_obj)
Arguments
theta |
Named numeric vector of parameters. |
prior_obj |
A |
Value
Log-density value (summed across independent priors).
Plot Treatment Effect Modification
Description
Generates a plot showing how the predicted treatment effect (absolute risk
difference or risk ratio) varies across the range of a continuous
covariate, based on interaction estimates from spima_int.
Usage
## S3 method for class 'spima_int'
plot(
x,
covariate = NULL,
ci_level = 0.95,
at = NULL,
scale = c("absolute", "relative"),
...
)
Arguments
x |
A |
covariate |
Character; name of the covariate to plot. If
|
ci_level |
Confidence level for the uncertainty band (default 0.95). |
at |
Numeric vector of covariate values at which to evaluate the
treatment effect. If |
scale |
|
... |
Additional arguments (ignored). |
Details
The underlying model is either the pseudo-IPD individual-level GLMM (preferred) or the aggregate ecological GLMM (fallback). Uncertainty is propagated by sampling from the multivariate normal approximation of the fixed effects.
Value
A ggplot object.
Examples
res <- spima_int(data, input_spec)
plot(res, covariate = "X1", scale = "absolute")
Define Prior Distributions for ABC-SMC Parameters
Description
Define Prior Distributions for ABC-SMC Parameters
Usage
prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)", ...)
Arguments
mu |
Prior specification for overall effect |
tau |
Prior specification for heterogeneity |
... |
Additional named priors (e.g. |
Value
A list of class spima_prior with elements name,
pars, rfun (random generation), dfun (density),
and default.
Examples
prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)")
Resolve the study column: if input_spec has "study", use it; else if data has a "study" column, use it; else assign row IDs.
Description
Resolve the study column: if input_spec has "study", use it; else if data has a "study" column, use it; else assign row IDs.
Usage
resolve_study_col(data, input_spec)
Arguments
data |
A data frame with study-level summary statistics. |
input_spec |
A named list mapping column names to roles; may
contain |
Value
A list with components data, col (resolved
column name), and ids (unique study identifiers).
Run ABC-SMC Inference
Description
Run ABC-SMC Inference
Usage
run_abc_smc(prior_obj, sim_fn, distance_fn, obs_stats, ctrl, ...)
Arguments
prior_obj |
A |
sim_fn |
Simulation function: |
distance_fn |
Distance function: |
obs_stats |
Observed (target) summary statistics. |
ctrl |
An |
... |
Additional arguments passed to |
Value
A list of class spima_abc containing posterior samples,
weights, diagnostics, and generation records.
Sample from the joint prior
Description
Sample from the joint prior
Usage
sample_prior(n, prior_obj)
Arguments
n |
Number of samples. |
prior_obj |
A |
Value
A matrix with n rows and one column per prior.
Approximate Standard Error of the Median from IQR
Description
For a normal distribution, IQR ~ 1.35 * sigma and Var(median) ~ (pi/2) * sigma^2 / n.
Usage
se_from_iqr(iqr, n)
Arguments
iqr |
Interquartile range (Q3 - Q1). |
n |
Sample size. |
Value
Standard error of the median (scalar).
Approximate Standard Error of the Median from Range
Description
Uses the Wan et al. (2014) formula to estimate sigma from the range, then computes SE(median) = sqrt(pi/2 * sigma^2 / n).
Usage
se_from_range(range_val, n)
Arguments
range_val |
Observed range (max - min). |
n |
Sample size. |
Details
Reference: Wan et al., BMC Medical Research Methodology 2014, 14:135.
Value
Standard error of the median (scalar), or NA if n < 5.
Control Parameters for ABC-SMC
Description
Control Parameters for ABC-SMC
Usage
smc_control(
n_particles = 2000,
n_particles_max = 10000,
n_generations = 10,
epsilon_init = NULL,
epsilon_decay = 0.85,
ess_min = 0.3,
kernel = "gaussian",
accept_rate_target = 0.2,
verbose = TRUE,
parallel = FALSE,
n_cores = NULL
)
Arguments
n_particles |
Number of particles (simulations) per generation. |
n_particles_max |
Maximum number of particles for adaptive doubling. |
n_generations |
Maximum number of SMC generations. |
epsilon_init |
Initial acceptance threshold. If |
epsilon_decay |
Multiplicative factor applied to epsilon each generation (0 < decay < 1). |
ess_min |
Minimum effective-sample-size ratio (relative to
|
kernel |
Perturbation kernel type: |
accept_rate_target |
Target acceptance rate used for adaptive epsilon tuning. |
verbose |
Print progress information? |
parallel |
Logical; if |
n_cores |
Number of CPU cores for parallel execution. If
|
Value
A list of class smc_control.
Examples
smc_control(n_particles = 500, n_generations = 8)
spima: Simulated Pseudo-Individual Data Meta-Analysis
Description
The main entry point. Dispatches to the appropriate module based on
outcome_type and runs ABC-SMC for meta-analytic inference.
Usage
spima(
data,
outcome_type = c("binary", "continuous", "generic"),
input_spec,
prior,
smc_control,
parallel = FALSE,
subgroup = NULL,
family = c("gaussian", "Gamma"),
...
)
Arguments
data |
A data frame of study-level summary statistics; one row per
study (or per study-arm when |
outcome_type |
Outcome type: |
input_spec |
A named list mapping column names to roles. The
required entries depend on
|
prior |
A |
smc_control |
An |
parallel |
Logical; if |
subgroup |
Optional column name for subgroup analysis. When specified, the analysis is run separately for each level of this variable. |
family |
Distributional family for the pseudo-IPD likelihood.
Only used when |
... |
Additional arguments passed to module functions. |
Value
A spima object with components:
call |
The matched call. |
outcome_type |
The outcome type. |
abc_result |
Full ABC-SMC output (generations, posterior, etc.). |
data |
The input data. |
input_spec |
The column mapping. |
Examples
# Quick demo (runs in < 5 seconds)
data_bin <- data.frame(
study = 1:4,
event = c(30, 45, 28, 32),
n = c(100, 100, 80, 80)
)
res <- spima(data_bin, "binary",
input_spec = list(study = "study", event = "event", n = "n"),
prior = prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)"),
smc_control = smc_control(n_particles = 100, n_generations = 2))
print(res)
summary(res)
# Full analysis (two-arm per study)
data_bin2 <- data.frame(
study = 1:4,
group = c(0, 1, 0, 1, 0, 1, 0, 1),
event = c(30, 45, 28, 32, 40, 58, 18, 22),
n = c(100, 100, 80, 80, 120, 120, 60, 60)
)
res2 <- spima(data_bin2, "binary",
input_spec = list(study = "study", event = "event",
n = "n", group = "group"),
prior = prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)"),
smc_control = smc_control(n_particles = 500, n_generations = 5))
Analyze Pseudo-IPD for Binary Outcome
Description
Computes per-study log odds ratios from the simulated pseudo-IPD by
constructing 2x2 tables. Also fits a one-stage logistic mixed model
(glmer) for the overall treatment effect estimate.
Usage
spima_bin_analyze(pseudo_ipd, input_spec)
Arguments
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
Value
A list with estimates (mixed-model fixed effects),
summary_stats (named vector of per-study log ORs, matching
the format from spima_bin_observed_stats), and
converged (logical).
Compute Observed Summary Statistics for Binary Data
Description
Compute Observed Summary Statistics for Binary Data
Usage
spima_bin_observed_stats(data, input_spec)
Arguments
data |
Original data frame per blueprint. |
input_spec |
Column mapping. |
Value
Named vector of observed log-ORs (or log-odds) with optional inverse-variance weights as attribute.
Simulate Pseudo-IPD for Binary Outcome
Description
For each study, the control-group proportion is taken from observed data, and the treatment-group log-odds are shifted by a study-specific effect drawn from N(mu, tau^2). Individual Bernoulli outcomes are then generated.
Usage
spima_bin_simulate(study_spec, params, input_spec)
Arguments
study_spec |
A data frame (subset for one study) containing the observed counts. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
Value
A data frame with columns study, group, y.
Validate Binary Outcome Input
Description
Validate Binary Outcome Input
Usage
spima_bin_validate(data, input_spec)
Arguments
data |
A data frame with columns for events and sample sizes. |
input_spec |
A named list specifying column mappings, e.g.
|
Value
TRUE invisibly; stops with a message on failure.
Analyze Pseudo-IPD for Continuous Outcome
Description
Computes per-study mean differences from the simulated pseudo-IPD.
Also attempts a linear mixed model (lmer) for overall estimate.
Usage
spima_cont_analyze(pseudo_ipd, input_spec)
Arguments
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
Value
A list with estimates, summary_stats (per-study
mean differences and pooled SDs, matching observed_stats format),
and converged.
Compute Observed Summary Statistics for Continuous Data
Description
Compute Observed Summary Statistics for Continuous Data
Usage
spima_cont_observed_stats(data, input_spec)
Arguments
data |
Original data frame. |
input_spec |
Column mapping. |
Value
A list with means and sds (named vectors).
Analyze Pseudo-IPD for Quantile-Format Data
Description
Computes per-study median differences (two-group) or per-study medians
(one-group) from simulated pseudo-IPD. Returns a flat named vector of
effect sizes matching the output of
spima_cont_quantile_observed_stats() for use with
spima_generic_distance().
Usage
spima_cont_quantile_analyze(pseudo_ipd, input_spec)
Arguments
pseudo_ipd |
Data frame with columns |
input_spec |
The original column mapping (only |
Value
A list with components estimates (NULL),
summary_stats (named numeric vector), converged (TRUE),
and fit (NULL).
Compute Observed Effect Sizes from Quantile-Format Data
Description
Converts median/IQR, median/range, or five-number summary data into
per-study effect sizes (median differences) with inverse-variance
weights, matching the spima_generic_observed_stats output format.
Usage
spima_cont_quantile_observed_stats(data, input_spec)
Arguments
data |
Original data frame with arm-level quantiles. |
input_spec |
Column mapping (must contain |
Value
A named numeric vector of per-study effect sizes with an
attr("weights") attribute.
Simulate Pseudo-IPD for Continuous Outcome
Description
For each study, individual data are drawn from a normal (or skew-normal) distribution matching the observed mean and SD. The treatment group mean is shifted by a study-specific effect drawn from N(mu, tau^2).
Usage
spima_cont_simulate(study_spec, params, input_spec)
Arguments
study_spec |
A data frame (subset for one study) containing the observed counts. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
Value
A data frame with columns study, group, y.
Validate Continuous Outcome Input
Description
Validate Continuous Outcome Input
Usage
spima_cont_validate(data, input_spec)
Arguments
data |
A data frame with means, SDs, and sample sizes. |
input_spec |
A named list, e.g.
|
Value
TRUE invisibly.
Analyze Pseudo-IPD for Gamma Outcome
Description
Fits a one-stage Gamma GLMM via glmer(y ~ group + (1 | study),
family = Gamma(link = "log")) on the simulated pseudo-IPD. Also
returns per-study log-Rate Ratio values for use as summary statistics
in the ABC distance computation.
Usage
spima_gamma_analyze(pseudo_ipd, input_spec, quick = TRUE)
Arguments
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
quick |
If |
Details
Use quick = TRUE (default) during ABC-SMC sampling where only
the per-study summary statistics are needed for distance computation.
Set quick = FALSE to additionally fit the full GLMM (useful for
external diagnostics).
Value
A list with components:
estimatesNamed vector of fixed effects from the Gamma GLMM (or
NULLifquick = TRUEor the model does not converge).summary_statsNamed vector of per-study log-RR values.
convergedLogical indicating GLMM convergence (
TRUEwhenquick = TRUE).fitThe
glmerfit object (orNULL).
Distance Function for Gamma Outcome
Description
Weighted Euclidean distance on the per-study log-Rate Ratio vector.
Delegates to spima_generic_distance.
Usage
spima_gamma_distance(sim_stats, obs_stats)
Arguments
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
Value
A non-negative scalar distance (lower = better match).
Compute Observed Summary Statistics for Gamma Data
Description
For each study, computes the observed log-Rate Ratio and its delta-method variance for inverse-variance weighting.
Usage
spima_gamma_observed_stats(data, input_spec)
Arguments
data |
Original data frame with arm-level means, SDs, and sample sizes. |
input_spec |
Column mapping. |
Value
A named vector of per-study log-RR values with an attribute
"weights" containing inverse-variance weights.
Simulate Pseudo-IPD for Gamma Outcome
Description
For each study, individual data are drawn from a Gamma distribution
matching the observed mean and SD via method-of-moments. The treatment
group mean is shifted by a multiplicative factor exp(theta_i)
where theta_i ~ N(mu, tau^2) — this encodes the log-Rate Ratio
treatment effect on the original scale. The shape parameter is held
constant within each study, preserving the variance structure implied
by the Gamma GLM with log link.
Usage
spima_gamma_simulate(study_spec, params, input_spec)
Arguments
study_spec |
A data frame (subset for one study) containing the observed counts. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
Value
A data frame with columns study, group, y.
Validate Gamma Outcome Input
Description
Delegates to spima_cont_validate (same data format: mean, sd, n
per arm) and additionally checks that all means are positive (Gamma
distribution is supported on the positive real line).
Usage
spima_gamma_validate(data, input_spec)
Arguments
data |
A data frame with means, SDs, and sample sizes. |
input_spec |
A named list, e.g.
|
Value
TRUE invisibly.
Analyze Pseudo-Data for Generic Effect-Size
Description
For the generic module, the "pseudo-IPD" is already the summary statistics (a vector of effect sizes). This function simply passes them through with converged = TRUE.
Usage
spima_generic_analyze(pseudo_ipd, input_spec)
Arguments
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
Value
A list with estimates = NULL,
summary_stats (the effect-size vector), and
converged = TRUE.
Generic (effect-size) distance: weighted Euclidean distance on effects
Description
Weighted Euclidean distance using inverse-variance weights.
Usage
spima_generic_distance(sim_stats, obs_stats)
spima_generic_distance(sim_stats, obs_stats)
Arguments
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
Value
A non-negative scalar distance (lower = better match).
Compute Observed Summary Statistics for Generic Effect-Size
Description
Compute Observed Summary Statistics for Generic Effect-Size
Usage
spima_generic_observed_stats(data, input_spec)
Arguments
data |
Original data frame. |
input_spec |
Column mapping. |
Value
Named vector of effect sizes with "weights" attribute
(inverse-variance: 1/sei^2).
Simulate Pseudo-Data for Generic Effect-Size
Description
No individual-level data is generated. Instead, study-level effect sizes are drawn from the random-effects model: theta_i ~ N(mu, tau^2) y_i* ~ N(theta_i, sei_i^2)
Usage
spima_generic_simulate(study_spec, params, input_spec)
Arguments
study_spec |
A data frame (subset for one study) containing the observed counts. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
Details
The returned vector can be treated as "pseudo-IPD" since it directly represents the summary statistics needed for distance computation.
Value
A named numeric vector of simulated effect sizes (one per study).
Validate Generic Effect-Size Input
Description
Validate Generic Effect-Size Input
Usage
spima_generic_validate(data, input_spec)
Arguments
data |
A data frame with effect-size and SE columns. |
input_spec |
A named list, e.g.
|
Value
TRUE invisibly.
SPI-MA Interaction Analysis
Description
Tests whether continuous covariate(s) modify the treatment effect using aggregate data only. The primary method fits a mixed-effects logistic regression on the aggregate data. A sensitivity analysis generates pseudo-IPD and fits an individual-level model.
Usage
spima_int(data, input_spec, rho = 0, ...)
spima_int_validate(data, input_spec)
Arguments
data |
Data frame, one row per study-arm. |
input_spec |
Named list:
|
rho |
Assumed between-covariate correlation for pseudo-IPD generation. Default 0. |
... |
Additional arguments to |
Value
spima_int object.