Model and output customisation in missingHE

Each model fitting function in missingHE provides a series of customisation options to allow a flexible specification of the missingness models in terms of modelling assumptions and output manipulation. Examples include: choice of outcome and prior probability distributions, model structure, missing data assumptions, type of diagnostic and model assessment measures, choice of numeric and graphical posterior summaries. This high level of flexibility aims at facilitating the tasks of the user related to the implementation of the health economic analysis, interpretation of key parameters as well as the need to assess model performance and conduct sensitivity analyses.

This tutorial illustrates how to customise the specification and output of the different types of models that can be fitted in missingHE using the different arguments available inside the functions of the package. For illustration purposes, the built-in data sets called MenSS and PBS will be used in this tutorial as toy examples, which are directly available when installing and loading missingHE in your R workspace. See Introduction to missingHE and Longitudinal models in missingHE for a overview about the different modelling approaches available in missingHE, practical examples on how to implement them using default argument choices and a general presentation of the two data sets.

If you would like to have more information on the package, or would like to point out potential issues in the current version, feel free to contact the maintainer at a.gabrio@maastrichtuniversity.nl. Suggestions on how to improve the package are also very welcome. A development version of the package is also available from the author’s main GitHub repository at https://github.com/AnGabrio/missingHE.

1 Modelling assumptions

The majority of the customisation options about the assumptions and specification of the modelling approaches available in missingHE are common across the four main functions of the package, namely selection (selection models), pattern (pattern mixture models), hurdle (hurdle models) and lmdm (longitudinal missing data model). Among these, key options include:

distributional assumptions about the outcome variables
type of missingness mechanism
specification of the outcome model structure
prior distributions

Other options are instead specific to the type of modelling approach selected and are therefore not applicable to all functions. Among these, key examples include:

specification of the model for the missingness mechanism for each outcome (relevant only for selection and lmdm)
type of identifying restrictions imposed (relevant only for pattern)
specification of the model for the structural value mechanism and type of structural values for each outcome (relevant only for hurdle)
specification of the time dependence structure (relevant only for lmdm)

In the following, a general overview of the choices available in missingHE to specify these modelling assumptions using dedicated arguments from each main function of the package. An application of the methods to the two built-in data sets of the package will be used as a demonstrative example to illustrate the impact of these choices on the analysis results.

2 Outcome distribution

Choice of the distribution for the outcome variables is typically informed based on the characteristics of the observed data by checking whether possible inconsistencies between distributional assumptions and observed data occur. A common example is given by assuming both effectiveness and cost variables to be normally distributed, a typical assumption made in standard statistical analyses of continuous data. Indeed, provided a sufficient sample size is obtained from the study data, the CLT guarantees that the sampling distribution of the sample mean will be normal, regardless of the shape of the population distribution. In CEA, however, the context of the analysis poses some challenges which may violate this assumption: 1) sample sizes about health economic outcomes are often limited and further reduced by considerable proportions of missing values; 2) a combination of data features that violate normality are often present in the observed data (e.g. skewness); 3) interest is not in statistical inference but rather in uncertainty assessment and quantification. The occurrence of these elements lead analysts to consider alternative distributional forms for the modelling of cost-effectiveness outcomes, in an attempt to improve the fit and convey more robust results.

missingHE allows the user to choose among a set of pre-defined parametric forms for the distributions assigned to the effectiveness and cost outcomes, which are shared across all model fitting functions in the package. The choice of the available distributions for the effetiveness and cost outcomes was informed based on the current methodological literature and may be conveyed to the desired package function by passing a specific character name to the arguments dist_e and dist_c. In particular, the set of distributions available (with associated character names) is:

effectiveness: Normal ("norm"), Gamma ("Gamma"), Beta ("Beta"), Logistic ("logis"), Weibull ("Weib"), Poisson ("Pois"), Negative Binomial ("Negbin"), Bernoulli ("Bern").
costs: Normal ("norm"), Gamma ("Gamma"), Lognormal ("lnorm").

Note that, while costs are usually expressed in monetary terms and can be modelled using continuous distributions, the level of measurement of effectiveness variables may differ according to the specific outcome which should be used to inform the choice of the relevant modelling distribution.

To illustrate how alternative outcome distributions may be assumed to conduct the health economic assessment using missingHE, we use data from the built-in data set MenSS as a toy example. The data come from a pilot randomised trial on young males at risk of sexually transmitted infections (STIs) which, among its objectives, included the cost-effectiveness assessment of the MenSS’s Safer Sex (MenSS) intervention compared to the Standard of Care (SoC). Key variables in MenSS include: Total costs and QALYs computed over the trial period (c and e); number of sexual intercourses ("sex_inst") and whether an STI was diagnosed ("sti") at follow-up; baseline values for all effectiveness variables and demographics, treatment ("trt") and site ("site") indicator.

The following code is used to fit three selection models using the function selection to conduct the cost-effectiveness assessment under a Missing At Random (MAR) assumption. The models differ in terms of the distributional assumptions specified for the two health economic variables, namely QALYs (e) and Total costs (c):

Model 1: Bivariate Normal for both e and c
Model 2: Normal for e and Gamma for c
Model 3: Beta for e and Gamma for c

Prior to fitting the model, the data in MenSS are pre-processed by: replacing the default levels for the factor variable trt ($1$ and $2$) with the actual treatment names ("SoC" and "MenSS"); add/subtract to the cost/QALY data a small constant of $0.01$ to allow estimation of Gamma and Beta distributions, whose support does not include $0$ and $1$, respectively.

> #rename trt levels
> MenSS$trt <- factor(MenSS$trt)
> levels(MenSS$trt) <- c("SoC", "MenSS")
> MenSS$e <- MenSS$e - 0.01 #ensure no ones QALYs occur
> MenSS$c <- MenSS$c + 0.01 #ensure no zero costs
> 
> #fit models with different distributions for outcomes
> #1=Normal-Normal
> sm1_nn <- selection(data = MenSS, dist_e = "norm", dist_c = "norm",
+                     model.eff = e ~ trt + u.0, model.cost = c ~ trt + e,
+                     model.me = me ~ 1, model.mc = mc ~ 1,
+                     type = "MAR", n.iter = 1000, ref = 2)
> #2=Normal-Gamma
> sm1_ng <- selection(data = MenSS, dist_e = "norm", dist_c = "gamma",
+                     model.eff = e ~ trt + u.0, model.cost = c ~ trt + e,
+                     model.me = me ~ 1, model.mc = mc ~ 1,
+                     type = "MAR", n.iter = 1000, ref = 2)
> #3=Beta-Gamma
> sm1_bg <- selection(data = MenSS, dist_e = "beta", dist_c = "gamma",
+                     model.eff = e ~ trt + u.0, model.cost = c ~ trt + e,
+                     model.me = me ~ 1, model.mc = mc ~ 1,
+                     type = "MAR", n.iter = 1000, ref = 2)

Before looking at the model results, it is generally a good idea to assess the fit of the model to the observed data to inform the decision of which model results should be trusted more. Within the Bayesian framework, alternative approaches to model selection based on performance measures are available. missingHE allows the user to compute three main types of predictive information criteria (PIC), taken from common relative measures of Bayesian model fit, using the pic function. The available criteria include: Deviance Information Criterion (DIC); Widely Applicable Information Criterion (WAIC); Leave-One-Out Information Criterion (LOOIC).

The following code shows how the results of any model fitted in missingHE may be passed to the argument x of pic to compute the desired criterion. In particular, we pass the three model result objects into pic as a list and select "looic" as value for criterion to indicate that we want the LOOIC criterion to be computed for all models, and we store the results into the new list object looic_m123. Next, we print out the computed criteria for each model by accessing the inner object looic_m123$pic.

> #estimate looic for all models based on complete cases
> looic_m123 <- pic(x = list(sm1_nn, sm1_ng, sm1_bg),
+                   criterion = "looic", cases = "cc")
> #print criteria for each model
> looic_m123$pic
#> LOOIC model 1 LOOIC model 2 LOOIC model 3 
#>      788.0907      626.9727      584.1049

The printed estimates for the criteria suggest a generally better fit for model 3 compared to model 2 and model 1, as indicated by lower LOOIC values. This in accordance with the expectation that distributions which allow to capture skewness, such as Gamma and Beta, tend to have a better fit to empirical data that show such features. Note that the criteria were generated by setting cases to "cc", that is their calculation was only based on the complete cases. Alternative cases choices are: available effectiveness cases ("ac_e"), available cost cases ("ac_c") and all cases ("all"). Although it is possible to compute the criteria based on imputed cases, it is generally recommended to assess the model fit only based on the observed cases (or after integrating out the missing values). Thus, care should be used when interpreting the PIC results directly computed based on some imputed data.

Depending on the type of criteria chosen, pic provides additional output related to the computation of the quantity. For example, the object looic_m123 is a list which stores several elements that were used in the computation of LOOIC (pic) for each model, such as estimates for the effective number of parameters (peff) as well as standard error estimates of pic_se and peff_se. These elements can be accessed and printed out in a similar way to what shown for the main LOOIC quantities. The following command can be used to access and print out peff estimates used in the calculation of LOOIC for each model.

> #print peff values
> looic_m123$peff

An alternative approach to assess the predictive ability of the model is to use the model parameters to generate replications of the data and compare them to the original data to assess the absolute predictive performance of a model with respect to the observed data. If the model is reasonable, the replications should look similar to the original data. To this purpose, missingHE allows the user to compute a series of graphical and/or numeric posterior predictive checks (PPC) using the function ppc.

As an example, the following commands are used to generated posterior density plots for a total of $20$ replications of QALYs and/or Total costs under each model, which are compared to the empirical densities of the original outcome variables. The arguments inside ppc can be used to customise the resulting plot with respect to:

type indicates the type of plot. Choices include both most commonly used graphical (e.g. density, histogram, boxplot, etc.) and numeric (e.g. summary statistics, Bayesian posterior predictive p-value, etc.) outputs.
outcome indicates the outcome, either effectiveness ("effects"), costs (costs), or both ("both").
ndisplay indicates the number of data replications to be generated to compute the desired plot/measure.
trt indicates the treatment(s). Accepted values are: a numeric value or factor level denoting the ordered position or name of the arm of interest, a character name for all arms ("all") or a character name for all data pulled across arms ("none").

> #compare densities of observed vs replicated outcome data under each model
> ppc_dens_m1 <- ppc(x = sm1_nn, type = "dens_overlay", outcome = "both",
+                    ndisplay = 20, trt = "none")
> ppc_dens_m2 <- ppc(x = sm1_ng, type = "dens_overlay", outcome = "costs",
+                    ndisplay = 20, trt = "none")
> ppc_dens_m3 <- ppc(x = sm1_bg, type = "dens_overlay", outcome = "effects",
+                    ndisplay = 20, trt = "none")

Figure 1: Posterior densities of 20 replicated data under model 1 compared to the empirical density of the original data.

Figure 2: Posterior densities of 20 replicated data under model 2 compared to the empirical density of the original data.

Figure 3: Posterior densities of 20 replicated data under model 3 compared to the empirical density of the original data.

Figure 1, Figure 2 and Figure 3 show how the predictive performance of the models changes considerably depending on the specific distributional assumptions made about the outcome variables. In particular, we see that under normality (model 1), replications for both effects and costs fail to capture the skewness observed in the empirical data distributions. Conversely, replications generated using Gamma (model 2) and Beta (model 3) distributions seem to better capture the features of the empirical cost and effectiveness data, respectively.

3 Missingness mechanism

Choice of the missingness mechanism for the outcomes is typically very difficult to justify unless strong justifications can be provided for the exclusion of alternative options. According to Rubin’s taxonomy, the types of mechanisms can be generally represented into two distinct groups: MAR and Missing Not At Random (MNAR), depending on whether the missingness probability is assumed to exclusively depend on observed quantities or not, respectively. Note that, in the literature, a third class called Missing Completely At Random (MCAR) is often identified which, however, can be also considered as a subclass of MAR where the missingness probability does not depend on any observed or unobserved variable (i.e. it is totally random). Depending on the modelling approach chosen, assumptions about the missingness mechanism can be encoded into the model in different ways. For example, selection models directly allow the specification of the missingness probability model, while pattern mixture models implicitly define the mechanism class based on the type of restrictions imposed to identify the outcome distributions in each pattern.

missingHE allows the specification of either a MAR or MNAR assumption about the effectiveness and cost missingness mechanisms in each approach through the argument type. All the three selection models fitted to the MenSS data set were estimated under a MAR assumption, where the missingness outcomes’ probabilities (model.me and model.mc) were assumed to not depend on any variable. Although this might be reasonable in some cases, it is generally more plausible that some observed variables provide some information that can help to improve estimation of the missingness probabilities.

The following code shows how estimation of selection models can be extended to include additional observed variables as predictors inside the missingness models to make the MAR assumptions more plausible. As an example, and for ease of presentation, we consider here only model 1 and include as possible predictors into model.me and model.mc the baseline utilities (u.0) and age (age) variables.

> #fit selection model 1 (Normal-Normal) under MAR conditional on u.0 and age
> sm1_nn_mar <- selection(data = MenSS, dist_e = "norm", dist_c = "norm",
+                     model.eff = e ~ trt + u.0, model.cost = c ~ trt + e,
+                     model.me = me ~ u.0 + age, model.mc = mc ~ u.0 + age,
+                     type = "MAR", n.iter = 1000, ref = 2)

Posterior summaries for all model parameters may be inspected using the print function. The argument display may also be used to select the group of parameters for which the output should be printed out, with choices being: fixed effects ("fixed"), random effects ("random"), individual-specific parameter estimates ("conditional"), point-wise log-likelihood values ("loglik"), imputed values ("mis"). For example, estimates for the “fixed effects” parameters (default) from sm1_nn_mar may be displayed by typing

> # print posterior summaries for fixed effects
> print(x = sm1_nn_mar, display = "fixed")
#>                   mean          sd         2.5%         50%       97.5% Rhat n.eff
#> alpha[1]     0.5398248  0.07094102   0.39551610   0.5391070   0.6773471 1.00  1000
#> alpha[2]     0.0325016  0.02841277  -0.02486486   0.0325016   0.0890606 1.00   860
#> alpha[3]     0.3852314  0.07528042   0.23855641   0.3870350   0.5321899 1.00  1000
#> beta[1]     80.4945787 26.25208941  27.91074292  80.7922724 129.8668494 1.00  1000
#> beta[2]     25.3348900 28.21694739 -30.71401515  25.9047230  81.2828526 1.00  1000
#> beta_f      -4.9514697 32.05597297 -66.34970542  -5.6951203  61.3508534 1.00   630
#> gamma_c[1]   1.5489849  1.18357123  -0.75169557   1.4873148   3.9371327 1.00   620
#> gamma_c[2]  -0.2586106  1.07168301  -2.40297242  -0.2096626   1.6665210 1.01   120
#> gamma_c[3]  -0.0138768  0.02043457  -0.05544140  -0.0135817   0.0276194 1.01   260
#> gamma_e[1]   1.5359047  1.24635745  -0.90351664   1.5144466   3.9129707 1.00  1000
#> gamma_e[2]  -0.2428823  1.14094669  -2.65511834  -0.1986033   1.8494462 1.00  1000
#> gamma_e[3]  -0.0138169  0.02005193  -0.05278324  -0.0144569   0.0251096 1.00   730
#> p_c          0.7122650  0.03655677   0.63505968   0.7119213   0.7827483 1.00  1000
#> p_e          0.7127179  0.03765175   0.63653714   0.7141342   0.7806097 1.00  1000
#> s_c        251.9247084 32.50808405 202.02493618 248.2900000 324.7677647 1.05   880
#> s_e          0.0912635  0.01053706   0.07376759   0.0903466   0.1148410 1.01   580
#> tau_c        0.0000165  0.00000386   0.00000948   0.0000162   0.0000245 1.05   880
#> tau_e      124.6906627 27.47081485  75.82395207 122.5114442 183.7675468 1.01   580
#> tmu_c       93.8790489 28.35282708  33.90403584  94.5242030 147.9249819 1.00  1000
#> tmu_e        0.8967404  0.01273337   0.86996886   0.8966623   0.9221490 1.00   530

The output generated by print shows for each parameter a selection of standard posterior summary measures obtained directly from JAGS including: mean, standard deviation, median, $2.5\%$ and $97.5\%$ percentiles as well as two MCMC diagnostic measures, the potential scale reduction factor (Rhat) and the number of effective sample size (n.eff). The names and number of parameters displayed reflect the structure of the outcome (model.eff and model.cost) and missingness (model.me and model.mc) models as specified in sm1_nn_mar. In particular:

alpha[1], alpha[2] and alpha[3] denote the regression coefficient in model.eff associated with the intercept, trt and u.0, respectively.
beta[1], beta[2] and beta_f denote the regression coefficient in model.cost associated with the intercept, trt and e, respectively.
gamma_c[1], gamma_c[2] and gamma_c[3] denote the regression coefficient in model.mc associated with the intercept, u.0 and age, respectively.
gamma_e[1], gamma_e[2] and gamma_e[3] denote the regression coefficient in model.me associated with the intercept, u.0 and age, respectively.
p_c and p_e denote the average missingness probabilities from model.mc and model.me, respectively.
s_c (tau_c) and s_e (tau_e) denote the outcome standard deviation (precision) for c and e, respectively.
tmu_c and tmu_e denote the average means from model.cost and model.eff, respectively.

Posterior distributions for each of these quantities may be extracted from the list object sm1_nn_mar by accessing the inner JAGS output model. For example, if different types of summaries or a visual inspection of the full posterior sample of a specific parameter is desired, e.g. p_e, we may access its posterior samples by typing

> #store posterior samples of p_e and compute some custom summaries
> p_e <- sm1_nn_mar$model_output$model$BUGSoutput$sims.list$p_e
> summary(c(p_e))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.5587  0.6885  0.7141  0.7127  0.7392  0.8102

The function coef allows to compute and display key posterior summaries for only the regression coefficients in model.eff and model.cost, following standard convention for reporting coefficient estimates for outcome regression models. The arguments random and digits may be used to indicate whether coefficient estimates of “random effects” (if present) in the outcome models should be displayed instead, and the number of digits to be displayed for all estimates.

> #print coefficient estimates from model.eff and model.cost
> coef(sm1_nn_mar, random = FALSE, digits = 2)
#> $Effects
#>             Mean   SD    QL   QU
#> (Intercept) 0.54 0.07  0.40 0.68
#> trtMenSS    0.03 0.03 -0.02 0.09
#> u.0         0.39 0.08  0.24 0.53
#> 
#> $Costs
#>              Mean    SD     QL     QU
#> (Intercept) 80.49 26.25  27.91 129.87
#> trtMenSS    25.33 28.22 -30.71  81.28
#> e           -4.95 32.06 -66.35  61.35

The output produced is split into two parts: the first related to the coefficient estimates for the effectiveness model ($effects), and the second to the coefficient estimates for the costs model ($costs). For each coefficient, the name of the associated variable alongside posterior mean, standard deviation, lower and upper bounds of a $(1-\alpha)\%$ credibility interval (CI) are displayed. By default, boundary estimates for a $95\%$ interval are displayed, but the user can also specify custom low/high boundary estimates using the optional argument prob, i.e. prob = (0.05, 0.95) for a $90\%$ CI.

4 Outcome model structure

Assumptions about the modelling structure of the effectiveness and cost outcome models can be specified across all model fitting functions through the arguments model.eff and model.cost, respectively. The arguments require a formula object which mimics standard linear regression specification e = trt + and c = trt +, where e and c and trt denote the outcome and treatment specific variable names accepted by missingHE which must always be included into the formulae. A generalised linear regression approach is used to specify the models at the location level of the outcomes, with default link functions automatically selected by missingHE according to the distributional assumptions specified in dist_e and dist_c, e.g. identity for Normal, logarithmic for Gamma and Weibull, and logit for Beta or Bernoulli. Any fully-observed variable in the data frame passed to data may be specified as additional covariates into the outcome models using a + sign. For example, the following code shows how to customise Model 1 (under MAR) to include different groups of covariates into the effectiveness and cost outcome and missingness models.

> #fit selection model 1 (Normal-Normal) with covariates into both outcome and missingness models
> sm1_nn_cov <- selection(data = MenSS, dist_e = "norm", dist_c = "norm",
+                     model.eff = e ~ trt + u.0 + age, 
+                     model.cost = c ~ trt + age + employment + e,
+                     model.me = me ~ u.0 + age, model.mc = mc ~ age + employment,
+                     type = "MAR", n.iter = 1000, ref = 2)

If appropriate, missingHE allows the user to further to customise each model structure to account for possible clustering (e.g. based on different centres) through a formula notation inspired to the one used by the lme4 package. More specifically, standard “random effects” parameters may be incorporated into each model by adding + (x | z) to the usual “fixed effects” component of the model formula, where x denotes a linear specification for the random effects variable names to be added to the model (only if x already included as fixed effects) and z denotes the clustering variable name. For example, the following code shows how random intercepts (1) based on site may be included into both outcome selection models fitted to MenSS.

> #fit selection model 1 (Normal-Normal) with covariates and random intercepts
> #into the model
> sm1_nn_cov_re <- selection(data = MenSS, dist_e = "norm", dist_c = "norm",
+                     model.eff = e ~ trt + u.0 + age + (1 | site), 
+                     model.cost = c ~ trt + age + employment + e + (1 | site),
+                     model.me = me ~ u.0 + age, model.mc = mc ~ age + employment,
+                     type = "MAR", n.iter = 1000, ref = 2)

As per usual, functions such as print and coef may be used to summarise posterior estimates for all relevant model parameters. When interest is in assessing MCMC convergence of a given model, the function diagnostic may be used to inspect graphical diagnostics for each family of model parameters. For example, the following code shows how autocorrelation plots for all regression coefficients in model.eff based on the formula stored in sm1_nn_cov.

Checking convergence using the diagnostic function for a family of model parameters estimated in missingHE, for example through inspection of the autocorrelation plots.

Figure 4: Checking convergence using the diagnostic function for a family of model parameters estimated in missingHE, for example through inspection of the autocorrelation plots.

Graphs are displayed for each MCMC chain (default$=2$) and model parameters belonging to the specified family (alpha), while the argument type is used to select the specific type of diagnostic plot to be compute based on a set of available names accepted by missingHE. Several types of graphs are available in missingHE, among which, some of the most popular plots and associated names include: trace ("traceplot"), density ("denplot"), histogram ("histogram), autocorrelation ("acf"), and running means ("running") plots.

The accepted names for the family of parameter, specified param, are related to the names assigned by missingHE to the model parameters in the JAGS text file, which can be displayed using the print function. Note that many parameter names are shared across all model fitting functions in missingHE. For example, the names "alpha" and "beta" are accepted by all functions and respectively denote all fixed effects regression coefficients in model.eff and model.cost, while "random.alpha" and "random.beta" denote the corresponding random effects parameters (if included). Similarly, "mu_e", "sd.e", "mu_c" and "sd.c" denote the effectiveness and cost outcome mean and standard deviation parameters. Some of family names are specific, or have a different interpretation, according to the modelling approach and function used. A quick summary of the most important family names unique to each approach in missingHE is the following:

selection: "gamma.e", "gamma.c", "delta.e" and "delta.c" denote the logistic regression fixed effects coefficients and the MNAR parameters obtained from model.me and model.mc, respectively. Corresponding random effects parameters (if included) may also be indicated with the family names "random.gamma.e", "random.gamma.c", "random.delta.e" and "random.gamma.c". Further, the average probabilities from both missingness models are denoted with "p.e" and "p.c".
pattern: "delta.e" and "delta.c" denote the sensitivity parameters included in each missingness pattern in combination with the type of identifying restrictions imposed, while "p.p" denotes the pattern-specific probabilities. Note that, unlike any other approach, most parameter estimates from pattern mixture models are estimated and disaplyed in diagnostic separately for each missingness pattern.
hurdle: "gamma.e" and "gamma.c" denote the logistic regression fixed effects coefficients obtained from model.se and model.sc, respectively. Corresponding random effects parameters (if included) may also be indicated with the family names "random.gamma.e" and "random.gamma.c". Further, the average probabilities from both structural value models (if specified) are denoted with "p.e" and "p.c".
lmdm: "gamma.e", "gamma.c", "delta.e" and "delta.c" denote the multinomial regression fixed effects coefficients and MNAR parameters obtained from model.me and model.mc, respectively. Corresponding random effects parameters (if included) may also be indicated with the family names "random.gamma.e", "random.gamma.c", "random.delta.e" and "random.gamma.c". Further, the average probabilities associated with all three types of missingness (i.e. completers, intermittent and dropout) are denoted with "p.e" and "p.c". Note that, since lmdm fits longitudinal models, estimates for all parameters from model.eff and model.cost are estimated and displayed in diagnostic separately for each time point.

5 Prior distributions

missingHE fits each modelling approach under weakly-informative prior distribution choices for all model parameters based on common practice and assuming a general lack of prior knowledge about the parameters. Examples include assuming Normal distributions centred at $0$ with generally low precision values (i.e. $1\times10^{6}$) for all regression coefficients, and Half-Cauchy distributions with a scale value of $2.5$. However, when desired and if prior information on some parameters is available, the user may overwrite missingHE defaul prior choices using the argument prior included in each modelling function. It is important that user-specific prior choices are passed to prior in a list form containing information about the desired parameters which need to be identified according to the type of model under consideration. For example, the following command may be used to update the priors about the effectiveness and cost regression coefficients and standard deviations when fitting a selection model using selection.

> #update priors on outcome regression coefficients and standard deviations
> myprior <- list("beta.prior" = c("norm", 0, 0.01), 
+                 "beta_f.prior" = c("norm", 0, 0.001),
+                 "alpha.prior" = c("norm", 0, 0.01),
+                 "sigma.prior.e" = c("unif", 0, 5),
+                 "sigma.prior.c" = c("unif", 0, 1000))
> #update model with new priors
> sm2_nn_cov <- selection(data = MenSS, dist_e = "norm", dist_c = "norm",
+                     model.eff = e ~ trt + u.0 + age, 
+                     model.cost = c ~ trt + age + employment + e,
+                     model.me = me ~ u.0 + age, model.mc = mc ~ age + employment,
+                     type = "MAR", n.iter = 1000, ref = 2,
+                     prior = myprior)

The object myprior must be a list whose elements should take names accepted by missingHE denoting the specific parameters whose priors need to be overwritten. For example the character names "beta.prior", "beta_f.prior and "alpha.prior" indicate that the default priors for all covariate coefficients in model.cost, the coefficient associated with e in model.cost and all covariate coefficients in model.eff need to be replaced. The content of these list elements are recognised by missingHE as providing the information necessary to write the updated priors for these parameters and must be provided in the form of vectors. The first element of each vector tells the name of the specific distribution assumed for the prior, while the successive elements inside each vector denote the canonical parameters indexing the chosen distribution. For example, for the first groups of parameters "beta", myprior instructs missingHE to replace their default priors with Normal distributions centred at $0$ with a precision of $0.01$. Similarly, myprior instructs missingHE to specify Uniform priors on the standard deviations with differing ranges for the effectiveness and cost outcomes, through the information contained in the "sigma.prior.e" and "sigma.prior.c" elements.

Note that specific character names for the custom prior list elements as well as specific names for the prior distributions and the total number and possible values of the parameters indexing the priors must be used. missingHE allows the user to specify custom priors for parameters that are either shared across all types of model fitting functions, such as "beta.prior" and "alpha.prior" for regression coefficients in model.eff and model.cost, or specific to given functions, such as "delta.prior.e" and "delta.prior.c" for MNAR parameters in model.me and model.mc. The available distribution forms, their character names and parameterisation used by missingHE match those available from the JAGS manual (version 4.3.0). With regard to the prior list elements, some of the most important parameter names that are specific to certain modelling approaches, with associated parameter interpretation, are: