Causal Effect Estimation in Graphical Models with Unmeasured Variables

The flexCausal R package enables estimation of Average Causal Effects (ACE) for a broad class of graphical models that satisfy the treatment primal fixability criterion. Briefly, the package accepts a dataset and a causal graph—defined through nodes, directed edges, and bi-directed edges—as input and provides causal effect estimates using robust influence-function-based estimators developed in this paper.

If you find this package useful, please consider cite: this paper

@article{guo2024average,
  title={Average Causal Effect Estimation in DAGs with Hidden Variables: Extensions of Back-Door and Front-Door Criteria},
  author={Guo, Anna and Nabi, Razieh},
  journal={arXiv preprint arXiv:2409.03962},
  year={2024}
}

This paper is highly relevant as well, which offers estimation strategy for ACE under the front-door model, a special case of the graphical models considered in flexCausal.

@article{guo2023targeted,
  title={Flexible Nonparametric Inference for Causal Effects under the Front-Door Model},
  author={Guo, Anna and Benkeser, David and Nabi, Razieh},
  journal={arXiv preprint arXiv:2312.10234},
  year={2023}
}

The package asks for the following inputs from the user:

The package outputs:

Here’s a schematic view of what flexCausal is capable of and how it works:

# Installation To install, run the following code in terminal:

# install the devtools package first if it's not yet installed
devtools::install_github("annaguo-bios/flexCausal")

The source code for flexCausal package is available on GitHub at flexCausal.

1 A Brief Introduction to ADMGs

The ADMG is one of the major input to the flexCausal package. It is a projection of a Directed Acyclic Graph (DAG) with unmeasured variables into only observed variables. Let \(G(O \cup U)\) denote a DAG with observed variables \(O\) and unmeasured variables \(U\). The ADMG is a projection of \(G(O \cup U)\) onto \(O\) only, denoted as \(G(O)\). The projection is guided by the following rules:

  1. \(O_i \ \textcolor{blue}{\rightarrow} \ O_j\) is included in the ADMG \(G(O)\) if \(O_i \ \textcolor{blue}{\rightarrow} \ O_j\) exists in \(G(O \cup U)\) or if a directed path from \(O_i\) to \(O_j\) exists that passes through only unmeasured variables in \(G(O \cup U)\).
  2. \(V_i \ \textcolor{red}{\leftrightarrow} \ V_j\) is included in the ADMG \(G(O)\) if a collider-free path, like \(V_i \ \textcolor{blue}{\leftarrow} \cdots \textcolor{blue}{\rightarrow} \ V_j\), exists in \(G(O \cup U)\) where all the intermediate variables belong to \(U\).

See Examples (a) and (b), where the DAGs with unmeasured variables on the left are projected onto their corresponding ADMGs on the right:

In all the following discussions, we will use the ADMG in example (a) above as a running example. The packages comes with simulated datasets data_example_a and data_example_b that are generated from the ADMGs in examples (a) and (b), respectively. Let’s take a look at the first few rows of data_example_a:

library(flexCausal) # load the package
data(data_example_a)
head(data_example_a) # take a glance of the data, which is a simulated dataset under above figure (a).
##            X          U A       M.1        M.2          L         Y
## 1 0.98890930  3.4036578 0 0.5423635 -1.6361458  0.7632388  3.257711
## 2 0.39774545 -0.7055857 0 2.4330827 -0.6538274  3.9498004  5.338658
## 3 0.11569778  1.0320105 1 4.5009622  2.0672577 11.4239744 19.819490
## 4 0.06974868  1.6994103 1 2.8610542 -1.1488686  5.0942460 10.803918
## 5 0.24374939  2.9995114 1 2.6677580 -1.1273291  3.2955105  8.205794
## 6 0.79201043  2.9700555 1 3.3190450  3.5674934 10.1107529 21.442111

Note that the \(M\) variable in the dataset is a multivariate variable, consisting of two components, \(M.1\) and \(M.2\). To input the ADMG into the flexCausal package, users need to specify the vertices, directed edges, and bi-directed edges in the ADMG, along with the components of any multivariate variables.

For example, to input the ADMG in example (a) above that aligns with the simulated dataset, we would create a graph object with the make.graph() function:

# create a graph object for the ADMG in example (a)
graph_a <- make.graph(vertices=c('A','M','L','Y','X'), # specify the vertices
                bi_edges=list(c('A','Y')), # specify the bi-directed edges
                di_edges=list(c('X','A'), c('X','M'), c('X','L'),c('X','Y'), c('M','Y'), c('A','M'), c('A','L'), c('M','L'), c('L','Y')), # specify the directed edges, with each pair of variables indicating an directed edge from the first variable to the second. For example, c('X', 'A') represents a directed edge from X to A.
                multivariate.variables = list(M=c('M.1','M.2'))) # specify the components of the multivariate variable M

With this graph object in hand, we can conveniently explore various properties of the ADMG using functions provided in the package. These functions include topological ordering, genealogical relations, and causal effect identifiability, and etc. For a detailed discussion, see Functions for learning the properties of ADMG.

As a quick example, we can obtain the adjacency matrix of this ADMG with the f.adj_matrix() function:

f.adj_matrix(graph_a) # get the adjacency matrix of the ADMG in example (a)
##   A M L Y X
## A 0 0 0 0 1
## M 1 0 0 0 1
## L 1 1 0 0 1
## Y 0 1 1 0 1
## X 0 0 0 0 0

2 Quick Start on Estimation

The main function in this package is estADMG(), which estimates the Average Causal Effect (ACE) using both a one-step corrected plug-in estimator and a TMLE estimator. To get a sense of how to use this package, we provide a quick example below. We will use the ADMG in example (a) above to estimate the ACE of treatment \(A\) on outcome \(Y\) using a simulated dataset data_example_a. We can directly use the graph_a object created above as the input to the function.

est <- estADMG(a=c(1,0),
                data=data_example_a, 
                graph=graph_a,
                treatment='A', outcome='Y')
## The treatment is not fixable but is primal fixable. Estimation provided via extended front-door functional.

## Onestep estimated ACE: 1.96; 95% CI: (1.3, 2.62) 
## TMLE estimated ACE: 1.96; 95% CI: (1.29, 2.62)

## The graph is nonparametrically saturated. Results from the one-step estimator and TMLE are provided, which are in theory the most efficient estimators.

The code above estimates the ACE of treatment \(A\) on outcome \(Y\), defined as \(E(Y^1)-E(Y^0)\), using the data data_example_a generated based on Figure (a). The function estADMG() takes the following arguments:

Alternatively, instead of providing a graph object, users can directly specify the vertices, directed edges, and bi-directed edges within the estADMG() function, as shown below:

est <- estADMG(a=c(1,0),data=data_example_a, 
                vertices=c('A','M','L','Y','X'), # specify the vertices
                bi_edges=list(c('A','Y')), # specify the bi-directed edges
                di_edges=list(c('X','A'), c('X','M'), c('X','L'),c('X','Y'), c('M','Y'), c('A','M'), c('A','L'), c('M','L'), c('L','Y')), # specify the directed edges
                multivariate.variables = list(M=c('M.1','M.2')), # specify the components of the multivariate variable M
                treatment='A', outcome='Y')
## The treatment is not fixable but is primal fixable. Estimation provided via extended front-door functional.

## Onestep estimated ACE: 1.96; 95% CI: (1.3, 2.62) 
## TMLE estimated ACE: 1.96; 95% CI: (1.29, 2.62)

## The graph is nonparametrically saturated. Results from the one-step estimator and TMLE are provided, which are in theory the most efficient estimators.

3 Details on Estimation via Onestep Estimator and TMLE

The package constructs the EIF based Onestep estimator and TMLE for ACE through break down the EIF into several nuisance parameters, which falls into two categories: the sequential regressions and density ratios. The figure above illustrates the decomposition of the EIF into the nuisance parameters using the ADMG in example (a) above.

To define these nuisance parameters, it is essential to learn the properties of the ADMG. Specifically, we require three definitions:
1. A topological ordering \(\tau\) for variables in the ADMG. With a graph oject, \(\tau\) can be obtained using the f.top_order() function. See Functions for learning the properties of ADMG for more details.

f.top_order(graph_a)
## [1] "X" "A" "M" "L" "Y"
  1. A set \(\mathcal{L}\) that contains all the variables that is post the treatment variable according to \(\tau\) and is bidirectedly connected to the treatment variable.

  2. A set \(\mathcal{M}\) that contains all the variables that is post the treatment variable according to \(\tau\) and is not in \(\mathcal{L}\).

The sequential regressions are defined for the treatment variable \(A\) the outcome variable \(Y\) and all the variables between \(A\) and \(Y\) in \(\tau\). Introducing set \(\mathcal{L}\) and \(\mathcal{M}\) is important because \(A\) at those sequential regressions is evaluated at specific level determined by which set the corresponding variable belongs to.

The density ratios are defined for the treatment variable \(A\) and all the variables between \(A\) and \(Y\) in \(\tau\). Each density ratio is defined as the ratio of conditional density of a variable. Evaluation of \(A\) at these density ratios also depends on which set the corresponding variable belongs to.

3.1 Sequential Regressions

For the sequential regressions, we offer three options for estimation: (1) via linear or logistic regression, (2) via , and (3) via together with cross-fitting.

Here we offer a table summary of available methods for sequential regressions.

Sequential Regressions Estimation Method Details
Estimation Method Arguments Explanations
Linear or logistic regressions
formulaY Formula for outcome regression
formulaA Formula for treatment regression
linkY_binary Link function for binary outcome Y in logistic regression
linkA Link function for binary outcome A in logistic regression
SuperLearner
superlearner.seq Whether to use SuperLearner for variables between \(A\) and \(Y\)
superlearner.Y Whether to use SuperLearner for outcome Y
superlearner.A Whether to use SuperLearner for outcome A
library.seq Library of learners for variables between \(A\) and \(Y\)
library.Y Library of learners for \(Y\)
library.A Library of learners for \(A\)
SuperLearner with cross-fitting
crossfit Whether to use cross-fitting along with SuperLearner
K Number of folds in cross-fitting
library.xxx Library of SuperLearner is still specified via library.seq, library.Y, library.A, respectively

The code below is an example of adopting SuperLearner with cross-fitting:

library(SuperLearner)

est <- estADMG(a=c(1,0),
                data=data_example_a, 
                graph = graph_a,
                treatment='A', outcome='Y',
                lib.seq = c("SL.glm", "SL.earth", "SL.ranger", "SL.mean"),
                lib.Y = c("SL.glm", "SL.earth", "SL.ranger", "SL.mean"),
                lib.A = c("SL.glm", "SL.earth", "SL.ranger", "SL.mean"),
                crossfit = TRUE,
                K=5)

3.2 Density Ratios

For the density ratios, we provide three options for estimation: (1) via direct estimation using the densratio package, (2) via Bayes’ rule, and (3) via assuming the density follows a Normal distribution. Note that these methods only apply to the variables between \(A\) and \(Y\) in \(\tau\). It doesn’t apply to \(A\) since the treatment regression is already estimated in the sequential regressions.

\[ \frac{p(M|a_0,X)}{p(M|a_1,X)} = \frac{p(a_0|M,X)p(M|X)}{p(a_1|M,X)p(M|X)} = \frac{p(a_0|M,X)}{p(a_1|M,X)}. \]

The regressions involved in this Bayes’ formula are estimated following the arguments discussed for sequential regressions.

Here we offer a table summary of available methods for density ratios.

Density Ratios Estimation Method Details
Estimation Method Arguments Explanations
densratio package
ratio.method.L="densratio" Use densratio package to estimate the density ratio for variables in \(\mathcal{L}\)
ratio.method.M="densratio" Use densratio package to estimate the density ratio for variables in \(\mathcal{M}\)
Bayes’ rule
ratio.method.L="bayes" Apply the Bayes’ rule to estimate the density ratio for variables in \(\mathcal{L}\)
ratio.method.M="bayes" Apply the Bayes’ rule to estimate the density ratio for variables in\(\mathcal{M}\)
Normal distribution
ratio.method.L="dnorm" dnorm.formula.L Assuming Normal distributions to estimate the density ratio for variables in \(\mathcal{L}\). The mean of the Normal distributions are estimated via fitting regressions following formula specified in dnorm.formula.L
ratio.method.M="dnorm" dnorm.formula.M Assuming Normal distributions to estimate the density ratio for variables in \(\mathcal{M}\). The mean of the Normal distributions are estimated via fitting regressions following formula specified in dnorm.formula.M

The code below is an example of using the dnorm method for the density ratio estimation for variables in \(M\backslash Y\):

est <- estADMG(a=c(1,0),
                data=data_example_a, 
                graph = graph_a,
                treatment='A', outcome='Y',
                ratio.method.M = "dnorm")

3.3 The Onestep estimator

To construct the Onestep estimator, the estADMG() function estimates all the sequential regressions and density ratios discussed above. These nuisance estimates are then used to construct an EIF estimate as well as the EIF-based Onestep estimator for the target parameter. The EIF estimate is further used to construct the confidence interval for the Onestep estimator.

The function estADMG() provides Onestep estimator by default.

3.4 The TMLE

To construct TMLE, we update the estimated nuisance parameters via a targeting procedure such that the corresponding part of the EIF for each variable is sufficiently small. Sometimes, the targeting procedure requires iterative updates between nuisance parameters. The function estADMG() provides several arguments to control for this iterative process:

4 Output

As an example, we use estADMG() to estimate the average counterfactual outcome \(E(Y^1)\). The output is described as follows

est <- estADMG(a=1,
                data=data_example_a, 
                graph = graph_a,
                treatment='A', outcome='Y')

# TMLE and Onestep estimator
est$TMLE # a list contains the estimation result from TMLE estimator
est$Onestep # a list contains the estimation result from Onestep estimator

# For either the TMLE or Onestep estimator, the output is a list that contains the following elements:
est$TMLE$EYa # the estimated average counterfactual outcome
est$TMLE$lower.ci # the lower bound of the 95% confidence interval
est$TMLE$upper.ci # the upper bound of the 95% confidence interval
est$TMLE$EIF # the estimated efficient influence function
est$TMLE.Ya$EIF.Y # the estimated efficient influence function at the tangent space associated with the outcome
est$TMLE.Ya$EIF.A # the estimated efficient influence function at the tangent space associated with the treatment
est$TMLE.Ya$EIF.v # the estimated efficient influence function at the tangent space associated with the rest of the variables
est$TMLE.Ya$p.a1.mpA # the estimated propensity score for treatment
est$TMLE.Ya$mu.next.A # the estimated sequential regression associated with variable that comes right after the treatment according to the topological ordering of the ADMG
est$TMLE.Ya$EDstar # mean of the estimated efficient influence function
est$TMLE.Ya$iter # iterations take for TMLE estimator to converge
est$TMLE.Ya$EDstar.record # the mean of the estimated efficient influence function at each iteration

5 Functions for learning the properties of ADMG

Apart from the estADMG() for causal effec estimation, we also provide functions for learning the properties of ADMG. The functions are described as follows:

graph_a <- make.graph(vertices=c('A','M','L','Y','X'), # specify the vertices
                bi_edges=list(c('A','Y')), # specify the bi-directed edges
                di_edges=list(c('X','A'), c('X','M'), c('X','L'),c('X','Y'), c('M','Y'), c('A','M'), c('A','L'), c('M','L'), c('L','Y')), # specify the directed edges, with each pair of variables indicating an directed edge from the first variable to the second. For example, c('X', 'A') represents a directed edge from X to A.
                multivariate.variables = list(M=c('M.1','M.2'))) # specify the components of the multivariate variable M
f.adj_matrix(graph_a)
f.top_order(graph_a)
f.parents(graph_a, 'Y')
f.children(graph_a, 'A')
f.descendants(graph_a, 'A')
f.district(graph_a, 'A')
f.markov_blanket(graph_a, 'A')
f.markov_pillow(graph_a, 'A')
is.fix(graph_a, 'A')
is.p.fix(graph_a, 'A')

If the treatment is primal fixable, then the average causal effect of the treatment on any choice of the outcome in the given graph is always identified.

is.np.saturated(graph_a)

A graph being nonparametrically saturated means that the graph implies NO equality constraints on the observed data distribution.

is.mb.shielded(graph_a)

A graph being mb-shielded means that the graph only implies ordinary equality constraints on the observed data distribution.