Divergence Tests of Goodness of Fit

Association graphs can be used to suggest structural models of multivariate dependence. The function div_gof() provides divergence-based goodness-of-fit tests for several such hypotheses, including uniformity, pairwise independence, conditional independence, and nested model comparisons.

library(netropy)

Example:

We use the law firm network data included in the package.

We first edit the node attributes and transform them into dyad variables.

The first rows of the resulting dyad-level data frame, dyad_var, are:

head(dyad_var)
##   status gender office years age practice lawschool cowork advice friend
## 1      3      3      0     8   8        1         0      0      3      2
## 2      3      3      3     5   8        3         0      0      0      0
## 3      3      3      3     5   8        2         0      0      1      0
## 4      3      3      0     8   8        1         6      0      1      2
## 5      3      3      0     8   8        0         6      0      1      1
## 6      3      3      1     7   8        1         6      0      1      1

Conditional Independence

A hypothesis of substantive interest is that friendship and co-working are conditionally independent given advice:

\[ \texttt{friend} \perp \texttt{cowork} \mid \texttt{advice}. \]

This can be tested by specifying var_cond:

div_gof(
  dat = dyad_var,
  var1 = "friend",
  var2 = "cowork",
  var_cond = "advice"
)
##                                        test     D   chi2 df critical_value
## 1 friend independent of cowork given advice 0.003 11.959 12         21.798
##        decision
## 1 cannot reject
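
The decision column can be reproduced by hand from chi2 and df. The sketch below assumes the critical value follows the rule df + sqrt(8 * df), that is, the chi-squared mean plus two standard deviations. This rule is inferred from the values printed in this document (3 + sqrt(24) ≈ 7.899 and 12 + sqrt(96) ≈ 21.798), not taken from the function's documentation.

```r
# Values copied from the conditional independence output above.
chi2 <- 11.959
df <- 12

# Assumed rule: chi-squared mean (df) plus two standard deviations (2 * sqrt(2 * df)).
critical_value <- df + sqrt(8 * df)

# Reject only when the statistic exceeds the critical value.
decision <- if (chi2 > critical_value) "reject" else "cannot reject"

round(critical_value, 3)
decision
```

Under this assumed rule, the statistic falls well below the threshold, which agrees with the printed decision.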

Pairwise Independence

If var_cond is omitted, div_gof() tests ordinary pairwise independence:

\[ \texttt{friend} \perp \texttt{cowork}. \]

div_gof(
  dat = dyad_var,
  var1 = "friend",
  var2 = "cowork"
)
##                           test     D    chi2 df critical_value decision
## 1 friend independent of cowork 0.041 140.035  3          7.899   reject
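
To make the statistic concrete, here is a minimal base-R sketch of a divergence-based independence test on a toy table. It is not the implementation of div_gof(). The counts are hypothetical, and both the scaling chi2 = 2 * N * D * log(2) (with D measured in bits) and the critical value df + sqrt(8 * df) are assumptions chosen to be consistent with the output printed in this document.

```r
# Hypothetical 2 x 2 table of counts (not the law firm data).
counts <- matrix(c(30, 10, 10, 50), nrow = 2)
n <- sum(counts)

# Empirical joint distribution and the product of its margins.
p <- counts / n
expected <- outer(rowSums(p), colSums(p))

# Divergence (in bits) between the joint distribution and the independence model.
nonzero <- p > 0
D <- sum(p[nonzero] * log2(p[nonzero] / expected[nonzero]))

# Assumed scaling to an approximate chi-squared statistic.
chi2 <- 2 * n * D * log(2)

# Degrees of freedom for independence in an r x c table.
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Assumed critical value: mean plus two standard deviations.
critical_value <- df + sqrt(8 * df)
decision <- if (chi2 > critical_value) "reject" else "cannot reject"
decision
```

With df = (r - 1)(c - 1), the df = 3 reported above would correspond to four observed categories of friend and two of cowork.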

Uniformity

The function can also test whether a single variable is uniformly distributed across its observed categories. This is specified using var_uniform:

div_gof(
  dat = dyad_var,
  var_uniform = "friend"
)
##                 test     D     chi2 df critical_value decision
## 1 uniformity: friend 1.119 3853.341  3          7.899   reject
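
For a uniformity hypothesis, the divergence reduces to the gap between the maximum entropy log2(k) and the observed entropy, where k is the number of observed categories. The following base-R sketch illustrates this on a hypothetical vector. It is not the code used by div_gof(), and the chi2 scaling and df = k - 1 are assumptions consistent with the output above.

```r
# Hypothetical categorical observations (not the law firm data).
x <- c(0, 0, 0, 0, 0, 0, 1, 1, 2, 3)
n <- length(x)

# Relative frequencies over the observed categories.
p <- as.numeric(table(x)) / n
k <- length(p)

# Observed entropy in bits, and divergence from the uniform distribution.
H <- -sum(p * log2(p))
D <- log2(k) - H

# Assumed scaling to chi-squared and assumed degrees of freedom.
chi2 <- 2 * n * D * log(2)
df <- k - 1

c(D = D, chi2 = chi2, df = df)
```

D is zero only when all observed categories are equally frequent, so large values indicate a strongly skewed distribution, as with friend above.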

Nested Model Comparison

Reduced models can also be compared to the saturated empirical model. The saturated model is represented by divergence D = 0 and degrees of freedom df = 0.

m_full <- list(D = 0, df = 0)

m_reduced <- div_gof(
  dat = dyad_var,
  var1 = "friend",
  var2 = "cowork"
)

div_gof(
  dat = dyad_var,
  model_full = m_full,
  model_reduced = list(D = m_reduced$D, df = m_reduced$df)
)
##                      test     D    chi2 df critical_value decision
## 1 nested model comparison 0.041 141.243  3          7.899   reject

Similarly, we can compare a conditional independence model to the saturated empirical model:

m_reduced <- div_gof(
  dat = dyad_var,
  var1 = "friend",
  var2 = "cowork",
  var_cond = "advice"
)

div_gof(
  dat = dyad_var,
  model_full = m_full,
  model_reduced = list(D = m_reduced$D, df = m_reduced$df)
)
##                      test     D   chi2 df critical_value      decision
## 1 nested model comparison 0.003 10.335 12         21.798 cannot reject
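
The chi2 values in the two nested comparisons (141.243 and 10.335) differ slightly from those of the direct tests (140.035 and 11.959). A plausible explanation is that the nested comparison recomputes chi2 from the D values stored in m_reduced, which are rounded to three decimals. The sketch below checks this under two assumptions that are not stated in the source: that the number of dyads is N = 2485 (71 * 70 / 2), and that chi2 = 2 * N * (D_reduced - D_full) * log(2).

```r
# Assumed number of dyads: all unordered pairs among 71 nodes.
n_dyads <- 71 * 70 / 2

# Saturated model, as defined in the text.
m_full <- list(D = 0, df = 0)

# Rounded divergences and df copied from the printed output above.
m_indep <- list(D = 0.041, df = 3)
m_cond <- list(D = 0.003, df = 12)

# Hypothetical helper: assumed form of the nested comparison statistic.
nested_chi2 <- function(full, reduced, n) {
  2 * n * (reduced$D - full$D) * log(2)
}

round(nested_chi2(m_full, m_indep, n_dyads), 3)
round(nested_chi2(m_full, m_cond, n_dyads), 3)
```

If this reading is right, the two printed values are reproduced exactly, and passing an unrounded D would bring the nested comparison into agreement with the direct test.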

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.