Title: | Graphical Models Estimation from Multiple Sources |
Version: | 2.0.2 |
Description: | Estimates networks of conditional dependencies (Gaussian graphical models) from multiple classes of data that are similar but not identical, e.g. measurements taken on different equipment, in different locations or for various sub-types. The package also generates simulation data and evaluates estimation performance. Implements the method described in Angelini, De Canditiis and Plaksienko (2022) <doi:10.3390/math10213983>. |
Depends: | R (≥ 3.6.0) |
Imports: | Matrix, matrixcalc, MASS, SMUT, igraph, parallel, purrr |
URL: | https://github.com/annaplaksienko/jewel |
License: | GPL-2 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2024-05-21 21:48:46 UTC; annapla |
Author: | Anna Plaksienko |
Maintainer: | Anna Plaksienko <anna@plaxienko.com> |
Repository: | CRAN |
Date/Publication: | 2024-05-22 21:20:07 UTC |
Adding zero diagonal to a matrix
Description
Function adds a zero diagonal to a (p-1) by p matrix and returns a p by p matrix.
Usage
addZeroDiagonal(M)
Arguments
M |
a matrix to which a zero diagonal is added |
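The operation described above can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name add_zero_diagonal_sketch is hypothetical, and the column-wise layout (column j of the input holds the off-diagonal entries of column j of the output) is an assumption that the text does not confirm.

```r
# Hypothetical helper, not the package's addZeroDiagonal().
# Assumption: column j of the (p-1) x p input holds the off-diagonal
# entries of column j of the p x p output, in row order.
add_zero_diagonal_sketch <- function(M) {
  p <- ncol(M)
  stopifnot(nrow(M) == p - 1)
  out <- matrix(0, nrow = p, ncol = p)
  # !diag(p) is TRUE everywhere except on the diagonal;
  # assignment fills the TRUE positions in column-major order.
  out[!diag(p)] <- M
  out
}

M <- matrix(1:6, nrow = 2, ncol = 3)
A <- add_zero_diagonal_sketch(M)
```

Here A is a 3 by 3 matrix whose diagonal is zero and whose columns contain the columns of M with a zero inserted at the diagonal position.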
Construct weights for the jewel minimization problem from prior information on vertex degrees.
Description
Function takes a numerical vector of vertex degrees and constructs weights with the rule W_ij = 1 / sqrt(d_i * d_j); the whole matrix is then normalized by its maximum.
Usage
constructWeights(d, K = NULL)
Arguments
d |
either one numerical vector or a list of |
K |
number of classes (i.e. datasets, i.e. desired graphs). By default it is length(d).
In length(d) = 1, |
Value
W - a list of K numeric matrices of the size p by p
Examples
{
K <- 3
p <- 50
n <- 20
data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
G_list_true <- data$Graphs
true_degrees <- rowSums(G_list_true[[1]])
cut <- sort(true_degrees, decreasing = TRUE)[ceiling(p * 0.03)]
apriori_hubs <- ifelse(true_degrees >= cut, 10, 1)
W <- constructWeights(apriori_hubs, K = K)
}
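The weighting rule from the Description, W_ij = 1 / sqrt(d_i * d_j) followed by normalization by the maximum, can be sketched in base R for a single degree vector. The helper name weights_from_degrees_sketch is hypothetical; how the package treats the diagonal and how it builds the list of K matrices is not shown here.

```r
# Hypothetical helper illustrating the stated rule; not the package's code.
weights_from_degrees_sketch <- function(d) {
  # W_ij = 1 / sqrt(d_i * d_j) for every pair of vertices
  W <- 1 / sqrt(outer(d, d))
  # Normalize so that the largest weight equals 1
  W / max(W)
}

d <- c(10, 1, 1, 1)   # vertex 1 is a presumed hub
W <- weights_from_degrees_sketch(d)
```

In this toy input, pairs that involve the presumed hub receive smaller weights than pairs of ordinary vertices.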
Evaluation of a graph estimation method's performance when the true graph is known.
Description
Function compares adjacency matrices of the true and estimated simple graphs and calculates the number of true positives (correctly estimated edges), true negatives (correctly estimated absence of edges), false positives (edges present in the estimator but not in the true graph) and false negatives (failure to identify an edge).
Usage
evaluatePerformance(G, G_hat)
Arguments
G |
true graph's adjacency matrix. |
G_hat |
estimated graph's adjacency matrix. Must have the same dimensions as |
Value
performance - a numeric vector of length 4 with TP, TN, FP, FN.
Examples
{
K <- 3
p <- 50
n <- 20
data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
G_common_true <- data$CommonGraph
X <- data$Data
res <- jewel(X, lambda1 = 0.25)
G_common_est <- res$CommonG
evaluatePerformance(G = G_common_true, G_hat = G_common_est)
}
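The four counts described above can be reproduced by hand with a few lines of base R. This is a sketch under assumptions, not the package's code: the helper name count_edge_errors_sketch is hypothetical, and counting each potential edge once through the upper triangle is a choice made here for symmetric 0/1 adjacency matrices.

```r
# Hypothetical helper; not the package's evaluatePerformance().
# Assumption: both inputs are symmetric 0/1 adjacency matrices of simple
# graphs, so each potential edge is counted once via the upper triangle.
count_edge_errors_sketch <- function(G, G_hat) {
  stopifnot(all(dim(G) == dim(G_hat)))
  up <- upper.tri(G)
  truth <- G[up] != 0
  est <- G_hat[up] != 0
  c(TP = sum(truth & est),
    TN = sum(!truth & !est),
    FP = sum(!truth & est),
    FN = sum(truth & !est))
}

# True graph has edges 1-2 and 2-3; the estimate has edges 1-2 and 3-4.
G <- matrix(0, 4, 4)
G[1, 2] <- G[2, 1] <- 1
G[2, 3] <- G[3, 2] <- 1
G_hat <- matrix(0, 4, 4)
G_hat[1, 2] <- G_hat[2, 1] <- 1
G_hat[3, 4] <- G_hat[4, 3] <- 1
perf <- count_edge_errors_sketch(G, G_hat)
```

With four vertices there are six potential edges, and the four counts in perf add up to six.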
Generate a set of scale-free graphs and corresponding datasets (using the graphs as their Gaussian graphical models)
Description
Function first generates K scale-free graphs with p vertices. They have the same order and degree distribution and share most of the edges, but some edges may vary (user can control how many).
Function then generates corresponding precision and covariance matrices, all of the size p by p (see the paper for the details of the procedure).
Then for each l-th element of vector n it generates K data matrices, each of the size n_l by p, i.e., for the same underlying graphs we can generate several sets of K datasets with different sample sizes.
Usage
generateData_rewire(
K,
p,
n,
power = 1,
m = 1,
perc = 0.05,
int = NULL,
ncores = NULL,
makePlot = TRUE,
verbose = TRUE
)
Arguments
K |
number of graphs/data matrices. |
p |
number of nodes in the true graphs. |
n |
a numerical vector of the sample sizes for each desired set of
|
power |
a number, power of preferential attachment for the Barabasi-Albert algorithm for the generation of the scale-free graph. Bigger number means more connected hubs. The default value is 1. |
m |
number of edges to add at each step of Barabasi-Albert algorithm for generation of the scale-free graph. The default value is 1. |
perc |
a number, tuning parameter for the difference between graphs.
Number of trials to perform in the rewiring procedure of the first graph is
|
int |
a vector of two numbers, |
ncores |
number of cores to use in parallel data generation.
If |
makePlot |
If makePlot = FALSE, plotting of the generated graphs is disabled. The default value is TRUE. |
verbose |
If verbose = FALSE, tracing information printing is disabled. The default value is TRUE. |
Value
The following list is returned
- Graphs - a list of adjacency matrices of the K generated graphs.
- CommonGraph - a matrix, common part (intersection) of the K generated graphs.
- Data - a list of lists, for each sample size of the input vector n one obtains K data matrices, each of the size n_l by p.
- Sigma - a list of K covariance matrices of the size p by p.
Examples
data <- generateData_rewire(K = 3, p = 50, n = 20, ncores = 1, verbose = FALSE)
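The overall flow described above (graphs, then precision and covariance matrices, then one set of K data matrices per sample size) can be illustrated with a toy base R sketch. It does not reproduce the package's procedure: the two small graphs are written by hand instead of being generated and rewired, the diagonally dominant precision matrix is an assumption chosen only to guarantee positive definiteness, and the helper names precision_from_graph and sample_gaussian are hypothetical.

```r
set.seed(1)

# Hypothetical toy setup: K = 2 graphs on p = 4 vertices sharing most edges.
p <- 4
G1 <- matrix(0, p, p)
G1[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1   # edges 1-2, 2-3, 3-4
G1 <- G1 + t(G1)
G2 <- G1
G2[3, 4] <- G2[4, 3] <- 0                # drop edge 3-4
G2[1, 4] <- G2[4, 1] <- 1                # add edge 1-4
Graphs <- list(G1, G2)

# Assumption: a simple diagonally dominant precision matrix with the graph's
# sparsity pattern (the package follows the procedure from the paper instead).
precision_from_graph <- function(G) {
  Omega <- 0.3 * G
  diag(Omega) <- rowSums(abs(Omega)) + 0.5
  Omega
}
Sigma <- lapply(Graphs, function(G) solve(precision_from_graph(G)))

# Draw n_l rows from N(0, S) using the Cholesky factor of S.
sample_gaussian <- function(n_l, S) {
  Z <- matrix(rnorm(n_l * ncol(S)), nrow = n_l)
  Z %*% chol(S)
}

# One set of K data matrices per sample size in n.
n <- c(10, 25)
Data <- lapply(n, function(n_l) {
  lapply(Sigma, function(S) sample_gaussian(n_l, S))
})
```

In this sketch Data[[l]][[k]] is the n_l by p data matrix for the k-th graph and the l-th sample size, which mirrors the list-of-lists layout described in the Value section.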
Estimate Gaussian graphical models from multiple datasets
Description
This function estimates Gaussian graphical models (i.e. networks of conditional dependencies, direct connections between variables) given multiple datasets. We assume that datasets contain measurements of the same variables collected under different conditions (different equipment, locations, even sub-types of disease).
Usage
jewel(
X,
lambda1,
lambda2 = NULL,
Theta = NULL,
W = NULL,
tol = 0.01,
maxIter = 10000,
stability = FALSE,
stability_nsubsets = 25,
stability_frac = 0.8,
verbose = TRUE
)
Arguments
X |
a list of |
lambda1 |
a number, first regularization parameter (of the common penalty). |
lambda2 |
an optional number, second regularization parameter
(of the class-specific penalty). If NULL, set to |
Theta |
an optional list of |
W |
an optional list of |
tol |
an optional number, convergence threshold controlling the relative error between iterations. The default value is 0.01. |
maxIter |
an optional number, maximum allowed number of iterations. The default value is 10000. |
stability |
if stability = TRUE, a stability selection procedure is applied to reduce the number of false positives. The default value is FALSE. |
stability_nsubsets |
an optional number, how many times to subsample datasets and apply jewel for the stability selection procedure. The default value is 25. |
stability_frac |
an optional number, the proportion of stability selection results on subsampled data in which an edge has to be present to be included in the final estimate. The default value is 0.8. |
verbose |
if verbose = FALSE, tracing information printing is disabled. The default value is TRUE. |
Value
The following list is returned
- CommonG - an adjacency matrix of the common estimated graph (intersection of K estimated graphs).
- G_list - a list of K adjacency matrices for each estimated graph.
- Theta - a list of K estimated covariance matrices (when stability selection is disabled).
- BIC - a number, value of Bayesian information criterion for resulting graphs (when stability selection is disabled).
Examples
{
K <- 3
p <- 50
n <- 20
data <- generateData_rewire(K = K, p = p, n = n, ncores = 1, verbose = FALSE)
G_list_true <- data$Graphs
X <- data$Data
true_degrees <- rowSums(G_list_true[[1]])
cut <- sort(true_degrees, decreasing = TRUE)[ceiling(p * 0.03)]
apriori_hubs <- ifelse(true_degrees >= cut, 10, 1)
W <- constructWeights(apriori_hubs, K = K)
res <- jewel(X, lambda1 = 0.25, W = W, verbose = FALSE)
}
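The role of stability_nsubsets and stability_frac can be illustrated with a base R sketch of the aggregation step alone. This is not the package's internal code: the helper name aggregate_stability_sketch is hypothetical, the per-subsample adjacency matrices are written by hand instead of coming from repeated jewel runs, and keeping an edge when its share of runs is at least frac is an assumption.

```r
# Hypothetical helper; not the package's internal code.
# runs: list of 0/1 adjacency matrices, one per subsample.
# frac: plays the role of stability_frac.
aggregate_stability_sketch <- function(runs, frac = 0.8) {
  # Share of runs in which each edge was selected
  freq <- Reduce(`+`, runs) / length(runs)
  # Assumption: an edge is kept when its share reaches frac (>=)
  (freq >= frac) * 1
}

# Small constructor for a graph with a single edge i-j.
edge <- function(i, j, p = 3) {
  G <- matrix(0, p, p)
  G[i, j] <- G[j, i] <- 1
  G
}

# Edge 1-2 appears in 5 of 5 runs, edge 2-3 in 2 of 5 runs.
runs <- list(edge(1, 2) + edge(2, 3), edge(1, 2), edge(1, 2),
             edge(1, 2) + edge(2, 3), edge(1, 2))
G_stable <- aggregate_stability_sketch(runs, frac = 0.8)
```

In this toy input only edge 1-2 survives the threshold, which is how the procedure is meant to suppress edges that appear in only a few subsamples.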
Estimate Gaussian graphical models from multiple datasets
Description
This function estimates Gaussian graphical models (i.e. networks of conditional dependencies, direct connections between variables) given several datasets. We assume that datasets contain measurements of the same variables collected under different conditions (different equipment, locations, even sub-types of disease).
Usage
jewel_inner(
X,
lambda1,
lambda2 = NULL,
Theta = NULL,
W = NULL,
tol = 0.01,
maxIter = 10000,
verbose = TRUE
)
Arguments
X |
a list of |
lambda1 |
a number, first regularization parameter (of the common penalty). |
lambda2 |
an optional number, second regularization parameter (of the class-specific penalty). If NULL, set to |
Theta |
an optional list of |
W |
an optional list of |
tol |
an optional number, convergence threshold controlling the relative error between iterations. The default value is 0.01. |
maxIter |
an optional number, maximum allowed number of iterations. The default value is 10000. |
verbose |
if verbose = FALSE, tracing information printing is disabled. The default value is TRUE. |
Value
The following list is returned
- CommonG - an adjacency matrix of the common estimated graph (intersection of K estimated graphs).
- G_list - a list of K adjacency matrices for each estimated graph.
- Theta - a list of K estimated covariance matrices.
- BIC - a number, value of Bayesian information criterion for resulting graphs.
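The relation between G_list and CommonG (the intersection of the K estimated graphs) can be sketched in base R as follows. The helper name common_graph_sketch is hypothetical and the code assumes 0/1 adjacency matrices of equal size; it is an illustration of the intersection, not the package's code.

```r
# Hypothetical helper; not the package's code.
# Assumption: the G_list entries are 0/1 adjacency matrices of equal size.
common_graph_sketch <- function(G_list) {
  # The element-wise product is 1 only where every graph has the edge
  Reduce(`*`, G_list)
}

G1 <- matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0), nrow = 3, byrow = TRUE)
G2 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE)
CommonG <- common_graph_sketch(list(G1, G2))
```

Here only edge 1-2 is present in both graphs, so it is the only edge left in CommonG.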
Removing diagonal from a matrix
Description
Function removes the diagonal from a square p by p matrix and returns a (p-1) by p matrix.
Usage
removeDiagonal(M)
Arguments
M |
a matrix from which the diagonal is removed |
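The operation described above can be sketched in base R as follows. This is an illustration only: the helper name remove_diagonal_sketch is hypothetical, and dropping the diagonal entry from each column (so that column j of the result holds the off-diagonal entries of column j of the input) is an assumption consistent with the stated (p-1) by p output size.

```r
# Hypothetical helper; not the package's removeDiagonal().
# Assumption: the diagonal entry is dropped from each column, so column j of
# the result holds the p - 1 off-diagonal entries of column j of M.
remove_diagonal_sketch <- function(M) {
  p <- ncol(M)
  stopifnot(nrow(M) == p)
  # !diag(p) is TRUE off the diagonal; indexing reads in column-major order
  matrix(M[!diag(p)], nrow = p - 1, ncol = p)
}

M <- matrix(1:9, nrow = 3, ncol = 3)
R <- remove_diagonal_sketch(M)
```

For this input the columns of R are (2, 3), (4, 6) and (7, 8), i.e. the columns of M without the entries 1, 5 and 9.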