---
title: "Extending causalDisco with new algorithms"
vignette: >
  %\VignetteIndexEntry{Extending causalDisco with new algorithms}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.align: "center"
---

```{r}
#| label: setup
library(causalDisco)
```

This vignette illustrates how to add a new algorithm to causalDisco from
bnlearn, pcalg and Tetrad. Both bnlearn and pcalg are R packages and therefore
share the same interface, while Tetrad is a Java library and requires a slightly
different integration pattern.

# bnlearn and pcalg

Suppose we want to add the Hybrid Parents and Children (HPC) algorithm from
bnlearn. We can integrate it into causalDisco by creating a new function
`hpc()`.

HPC is a constraint-based algorithm, so it requires a conditional independence
test and a significance level alpha. It returns a PDAG, so we set the
`graph_class` attribute to `"PDAG"`.

The function takes an `engine` argument, which specifies which implementation of
the algorithm to use. Even though we only add HPC from bnlearn in this example,
we set up the structure to allow for multiple engines in the future. The `...`
argument allows us to pass additional arguments to the underlying algorithm and
test.

The code for the `hpc()` function is as follows:

```{r}
#| label: hpc function
hpc <- function(
  engine = c("bnlearn"),
  test,
  alpha = 0.05,
  ...
) {
  engine <- match.arg(engine)

  make_method(
    method_name = "hpc",
    engine = engine,
    engine_fns = list(
      bnlearn = function(...) make_runner(engine = "bnlearn", alg = "hpc", ...)
    ),
    test = test,
    alpha = alpha,
    graph_class = "PDAG",
    ...
  )
}
```

We see that the `hpc()` function is very simple, and all the logic is handled by
the `make_method()` and `make_runner()` functions.

Once defined, the HPC algorithm can be used like any other method in
causalDisco. We first construct the method using `hpc()`, and then pass it to
`disco()`. Here we demonstrate using the included `tpc_example` dataset:

```{r}
#| label: hpc example
data(tpc_example)
hpc_bnlearn <- hpc(engine = "bnlearn", test = "fisher_z", alpha = 0.05)
hpc_bnlearn_result <- disco(tpc_example, hpc_bnlearn)
plot(hpc_bnlearn_result)
```

To implement a **score-based** algorithm instead, the structure would remain the
same. The main difference is that the method would accept a `score` argument
rather than `test` and `alpha`. A **hybrid** algorithm algorithm would accept
`test`, `alpha`, and `score` arguments.

To implement an algorithm from pcalg rather than bnlearn, we would follow the
same structure but change all instances of `"bnlearn"` to `"pcalg"`.

# Tetrad

We will illustrate how to make a custom version (which actually does the same as
our implementation, just with some defensive programming omitted) of the BOSS
algorithm from Tetrad. Tetrad algorithms follow a slightly different integration
pattern because Tetrad is a Java library rather than an R package.

To add a new Tetrad algorithm, first register it with
`register_tetrad_algorithm()`. This function requires:

- The algorithm name.
- A setup function that configures a `TetradSearch` object to execute the
  algorithm.

The setup function receives the TetradSearch object as its first argument, along
with any additional parameters passed via `...` when the method is constructed.
Its responsibilities are to:

1. Configure the TetradSearch object with the appropriate parameters and score.
2. Instantiate the correct Tetrad algorithm class.

The fully qualified class name can be identified by inspecting the Tetrad source
code at:

https://github.com/cmu-phil/tetrad

Be sure to browse the version corresponding to the installed Tetrad release. The
relevant Java classes are located under:

https://github.com/cmu-phil/tetrad/tree/development/tetrad-lib/src/main/java

```{r}
#| label: register algorithm
register_tetrad_algorithm(
  "my_boss_variant",
  function(
    self,
    num_starts = 1,
    use_bes = TRUE,
    use_data_order = TRUE,
    output_cpdag = TRUE
  ) {
    self$set_params(
      USE_BES = use_bes,
      NUM_STARTS = num_starts,
      USE_DATA_ORDER = use_data_order,
      OUTPUT_CPDAG = output_cpdag
    )

    self$alg <- rJava::.jnew(
      "edu/cmu/tetrad/algcomparison/algorithm/oracle/cpdag/Boss",
      self$score
    )
    self$alg$setKnowledge(self$knowledge)
  }
)
```

You can view all the custom registered Tetrad algorithms using
`list_registered_tetrad_algorithms()`, and reset it using
`reset_tetrad_alg_registry()`.

The structure of the method function for a Tetrad algorithm is then the exact
same as for bnlearn and pcalg, as seen below. BOSS is a **score-based**
algorithm, so it accepts a `score` argument. The algorithm returns a PDAG, so we
set the `graph_class` attribute to `"PDAG"`.

```{r}
#| label: my_boss_variant function
my_boss_variant <- function(
  engine = "tetrad",
  score,
  ...
) {
  engine <- match.arg(engine)

  make_method(
    method_name = "my_boss_variant",
    engine = engine,
    engine_fns = list(
      tetrad = function(...) {
        make_runner(engine = "tetrad", alg = "my_boss_variant", ...)
      }
    ),
    score = score,
    graph_class = "PDAG",
    ...
  )
}
```

We can now run `my_boss_variant()` like any other method in causalDisco.

```{r}
#| label: my_boss_variant example
#| eval: false
# Ensure Tetrad is installed and Java is working before running the algorithm
if (verify_tetrad()$installed && verify_tetrad()$java_ok) {
  my_boss_variant_tetrad <- my_boss_variant(
    engine = "tetrad",
    score = "sem_bic"
  )
  my_boss_variant_tetrad_result <- disco(tpc_example, my_boss_variant_tetrad)
  plot(my_boss_variant_tetrad_result)
}
```
```{r}
#| label: my_boss_variant_tetrad algorithm plot code
#| echo: false
#| eval: false
# Code to generate the plot image on pkgdown and CRAN.

# For pkgdown (comment out to avoid CRAN NOTE about unstated dependency used) use the function fig_settings
# ragg::agg_png(
#   filename = "custom-boss-variant-pkgdown.png",
#   width  = fig_settings$fig.width,
#   height = fig_settings$fig.height,
#   units  = "in",
#   res    = fig_settings$dpi * fig_settings$fig.retina
# )
my_boss_variant_tetrad <- my_boss_variant(engine = "tetrad", score = "sem_bic")
my_boss_variant_tetrad_result <- disco(tpc_example, my_boss_variant_tetrad)
plot(my_boss_variant_tetrad_result)
dev.off()

# For CRAN
# Hardcoded since knitr::opts_current$get() is different when running in console.
ragg::agg_png(
  filename = "custom-boss-variant-cran.png",
  width = 3,
  height = 3,
  units = "in",
  res = 92 * 1
)
my_boss_variant_tetrad <- my_boss_variant(engine = "tetrad", score = "sem_bic")
my_boss_variant_tetrad_result <- disco(tpc_example, my_boss_variant_tetrad)
plot(my_boss_variant_tetrad_result)
dev.off()
```
```{r}
#| label: ges-ebic-tetrad-include
#| echo: false
img_file <- if (identical(Sys.getenv("IN_PKGDOWN"), "true")) {
  "custom-boss-variant-pkgdown.png"
} else {
  "custom-boss-variant-cran.png"
}

knitr::include_graphics(img_file)
```

Once again, if using a hybrid or constraint-based algorithm rather than a
score-based one, the structure would remain the same but the method would accept
different arguments (e.g., `test` and `alpha` for a constraint-based algorithm).

Finally, we clean up the custom registered Tetrad algorithm

```{r}
#| label: cleanup
reset_tetrad_alg_registry()
```

# Conclusion

In this vignette, we have illustrated how to add new algorithms to causalDisco,
both from the R packages pcalg and bnlearn and from Tetrad.

If you have any feedback or suggestions for improvement on the API and
functionality for extending causalDisco, please let us know by opening an issue
on GitHub.