README

Did my cohort pick the correct number of patients? Am I calculating an intersection in the right way? Is that the expected value for treatment duration? It only takes one incorrect parameter to get incoherent results in a pharmacoepidemiological study, and testing calculations on huge, complex databases is very challenging.

That is why TestGenerator is useful: it loads a small sample of patients into an OMOP CDM so you can unit test study code. It includes tools to create a blank CDM with a complete vocabulary and to check that the code does what you expect in very specific cases.

Installation

install.packages("TestGenerator")

Basic workflow

TestGenerator starts from a small patient dataset. The data can be stored in an Excel workbook, with one sheet per OMOP CDM table, or in a folder of CSV files, with one file per table.
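
As a rough illustration of the CSV layout described above, the following sketch writes a two-patient population to a temporary folder using base R only. The file names (person.csv, observation_period.csv) assume that each file is named after its OMOP CDM table, and the patient values are made up for the example.

```r
# Folder that will hold one CSV file per OMOP CDM table.
csv_dir <- file.path(tempdir(), "micro_population_csv")
dir.create(csv_dir, recursive = TRUE, showWarnings = FALSE)

# Two hypothetical patients (8532 = female, 8507 = male).
person <- data.frame(
  person_id = c(1L, 2L),
  gender_concept_id = c(8532L, 8507L),
  year_of_birth = c(1980L, 1975L),
  race_concept_id = c(0L, 0L),
  ethnicity_concept_id = c(0L, 0L)
)

# One observation period per patient (32817 = EHR type concept).
observation_period <- data.frame(
  observation_period_id = c(1L, 2L),
  person_id = c(1L, 2L),
  observation_period_start_date = as.Date(c("2015-01-01", "2016-06-01")),
  observation_period_end_date = as.Date(c("2022-12-31", "2022-12-31")),
  period_type_concept_id = c(32817L, 32817L)
)

# Assumption: each file is named after the table it contains.
write.csv(person, file.path(csv_dir, "person.csv"), row.names = FALSE)
write.csv(
  observation_period,
  file.path(csv_dir, "observation_period.csv"),
  row.names = FALSE
)

list.files(csv_dir)
```

Keeping the population this small is the point: every row can be checked by eye before it becomes a test case.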

The package then converts those files into a Unit Test Definition JSON file. This JSON file is the object you keep in your package tests.

TestGenerator::readPatients(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)
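
To see what was written, the JSON file can be inspected with jsonlite. This is only a sketch: it assumes the file is named after testName (icu_sample.json) and that its top-level elements correspond to OMOP CDM tables, neither of which is stated above.

```r
# Assumption: the file is named after testName and lives in outputPath.
json_file <- file.path("tests/testthat/testCases", "icu_sample.json")

if (file.exists(json_file)) {
  # Assumption: top-level elements correspond to OMOP CDM tables.
  test_case <- jsonlite::fromJSON(json_file)
  print(names(test_case))
} else {
  message("No JSON file found at ", json_file)
}
```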

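If outputPath = NULL, the JSON file is written to tests/testthat/testCases, which is the usual location for package tests.

readPatients.xl() and readPatients.csv() are the format-specific variants for Excel workbooks and CSV folders. They take the same core arguments: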

TestGenerator::readPatients.xl(
  filePath = "inst/extdata/icu_sample_population.xlsx",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4"
)

TestGenerator::readPatients.csv(
  filePath = "inst/extdata/icu_sample_population_csv",
  testName = "icu_sample",
  outputPath = "tests/testthat/testCases",
  cdmVersion = "5.4",
  reduceLargeIds = FALSE
)

Create a test CDM

Use patientsCDM() to load one Unit Test Definition into a blank OMOP CDM. By default, this creates a local DuckDB CDM with the small patient population and the vocabulary needed for testing.

cdm <- TestGenerator::patientsCDM(
  pathJson = "tests/testthat/testCases",
  testName = "icu_sample",
  cdmVersion = "5.4"
)
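
A quick sanity check after building the CDM is to count the rows in a few tables and compare them with the micro population. The helper below is a hypothetical sketch, not part of TestGenerator; it only assumes that tables can be accessed with cdm[["table_name"]] and that dplyr is installed.

```r
# Hypothetical helper: count rows in selected CDM tables.
# Works with any list-like object whose elements are data frames
# or lazy database tables.
table_row_counts <- function(cdm, tables = c("person", "observation_period")) {
  vapply(
    tables,
    function(tbl) {
      cdm[[tbl]] |>
        dplyr::count() |>   # one row with a column n
        dplyr::pull("n") |> # collects the value for lazy tables
        as.numeric()
    },
    numeric(1)
  )
}

# Stand-in for a CDM, used here only to show the call.
fake_cdm <- list(
  person = data.frame(person_id = 1:3),
  observation_period = data.frame(person_id = 1:2)
)
table_row_counts(fake_cdm)
```

With a real CDM, table_row_counts(cdm) should return the number of patients and observation periods you entered in the source files.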

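If pathJson = NULL, TestGenerator looks for the JSON file in tests/testthat/testCases.

The following end-to-end example uses the sample workbook shipped with the package, writes the JSON file to a temporary directory, builds the CDM, and then cleans up: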

file_path <- system.file(
  "extdata",
  "icu_sample_population.xlsx",
  package = "TestGenerator"
)

output_path <- file.path(tempdir(), "testgenerator-example")
dir.create(output_path, recursive = TRUE, showWarnings = FALSE)

TestGenerator::readPatients(
  filePath = file_path,
  testName = "icu_sample",
  outputPath = output_path,
  cdmVersion = "5.4"
)

cdm <- TestGenerator::patientsCDM(
  pathJson = output_path,
  testName = "icu_sample",
  cdmVersion = "5.4"
)

DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
unlink(output_path, recursive = TRUE)

Use it in a package test

The most useful pattern is to keep a small JSON test case in tests/testthat/testCases, build a CDM inside a testthat test, run your study code, and assert the expected result.

testthat::test_that("cohort construction returns the expected patients", {
  cdm <- TestGenerator::patientsCDM(
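    # testthat runs tests with tests/testthat as the working directory,
    # so resolve the folder with test_path() instead of a project-root path.
    pathJson = testthat::test_path("testCases"),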
    testName = "icu_sample",
    cdmVersion = "5.4"
  )
  withr::defer(
    DBI::dbDisconnect(CDMConnector::cdmCon(cdm), shutdown = TRUE)
  )

  cohort_set <- CDMConnector::readCohortSet(
    system.file("extdata", "test_cohorts", package = "TestGenerator")
  )

  cdm <- CDMConnector::generateCohortSet(
    cdm = cdm,
    cohortSet = cohort_set,
    name = "test_cohorts"
  )

  result <- cdm[["test_cohorts"]] |>
    dplyr::collect()

  testthat::expect_equal(
    sort(unique(result$subject_id)),
    c(1, 2, 4, 5, 6, 7)
  )
})

The exact expectation should come from the micro population you designed. Good tests usually check subject counts, inclusion or exclusion rules, cohort dates, or treatment durations that are easy to verify by hand.
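
As a sketch of the date and duration checks mentioned above, the example below uses a hand-made data frame in place of a collected cohort table. The rows are illustrative only, and the duration convention (counting both the start and the end day) is an assumption; use whichever convention your study defines.

```r
# Hypothetical collected cohort table; the values are illustrative only.
result <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L),
  subject_id = c(1L, 2L, 4L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-02-10", "2020-03-05")),
  cohort_end_date = as.Date(c("2020-01-31", "2020-02-19", "2020-03-05"))
)

# Assumption: duration counts both the start and the end day.
duration_days <-
  as.integer(result$cohort_end_date - result$cohort_start_date) + 1L

testthat::test_that("cohort dates and durations match hand-computed values", {
  # One row per expected subject.
  testthat::expect_equal(nrow(result), 3)
  # No cohort entry ends before it starts.
  testthat::expect_true(
    all(result$cohort_start_date <= result$cohort_end_date)
  )
  # Durations worked out by hand from the dates above.
  testthat::expect_equal(duration_days, c(31L, 10L, 1L))
})
```

Because the population is tiny, each expected value can be derived on paper first and then written into the test.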