poLCAParallel

Polytomous Variable Latent Class Analysis

With Bootstrap Likelihood Ratio Test

Sherman E. Lo, Queen Mary, University of London

A reimplementation of poLCA (CRAN, GitHub) in C++. It aims to reproduce the results of the original code and to stay as similar to it as possible. It runs faster by utilising multiple threads, especially when fitting with multiple repetitions.

About poLCAParallel

The package poLCAParallel reimplements the poLCA fitting, standard error calculations, goodness-of-fit tests and the bootstrap log-likelihood ratio test in C++. This was done using Rcpp and RcppArmadillo, which allow R to run fast C++ code. Additional notes include:

Further reading is available on the QMUL ITS Research Blog.

About poLCA

poLCA is a software package for the estimation of latent class models and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment.

Latent class analysis (also known as latent structure analysis) can be used to identify clusters of similar “types” of individuals or observations from multivariate categorical data, estimating the characteristics of these latent groups, and returning the probability that each observation belongs to each group. These models are also helpful in investigating sources of confounding and nonindependence among a set of categorical variables, as well as for density estimation in cross-classification tables. Typical applications include the analysis of opinion surveys; rater agreement; lifestyle and consumer choice; and other social and behavioral phenomena.

The basic latent class model is a finite mixture model in which the component distributions are assumed to be multi-way cross-classification tables with all variables mutually independent. The model stratifies the observed data by a theoretical latent categorical variable, attempting to eliminate any spurious relationships between the observed variables. The latent class regression model makes it possible for the researcher to further estimate the effects of covariates (or “concomitant” variables) on predicting latent class membership.
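For reference, the basic latent class model can be written as follows. The notation is assumed here. Observation i is one of N and variable j is one of J. Variable j has K_j categories, and Y_ijk = 1 if observation i gives response k on variable j, and 0 otherwise. There are R latent classes, p_r is the prior probability of class r, and π_jrk is the probability of response k on variable j given class r. The second line gives the estimated posterior probability that observation i belongs to class r. This is the membership probability described above.

```latex
P(\mathbf{Y}_i \mid \boldsymbol{\pi}, \mathbf{p})
  = \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\pi_{jrk}\right)^{Y_{ijk}}

\hat{P}(r \mid \mathbf{Y}_i)
  = \frac{\hat{p}_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\hat{\pi}_{jrk}\right)^{Y_{ijk}}}
         {\sum_{q=1}^{R} \hat{p}_q \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\hat{\pi}_{jqk}\right)^{Y_{ijk}}}
```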

poLCA uses expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the parameters of the latent class and latent class regression models.
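The following is a minimal sketch of one EM iteration for the basic latent class model (no covariates). It is not poLCAParallel's actual implementation. The names EmStep, LcaParams and Matrix are hypothetical, and the Newton-Raphson step used for the latent class regression model is not shown. The E-step computes each observation's posterior class probabilities. The M-step then updates the class priors and conditional response probabilities as posterior-weighted proportions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for this sketch; not poLCAParallel's internal API.
using Matrix = std::vector<std::vector<double>>;

struct LcaParams {
  std::vector<double> prior;  // prior[r] = P(class r)
  std::vector<Matrix> probs;  // probs[j][r][k] = P(variable j = k | class r)
};

// One EM iteration for the basic latent class model (no covariates).
// y[i][j] is the 0-based category observed for observation i on variable j.
// Updates `params` in place and returns the log-likelihood at the parameters
// used in the E-step, i.e. before the update. There are no safeguards against
// empty classes or against an observation having zero probability in every class.
double EmStep(const std::vector<std::vector<int>>& y, LcaParams& params) {
  const std::size_t n_obs = y.size();
  const std::size_t n_class = params.prior.size();
  const std::size_t n_var = params.probs.size();
  Matrix posterior(n_obs, std::vector<double>(n_class));
  double log_lik = 0.0;

  // E-step: posterior[i][r] = P(class r | y_i), using log-sum-exp for stability.
  for (std::size_t i = 0; i < n_obs; ++i) {
    double max_log = -std::numeric_limits<double>::infinity();
    for (std::size_t r = 0; r < n_class; ++r) {
      double log_weight = std::log(params.prior[r]);
      for (std::size_t j = 0; j < n_var; ++j) {
        log_weight += std::log(params.probs[j][r][y[i][j]]);
      }
      posterior[i][r] = log_weight;
      if (log_weight > max_log) max_log = log_weight;
    }
    double sum = 0.0;
    for (std::size_t r = 0; r < n_class; ++r) {
      posterior[i][r] = std::exp(posterior[i][r] - max_log);
      sum += posterior[i][r];
    }
    for (std::size_t r = 0; r < n_class; ++r) posterior[i][r] /= sum;
    log_lik += max_log + std::log(sum);
  }

  // M-step: closed-form updates, each a posterior-weighted proportion.
  for (std::size_t r = 0; r < n_class; ++r) {
    double class_weight = 0.0;
    for (std::size_t i = 0; i < n_obs; ++i) class_weight += posterior[i][r];
    params.prior[r] = class_weight / static_cast<double>(n_obs);
    for (std::size_t j = 0; j < n_var; ++j) {
      std::vector<double>& cond = params.probs[j][r];
      for (double& p : cond) p = 0.0;
      for (std::size_t i = 0; i < n_obs; ++i) cond[y[i][j]] += posterior[i][r];
      for (double& p : cond) p /= class_weight;
    }
  }
  return log_lik;
}
```

Calling EmStep repeatedly until the change in log-likelihood falls below a tolerance gives one fit. Fitting again from several random starting values and keeping the best is the kind of repetition that, per the description above, benefits from multiple threads.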

The easiest way to install poLCAParallel is to use R with the remotes package.

Install From GitHub

Run the following in R to install the latest version

remotes::install_github("QMUL/poLCAParallel@package")

or for a previous version, for example,

remotes::install_github("QMUL/poLCAParallel@v1.2.4")

Install From Releases

Download the .zip or .tar.gz file from the releases. Install it in R using

remotes::install_local("<PATH TO .zip OR .tar.gz FILE>")

User’s Notes

Citation

Please consider citing the corresponding QMUL ITS Research Blog

and the publication below, for which this software was originally created.

Tips

Example Code

R scripts which compare poLCAParallel with poLCA are provided in exec/. An example use of a bootstrap likelihood ratio test is shown in exec/3_blrt.R.
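A bootstrap likelihood ratio test can be sketched roughly as follows. The observed statistic compares the maximised log-likelihoods of a null model with fewer classes and an alternative model with more classes. Datasets are then simulated from the fitted null model and both models are refitted to each. The p-value is the proportion of bootstrap statistics at least as large as the observed one. The helpers LrStatistic and BootstrapPValue are hypothetical, and this p-value convention is an assumption. blrt() and exec/3_blrt.R remain the reference for what the package actually computes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: likelihood ratio statistic comparing a null model
// (fewer classes) with an alternative model (more classes), given their
// maximised log-likelihoods.
double LrStatistic(double log_lik_null, double log_lik_alt) {
  return 2.0 * (log_lik_alt - log_lik_null);
}

// Hypothetical helper: proportion of bootstrap statistics (each obtained by
// refitting both models to data simulated from the fitted null model) that
// are at least as large as the observed statistic. Assumes `bootstrap` is
// non-empty.
double BootstrapPValue(double observed, const std::vector<double>& bootstrap) {
  std::size_t n_at_least = 0;
  for (double stat : bootstrap) {
    if (stat >= observed) ++n_at_least;
  }
  return static_cast<double>(n_at_least) /
         static_cast<double>(bootstrap.size());
}
```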

Changes from the Original Code

Developer’s and Maintainer’s Notes

Installing as a Developer

The following installation instructions are useful if you wish to develop the code and install a locally modified version of the package.

Requires the R packages for compiling and testing:

Requires the dependent R packages:

Clone this repository using Git

git clone https://github.com/QMUL/poLCAParallel.git

and change directory into it

cd poLCAParallel

From the repository root, run the following to generate the additional code and documentation needed to compile the package correctly

R -e "usethis::use_namespace()"
R -e "Rcpp::compileAttributes()"
R -e "roxygen2::roxygenize()"

Install the package using

R CMD INSTALL --preclean --no-multiarch .

Testing

The C++ code is tested using Catch2 and the R code using testthat. All test code is in tests/.

C++ with Catch2

The C++ tests are compiled, isolated from any R ecosystem, and run as a standalone executable. This requires CMake, Catch2 and Armadillo. To compile the tests, make a new directory in the repository root and run CMake inside it

mkdir build
cd build
cmake ..
cmake --build .

This compiles an executable called test_polca_parallel. Run it to execute the tests. To run specific tests only, pass test names or tags as arguments; see tests/*.cc for these.

R with testthat

To test the R code, run the following at the repository root

R -e "testthat::test_local()"

R Dependency Management

The package renv is used to record and manage R dependencies, with versions pinned, for use during development, maintenance and testing. The file renv.lock contains these dependencies. It shall be regularly updated during maintenance. The lock file is also used in the Apptainer definition file poLCAParallel-dev.def below to further reproduce the environment in a container.

Restoring the R Environment

From the repository root, run the following commands to set up an R environment and install the dependencies, with the specified versions, used for development and testing

R -e "renv::init(bare=TRUE)"
R -e "renv::restore()"

Run R commands from the repository root to use these dependencies.

Taking a Snapshot of the Environment with the Latest Versions

The lock file may need to be updated during maintenance. This can be done by starting a fresh R environment, after ensuring the renv artifacts are deleted:

Then take a snapshot of the latest dependencies

R -e "renv::init()"
R -e "renv::snapshot(dev=TRUE)"

This overwrites renv.lock so that it specifies the latest versions of the dependencies.

Apptainer

Apptainer definition files are provided, which can be used to install the package inside a container. These may be useful for further troubleshooting or development.

To build the container, use the command (or similar)

apptainer build poLCAParallel-dev.sif poLCAParallel-dev.def

Within the container, the package is located in /usr/src/poLCAParallel. When using the definition file poLCAParallel-dev.def, the C++ doxygen documentation is located in /usr/src/poLCAParallel/html.

Git/GitHub Workflow Guide

All generated documentation and code, e.g. from

R -e "Rcpp::compileAttributes()"

and

R -e "roxygen2::roxygenize()"

shall not be included in the master branch. Instead, they shall be kept in the package branch so that this package can be installed using remotes::install_github("QMUL/poLCAParallel@package"). This avoids duplicate documentation and generated code on the master branch. The exception to this rule is renv.lock, which is produced by renv::snapshot(dev=TRUE).

Semantic versioning is used and tagged. Tags on the master branch shall have v prepended and -master appended, e.g. v1.1.0-master. The corresponding tag on the package branch shall only have v prepended, e.g. v1.1.0.

Development Notes

Actions for the Next Minor Version(s)

Actions for the Next Major Version

The R code should follow the Tidyverse style guide. In particular, variables, functions and parameters should be in snake case. This will result in

The following R functions, many of which are internal, are marked as deprecated and should be deleted

All C code in poLCA.C is deprecated because it has been reimplemented in C++.

The parameters:

should be renamed to lc for consistency with other functions that also have a parameter named lc.

Similarly, the parameters model_null and model_alt in blrt() should be renamed to lc_null and lc_alt respectively.

C++ Style Guide

The C++ code attempts to follow the Google C++ style guide.

C++ Source Code Documentation

The C++ code documentation can be created with Doxygen by running

doxygen

and viewed at html/index.html.

References

License

This software is licensed under the GNU GPL 2.0, the same license as the original poLCA code, as stated in its documentation.