RcppTskit: Working with tree sequences in R

Gregor Gorjanc

2026-02-22

Introduction

This vignette introduces working with tree sequences in R using the RcppTskit package. RcppTskit provides R access to the tskit C application programming interface (API) (Jeffery et al. 2026) (https://tskit.dev/tskit/docs/stable/c-api.html). If you are new to tree sequences and the broader concept of ancestral recombination graphs (ARGs), see Brandt et al. (2024), Lewanski, Grundler, and Bradburd (2024), Nielsen, Vaughn, and Deng (2024), and Wong et al. (2024). Before showing how to use RcppTskit, we summarise the now extensive tree sequence ecosystem, because this has shaped the aim and design of RcppTskit. We then highlight the aims of RcppTskit, describe the implemented data and class model, and show four typical use cases.

As summarised below, Python is the most widely used environment for working with tree sequences. Using the R package reticulate (Ushey, Allaire, and Tang 2025) (https://rstudio.github.io/reticulate/), most R users can and should leverage the large ecosystem of Python packages, in particular the popular tskit Python API (Jeffery et al. 2026) (https://tskit.dev/tskit/docs/stable/python-api.html). With this in mind, RcppTskit is primarily geared towards providing R access to the tskit C API (Jeffery et al. 2026), for cases where the reticulate option is not optimal; for example, high-performance or low-level work with tree sequences. As a result, RcppTskit currently provides a limited set of R functions because the Python API (and reticulate) already covers most needs. As the name suggests, RcppTskit leverages the R package Rcpp (Eddelbuettel et al. 2026) (https://www.rcpp.org), which significantly lowers the barrier to using C++ in R. However, we still need to write C++ wrappers and expose them to R, so we recommend using reticulate first. The implemented R functions in RcppTskit closely mimic tskit Python functions to streamline the use of both the R and Python APIs.

State of the tree sequence ecosystem

The tree sequence ecosystem is rapidly evolving. The website https://tskit.dev/software/ lists tools that closely interoperate with tskit, while Jeffery et al. (2026) lists additional tools that depend on tskit functionality. Consequently, there are now many tools for the generation and analysis of tree sequences. Below is a quick summary of some of the tools relevant to RcppTskit as of January 2026.

The above tools enable work with tree sequences and/or generate them via simulation. There is a growing list of tools that estimate ARGs from observed genomic data and can export them in the tree sequence file format. Notable examples include: tsinfer (https://tskit.dev/tsinfer/docs/, https://github.com/tskit-dev/tsinfer), Relate (https://myersgroup.github.io/relate/, https://github.com/MyersGroup/relate), SINGER (https://github.com/popgenmethods/SINGER), ARGNeedle (https://palamaralab.github.io/software/argneedle/, https://github.com/PalamaraLab/arg-needle-lib), and Threads (https://palamaralab.github.io/software/threads/, https://github.com/palamaraLab/threads).

As described above, the tree sequence ecosystem is extensive. Python is the most widely used platform to interact with tree sequences, with comprehensive packages for simulation and analysis.

There is interest in working with tree sequences in R. Because we can call Python from within R using the reticulate R package, there is no pressing need for a dedicated R API for work with tree sequences. See https://tskit.dev/tutorials/tskitr.html for an example of this approach. This keeps the community focused on the Python collection of packages. While there are differences between Python and R, many R users should be able to follow the extensive Python API documentation, examples, and tutorials listed above, especially those at https://tskit.dev/tutorials/.
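
As a minimal sketch of this approach, the following R code calls the tskit Python API directly through reticulate. It assumes that reticulate is installed, that the Python module tskit is available to it, and that RcppTskit is installed (used here only for its example file); the code does nothing otherwise.

```r
# Minimal sketch: use the tskit Python API from R via reticulate.
# Assumptions: reticulate and the Python module tskit are available, and
# RcppTskit is installed so that its example file can be located.
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
if (nzchar(ts_file) && has_reticulate &&
    reticulate::py_module_available("tskit")) {
  tskit_py <- reticulate::import("tskit")
  ts_py <- tskit_py$load(ts_file)
  # Python properties are accessed without parentheses
  print(ts_py$num_individuals)
  print(ts_py$num_trees)
}
```

Note that in the Python API these summaries are properties (ts_py$num_individuals), whereas the corresponding RcppTskit calls are methods (ts$num_individuals()).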

To provide an idiomatic R interface to some population genetic simulation steps and operations with tree sequences, slendr implements bespoke functions and wrappers to interact with msprime, SLiM, and tskit. It uses reticulate to interact with the Python APIs of these packages, which further lowers barriers for R users to work with tree sequences.

One downside of using reticulate is the overhead of calling Python functions. This overhead is minimal for most analyses because a user typically calls a few Python functions that do all the work (including loops) on the Python side, often by calling the tskit C API. However, the overhead can be limiting for repeated calls between R and Python, such as calling Python functions from within an R loop, say to record a tree sequence in a multi-generation simulation with many individuals.
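
The difference between the two calling patterns can be illustrated with a toy timing sketch. The Python functions add_one and add_one_many are hypothetical stand-ins defined only for this illustration (they are not part of tskit), and the code assumes that reticulate can initialise a Python session. Timings will vary between machines, so none are shown.

```r
# Toy sketch: many R-to-Python crossings versus a single crossing.
# add_one() and add_one_many() are hypothetical helpers, not tskit functions.
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  reticulate::py_run_string("
def add_one(x):
    return x + 1

def add_one_many(n):
    x = 0
    for _ in range(n):
        x = add_one(x)
    return x
")
  py <- reticulate::py
  n <- 1000L

  # Pattern 1: the loop runs in R, so every iteration crosses into Python
  time_loop <- system.time({
    x_loop <- 0L
    for (i in seq_len(n)) {
      x_loop <- py$add_one(x_loop)
    }
  })

  # Pattern 2: one call from R, the loop runs on the Python side
  time_once <- system.time(x_once <- py$add_one_many(n))

  # Both patterns compute the same result; only the call overhead differs
  stopifnot(x_loop == x_once)
  print(rbind(loop_in_R = time_loop, loop_in_Python = time_once))
}
```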

Aims for RcppTskit

Given the current tree sequence ecosystem, the aim of the RcppTskit package is to provide an easy-to-install R package that supports users in four typical cases. The authors are open to expanding the scope of RcppTskit depending on user demand and engagement. The four typical cases are:

  1. Load a tree sequence[1] into R and summarise it,

  2. Pass a tree sequence[2] between R and reticulate or standard Python,

  3. Call the tskit C API from C++ in an R session or script, and

  4. Call the tskit C API from C++ in another R package.

Examples for all of these cases are provided below after we describe the implemented data and class model.

Data and class model

RcppTskit represents a tree sequence as a lightweight R6 object of class TreeSequence. The R6 class was chosen in part so that TreeSequence method calls in R resemble the tskit Python API, particularly when compared to reticulate Python. TreeSequence wraps an external pointer (externalptr) to the tskit C object structure tsk_treeseq_t. Most methods (for example, ts$num_individuals() and ts$dump()) call the tskit C API via Rcpp, so the calls are fast. The underlying pointer is exposed as TreeSequence$pointer for developers and advanced users who work with C++. In C++, the pointer has type RcppTskit_treeseq_xptr, and the tree sequence memory is released by the Rcpp::XPtr finaliser when the pointer is garbage-collected in R.

RcppTskit also provides a lightweight TableCollection R6 class, which wraps an external pointer to the tskit C object structure tsk_table_collection_t. In C++, the pointer has type RcppTskit_table_collection_xptr with the same memory management as RcppTskit_treeseq_xptr. While tsk_treeseq_t is an immutable object, tsk_table_collection_t is a mutable object, which can be edited. No R functions for editing are implemented to date, so all editing should happen in C++ or Python.
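
The two classes described above can be inspected from R with a short sketch such as the one below. It uses only functions and methods that appear elsewhere in this vignette (ts_load(), ts$pointer, ts$dump_tables(), tc$tree_sequence(), and ts$num_individuals()), assumes RcppTskit is installed, and does nothing otherwise.

```r
# Minimal sketch: inspect the R6 wrappers and the pointer they hold.
# Assumption: RcppTskit is installed.
if (requireNamespace("RcppTskit", quietly = TRUE)) {
  library(RcppTskit)
  ts_file <- system.file("examples/test.trees", package = "RcppTskit")
  ts <- ts_load(ts_file)

  # The R object is a thin wrapper ...
  print(class(ts))
  # ... around an external pointer to the tskit C structure
  print(typeof(ts$pointer))

  # The table collection is a separate wrapper that can be converted
  # back to a tree sequence
  tc <- ts$dump_tables()
  print(class(tc))
  ts2 <- tc$tree_sequence()

  # The round trip should preserve the content
  stopifnot(ts2$num_individuals() == ts$num_individuals())
}
```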

Four typical use cases

First install RcppTskit from CRAN and load it.

# install.packages("RcppTskit")

test <- require(RcppTskit)
#> Loading required package: RcppTskit
if (!test) {
  message("RcppTskit not available; skipping vignette execution.")
  knitr::opts_chunk$set(eval = FALSE)
}

1) Load a tree sequence[3] into R and summarise it

# Load a tree sequence
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)
methods::is(ts)
#> [1] "TreeSequence"

# Print the summary of the tree sequence
ts$print()
# ts # the same as above

ts$num_individuals()
#> [1] 8

# Access the table collection
tc <- ts$dump_tables()
tc$print()

# Convert the table collection to tree sequence
ts2 <- tc$tree_sequence()

# Explore the help pages
help(package = "RcppTskit")

2) Pass a tree sequence[4] between R and reticulate or standard Python

# Tree sequence in R
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)

# If you now want to use the tskit Python API via reticulate, use
tskit <- get_tskit_py()
#> Python module tskit is not available. Attempting to install it ...
if (check_tskit_py(tskit)) {
  ts_py <- ts$r_to_py()
  # ... continue in reticulate Python ...
  ts_py$num_individuals # 8
  ts2_py <- ts_py$simplify(samples = c(0L, 1L, 2L, 3L))
  ts2_py$num_individuals # 2
  ts2_py$num_nodes # 8
  ts2_py$tables$nodes$time # 0.0 ... 5.0093910
  # ... and to bring it back to R, use ...
  ts2 <- ts_py_to_r(ts2_py)
  ts2$num_individuals() # 2
}
#> [1] 2

# If you prefer standard (non-reticulate) Python, use
ts_file <- tempfile()
print(ts_file)
#> [1] "/var/folders/f_/58r_f9r97t3gmvlmt3_l834m0000gq/T//RtmpJazHRM/file11cbc17a99ebd"
ts$dump(file = ts_file)
# ... continue in standard Python ...
# import tskit
# ts = tskit.load("insert_ts_file_path_here")
# ts.num_individuals # 8
# ts2 = ts.simplify(samples = [0, 1, 2, 3])
# ts2.num_individuals # 2
# ts2.dump("insert_ts_file_path_here")
# ... and to bring it back to R, use ...
ts2 <- ts_load(ts_file)
ts2$num_individuals() # 2 (if you have run the above Python code)
#> [1] 8
file.remove(ts_file)
#> [1] TRUE

# You can work similarly with table collection between R & Python
tc <- ts$dump_tables()
if (check_tskit_py(tskit)) {
  tc_py <- tc$r_to_py()
  # ... continue in reticulate Python ...
  tmp <- tc_py$simplify(samples = c(0L, 1L, 2L, 3L))
  tmp
  tc_py$individuals$num_rows # 2
  tc_py$nodes$num_rows # 8
  tc_py$nodes$time # 0.0 ... 5.0093910
  # ... and to bring it back to R, use ...
  tc2 <- tc_py_to_r(tc_py)
  tc2$print()
}

3) Call the tskit C API from C++ in an R session or script

# Write a C++ function as a multi-line character string
codeString <- '
  #include <RcppTskit.hpp>
  int ts_num_individuals(SEXP ts) {
    RcppTskit_treeseq_xptr ts_xptr(ts);
    return (int) tsk_treeseq_get_num_individuals(ts_xptr);
  }'

# Compile the C++ function
ts_num_individuals2 <- Rcpp::cppFunction(
  code = codeString,
  depends = "RcppTskit",
  plugins = "RcppTskit"
)
# We must specify both the `depends` and `plugins` arguments!

# Load a tree sequence
ts_file <- system.file("examples/test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)

# Apply the compiled function
# (on the pointer)
ts_num_individuals2(ts$pointer)
#> [1] 8

# An identical RcppTskit implementation
# (available as the method of the TreeSequence class)
ts$num_individuals()
#> [1] 8
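
# The same pattern extends to other tskit C API functions. The following is a
# minimal sketch, not part of RcppTskit: the function name ts_counts is
# hypothetical, while tsk_treeseq_get_num_nodes() and
# tsk_treeseq_get_num_trees() are tskit C API functions. It assumes RcppTskit
# and a working C++ toolchain are available.

```r
if (requireNamespace("RcppTskit", quietly = TRUE)) {
  library(RcppTskit)

  # C++ function returning several counts as a named numeric vector
  code_string <- '
  #include <RcppTskit.hpp>
  Rcpp::NumericVector ts_counts(SEXP ts) {
    RcppTskit_treeseq_xptr ts_xptr(ts);
    return Rcpp::NumericVector::create(
      Rcpp::Named("individuals") =
        (double) tsk_treeseq_get_num_individuals(ts_xptr),
      Rcpp::Named("nodes") =
        (double) tsk_treeseq_get_num_nodes(ts_xptr),
      Rcpp::Named("trees") =
        (double) tsk_treeseq_get_num_trees(ts_xptr)
    );
  }'

  # Compile with both the depends and plugins arguments, as above
  ts_counts <- Rcpp::cppFunction(
    code = code_string,
    depends = "RcppTskit",
    plugins = "RcppTskit"
  )

  ts_file <- system.file("examples/test.trees", package = "RcppTskit")
  ts <- ts_load(ts_file)
  counts <- ts_counts(ts$pointer)
  print(counts)

  # The count of individuals should agree with the R6 method
  stopifnot(counts[["individuals"]] == ts$num_individuals())
}
```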

4) Call the tskit C API from C++ in another R package

To call the tskit C API in your own R package via Rcpp, you can leverage RcppTskit, which simplifies installation and provides the linking flags you need. To do this, follow the steps below and check how they are implemented in the demo R package RcppTskitTestLinking at https://github.com/HighlanderLab/RcppTskitTestLinking.

  1. Open the DESCRIPTION file and add RcppTskit to the Imports: and LinkingTo: fields, and add Rcpp to the LinkingTo: field.

  2. Create R/YourPackage-package.R and add at minimum: #' @import RcppTskit in one line and "_PACKAGE" in another line, assuming you use roxygen2 (for example, via devtools::document()) to manage your package NAMESPACE imports.

  3. Add #include <RcppTskit.hpp> as needed to your C++ header files in src directory.

  4. Add // [[Rcpp::depends(RcppTskit)]] to your C++ files in src directory.

  5. Add // [[Rcpp::plugins(RcppTskit)]] to your C++ files in src directory.

  6. Call the RcppTskit C++ API and the tskit C API as needed in src directory.

  7. Configure your package build to link against the RcppTskit library with the following steps:

  8. You should now be ready to build, check, and install your package using devtools::build(), devtools::check(), and devtools::install() or their R CMD equivalents.
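
Before building, the DESCRIPTION fields from the first step can be checked with a small base R helper. This is a sketch only: check_rcpptskit_description() is a hypothetical function written for this illustration (it is not part of RcppTskit), and the toy DESCRIPTION file stands in for the one in your package.

```r
# Hypothetical helper: check that DESCRIPTION lists RcppTskit in Imports and
# LinkingTo, and Rcpp in LinkingTo.
check_rcpptskit_description <- function(path) {
  desc <- read.dcf(path, fields = c("Imports", "LinkingTo"))
  split_field <- function(x) {
    if (is.na(x)) {
      return(character(0))
    }
    # Drop version requirements such as "(>= 1.0.0)", then split on commas
    x <- gsub("\\([^)]*\\)", "", x)
    trimws(strsplit(x, ",")[[1]])
  }
  imports <- split_field(desc[1, "Imports"])
  linking_to <- split_field(desc[1, "LinkingTo"])
  c(
    imports_RcppTskit = "RcppTskit" %in% imports,
    linkingto_RcppTskit = "RcppTskit" %in% linking_to,
    linkingto_Rcpp = "Rcpp" %in% linking_to
  )
}

# Toy DESCRIPTION file used only for this illustration
desc_file <- tempfile()
writeLines(
  c(
    "Package: YourPackage",
    "Version: 0.0.1",
    "Imports: RcppTskit",
    "LinkingTo: Rcpp, RcppTskit"
  ),
  desc_file
)
description_check <- check_rcpptskit_description(desc_file)
print(description_check)
unlink(desc_file)
```

In your own package you would pass the path to its DESCRIPTION file instead of the temporary file.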

Here is code you can run to check out RcppTskitTestLinking:

# Install and load the demo package
remotes::install_github("HighlanderLab/RcppTskitTestLinking")
library(RcppTskitTestLinking)

# Check the demo function
print(ts_num_individuals_ptr2)

# Example tree sequence
ts_file <- system.file("examples", "test.trees", package = "RcppTskit")
ts <- ts_load(ts_file)

# Function from RcppTskit (working with TreeSequence R6 class)
ts$num_individuals()

# Function from RcppTskitTestLinking (working with externalptr)
ts_num_individuals_ptr2(ts$pointer)

Conclusion

RcppTskit provides R access to the tskit C API with a simple installation and a lightweight interface. It provides a limited number of R functions because most users can and should use reticulate to call the tskit Python API from R. The implemented R functions closely mimic tskit Python functions to streamline the use of both the R and Python APIs. When this option is not optimal, developers and advanced users can call the tskit C API via Rcpp.

Session information

sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.7.4
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Oslo
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] RcppTskit_0.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.53        
#>  [5] Matrix_1.7-3      lattice_0.22-6    reticulate_1.44.1 glue_1.8.0       
#>  [9] knitr_1.50        htmltools_0.5.9   png_0.1-8         rmarkdown_2.30   
#> [13] lifecycle_1.0.4   cli_3.6.5         vctrs_0.6.5       grid_4.5.0       
#> [17] withr_3.0.2       compiler_4.5.0    rprojroot_2.1.1   here_1.0.2       
#> [21] tools_4.5.0       pillar_1.11.1     evaluate_1.0.5    Rcpp_1.1.1       
#> [25] yaml_2.3.10       rlang_1.1.7       jsonlite_2.0.0

References

Brandt, Débora Y C, Christian D Huber, Charleston W K Chiang, and Diego Ortega-Del Vecchyo. 2024. “The Promise of Inferring the Past Using the Ancestral Recombination Graph (ARG).” Genome Biology and Evolution, evae005. https://doi.org/10.1093/gbe/evae005.
Eddelbuettel, Dirk, Romain Francois, JJ Allaire, Kevin Ushey, Qiang Kou, Nathan Russell, Iñaki Ucar, Doug Bates, and John Chambers. 2026. Rcpp: Seamless R and C++ Integration. https://doi.org/10.32614/CRAN.package.Rcpp.
Jeffery, Ben, Yan Wong, Kevin Thornton, Georgia Tsambos, Gertjan Bisschop, Yun Deng, E Castedo Ellerman, et al. 2026. “Population-Scale Ancestral Recombination Graphs with Tskit 1.0.” arXiv. https://doi.org/10.48550/arXiv.2602.09649.
Lewanski, Alexander L., Michael C. Grundler, and Gideon S. Bradburd. 2024. “The Era of the ARG: An Introduction to Ancestral Recombination Graphs and Their Significance in Empirical Evolutionary Genomics.” PLOS Genetics 20 (1): 1–24. https://doi.org/10.1371/journal.pgen.1011110.
Nielsen, Rasmus, Andrew H. Vaughn, and Yun Deng. 2024. “Inference and Applications of Ancestral Recombination Graphs.” Nature Reviews Genetics 26: 47–58. https://doi.org/10.1038/s41576-024-00772-4.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2025. Reticulate: Interface to Python. https://doi.org/10.32614/CRAN.package.reticulate.
Wong, Yan, Anastasia Ignatieva, Jere Koskela, Gregor Gorjanc, Anthony W Wohns, and Jerome Kelleher. 2024. “A General and Efficient Representation of Ancestral Recombination Graphs.” Genetics 228 (1): iyae100. https://doi.org/10.1093/genetics/iyae100.

  1. Both tree sequence and table collection types are supported.

  2. Both tree sequence and table collection types are supported.

  3. Both tree sequence and table collection types are supported.

  4. Both tree sequence and table collection types are supported.