blocking: Various Blocking Methods for Entity Resolution

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
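The description combines three ideas: character shingles, nearest-neighbour search, and graph components that turn neighbour links into blocks. The Python sketch below shows how these ideas fit together on toy data. It is not the package's R API and does not reproduce its internals. Its simplifications:

- Exact brute-force search stands in for the ANN backends.
- Similarity is Jaccard distance on 2-character shingles.
- `max_dist` is a hypothetical cutoff that lets unrelated records stay in singleton blocks.
- `block_records` and `pair_rates` are illustrative names.

The pair-level FPR/FNR shown here also assume known true entity ids. The package's FPR/FNR estimates follow the cited papers and may be computed differently.

```python
from itertools import combinations


def shingles(text, k=2):
    """Return the set of character k-grams (shingles) of a lower-cased string."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def block_records(records, k=2, max_dist=0.5):
    """Link each record to its nearest neighbour (exact search standing in
    for ANN), drop links longer than the hypothetical max_dist cutoff, and
    return connected-component labels as block ids."""
    sets = [shingles(r, k) for r in records]
    n = len(records)
    parent = list(range(n))  # union-find forest over records

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        others = [m for m in range(n) if m != i]
        if not others:
            break
        j = min(others, key=lambda m: jaccard_distance(sets[i], sets[m]))
        if jaccard_distance(sets[i], sets[j]) <= max_dist:
            parent[find(i)] = find(j)  # add graph edge i -- j

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def pair_rates(blocks, truth):
    """Pair-level (FPR, FNR) of a blocking against known entity ids."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(blocks)), 2):
        same_block = blocks[i] == blocks[j]
        same_entity = truth[i] == truth[j]
        if same_block and same_entity:
            tp += 1
        elif same_block:
            fp += 1  # non-matching pair placed in the same block
        elif same_entity:
            fn += 1  # matching pair split across blocks
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


if __name__ == "__main__":
    records = ["jan kowalski", "jan kowalsky", "anna nowak",
               "ana nowak", "piotr zielinski"]
    blocks = block_records(records)
    print(blocks)  # [0, 0, 1, 1, 2]
    print(pair_rates(blocks, truth=[0, 0, 1, 1, 2]))  # (0.0, 0.0)
```

The brute-force loop compares every record with every other record, which costs O(n²) distance evaluations. Approximate nearest-neighbour libraries such as those listed above exist to avoid that cost on large files.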

Version: 1.0.0
Depends: R (≥ 4.1.0)
Imports: text2vec, tokenizers, RcppHNSW, RcppAnnoy, mlpack, rnndescent, igraph, data.table, RcppAlgos, methods, readr, utils, Matrix
Suggests: tinytest, knitr, rmarkdown, reclin2
Published: 2025-06-13
DOI: 10.32614/CRAN.package.blocking
Author: Maciej Beręsewicz [aut, cre], Adam Struzik [aut, ctr]
Maintainer: Maciej Beręsewicz <maciej.beresewicz at ue.poznan.pl>
BugReports: https://github.com/ncn-foreigners/blocking/issues
License: GPL-3
URL: https://github.com/ncn-foreigners/blocking, https://ncn-foreigners.ue.poznan.pl/blocking/
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: blocking results

Documentation:

Reference manual: blocking.pdf
Vignettes:
  Blocking records for deduplication (source, R code)
  Blocking records for record linkage (source, R code)
  Integration with existing packages (source, R code)

Downloads:

Package source: blocking_1.0.0.tar.gz
Windows binaries: r-devel: not available, r-release: blocking_1.0.0.zip, r-oldrel: blocking_1.0.0.zip
macOS binaries: r-release (arm64): blocking_1.0.0.tgz, r-oldrel (arm64): blocking_1.0.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): not available

Linking:

Please use the canonical form https://CRAN.R-project.org/package=blocking to link to this page.