fuzzystring: Fast Fuzzy String Joins for Data Frames

Perform fuzzy joins on data frames using approximate string matching. Implements all standard join types (inner, left, right, full, semi, anti) with support for multiple string distance methods from the 'stringdist' package, including Levenshtein, Damerau-Levenshtein, Jaro-Winkler, and Soundex. Features a high-performance 'data.table' backend with 'C++' row binding for efficient processing of large datasets. Ideal for matching misspellings, inconsistent labels, and messy user input, or for reconciling datasets with slight variations in identifiers. Optionally returns the computed distances alongside matched records.
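To make the idea of a fuzzy inner join concrete, here is a minimal, language-neutral sketch in plain Python. It is not the package's R API and does not reflect its 'data.table' or 'C++' implementation. The function names (`levenshtein`, `fuzzy_inner_join`), the `max_dist` threshold, the `.x`/`.y` column suffixes, and the `dist` column are all assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_inner_join(left, right, key, max_dist=2):
    """Hypothetical helper: keep every (left, right) row pair whose keys
    are within max_dist edits, and record the distance in a 'dist' field."""
    out = []
    for lrow in left:
        for rrow in right:
            d = levenshtein(lrow[key], rrow[key])
            if d <= max_dist:
                row = {f"{k}.x": v for k, v in lrow.items()}
                row.update({f"{k}.y": v for k, v in rrow.items()})
                row["dist"] = d
                out.append(row)
    return out


# Made-up example data: misspelled names on the left, clean names on the right.
left = [
    {"name": "amazon", "id": 1},
    {"name": "gogle", "id": 2},
    {"name": "micorsoft", "id": 3},
]
right = [
    {"name": "google", "ticker": "GOOG"},
    {"name": "microsoft", "ticker": "MSFT"},
]

matches = fuzzy_inner_join(left, right, key="name", max_dist=2)
for m in matches:
    print(m["name.x"], "->", m["name.y"], "dist =", m["dist"])
```

This naive version compares every pair of rows, so its cost grows with the product of the two table sizes. A left, semi, or anti join would differ only in how unmatched left rows are handled after the same distance test.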

Version: 0.0.1
Depends: R (≥ 4.1)
Imports: data.table, Rcpp, stringdist
LinkingTo: Rcpp
Suggests: dplyr, ggplot2, knitr, qdapDictionaries, readr, rmarkdown, rvest, stringr, testthat (≥ 3.0.0), tidyr
Published: 2026-02-08
DOI: 10.32614/CRAN.package.fuzzystring (may not be active yet)
Author: Paul E. Santos Andrade [aut, cre], David Robinson [ctb] (aut of fuzzyjoin)
Maintainer: Paul E. Santos Andrade <paulefrens at gmail.com>
BugReports: https://github.com/PaulESantos/fuzzystring/issues
License: MIT + file LICENSE
URL: https://github.com/PaulESantos/fuzzystring, https://paulesantos.github.io/fuzzystring/
NeedsCompilation: yes
Materials: README
CRAN checks: fuzzystring results

Documentation:

Reference manual: fuzzystring.html, fuzzystring.pdf
Vignettes: Getting Started with fuzzystring (source, R code)

Downloads:

Package source: fuzzystring_0.0.1.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): fuzzystring_0.0.1.tgz, r-oldrel (arm64): fuzzystring_0.0.1.tgz, r-release (x86_64): fuzzystring_0.0.1.tgz, r-oldrel (x86_64): fuzzystring_0.0.1.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=fuzzystring to link to this page.