textpress: A Lightweight and Versatile NLP Toolkit
A lightweight toolkit for text retrieval and NLP with a consistent,
predictable API organized around four actions: fetching, reading,
processing, and searching. Functions cover the full pipeline, from web
data acquisition to text processing and indexing. Supported search
strategies include regex, BM25 keyword ranking, cosine similarity, and
dictionary matching. The package is pipe-friendly, has no heavy
dependencies, and returns all outputs as plain data frames. It is also
useful as a building block for retrieval-augmented generation pipelines
and autonomous agent workflows.
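To make the BM25 keyword ranking mentioned above concrete, here is a minimal, language-neutral sketch of the scoring idea, written in Python. It is not textpress code and does not reflect the package's API: the names `tokenize` and `bm25_rank`, the parameters `k1` and `b`, and their default values are assumptions chosen for illustration. The result is a list of flat records, loosely mirroring the "plain data frame" output style described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 variant.

    Hypothetical helper for illustration only. Returns one record per
    document, sorted by descending score.
    """
    if not docs:
        return []

    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    query_terms = set(tokenize(query))
    rows = []
    for doc_id, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        rows.append({"doc_id": doc_id, "score": score})

    return sorted(rows, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats",
        "the stock market fell",
    ]
    for row in bm25_rank(corpus, "cat mat"):
        print(row)
```

Because this sketch does no stemming, the query term "cat" does not match "cats" in the second document. A stemmer would close that gap.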
| Version: | 1.1.0 |
| Depends: | R (≥ 3.5) |
| Imports: | data.table, httr, Matrix, rvest, stringi, stringr, xml2, pbapply, jsonlite, lubridate |
| Suggests: | SnowballC (≥ 0.7.0) |
| Published: | 2026-02-23 |
| DOI: | 10.32614/CRAN.package.textpress |
| Author: | Jason Timm [aut, cre] (year: 2026) |
| Maintainer: | Jason Timm <JaTimm at salud.unm.edu> |
| BugReports: | https://github.com/jaytimm/textpress/issues |
| License: | MIT + file LICENSE |
| URL: | https://github.com/jaytimm/textpress, https://jaytimm.github.io/textpress/ |
| NeedsCompilation: | no |
| Materials: | README, NEWS |
| CRAN checks: | textpress results |
Linking:
Please use the canonical form https://CRAN.R-project.org/package=textpress to link to this page.