punycoder


High-performance Unicode and Punycode encoding/decoding for internationalized domain names (IDNs) in R.

Overview

The punycoder package fills a gap in R’s URL processing by providing reliable, fast conversion between the Unicode and ASCII (Punycode) representations of domain names. It implements the Punycode encoding defined in RFC 3492 and is designed for robust handling of internationalized domain names in web scraping, data analysis, and URL processing workflows.

Dependencies

punycoder has a small dependency footprint:

Installation

You can install the development version of punycoder from GitHub with:

# install.packages("remotes")
remotes::install_github("bart-turczynski/punycoder")

Optional native backend (libidn2)

punycoder works without extra system libraries. If libidn2 is available at build time, the package enables a native backend automatically; otherwise it uses the built-in C++ fallback backend.

To install the recommended optional dependency:

Verify the library is visible before installing punycoder from source:

system("pkg-config --modversion libidn2")

Then install or reinstall punycoder from source so the build can pick up the library:

remotes::install_github("bart-turczynski/punycoder")
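
The same visibility check can be scripted so that it returns a plain TRUE/FALSE instead of printing a version string. This is a minimal sketch that assumes pkg-config is the tool used to locate libidn2, as in the command above; `libidn2_visible()` is a hypothetical helper, not part of punycoder.

```r
# Hypothetical helper (not part of punycoder): TRUE if pkg-config reports
# that libidn2 is installed, FALSE if pkg-config or the library is missing.
libidn2_visible <- function() {
  # Sys.which() returns "" when the executable is not on the PATH
  if (!nzchar(Sys.which("pkg-config"))) {
    return(FALSE)
  }
  # `pkg-config --exists` exits with status 0 only if the module is found
  status <- suppressWarnings(
    system2("pkg-config", c("--exists", "libidn2"),
            stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

libidn2_visible()
```

A FALSE result only means pkg-config could not find the library; punycoder still installs and uses its built-in fallback backend.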

Example

library(punycoder)

# Basic encoding
puny_encode("café.com")
#> [1] "xn--caf-dma.com"

# Check if domain is punycode
is_punycode("xn--example")
#> [1] TRUE

# Validate domains
validate_domain("test.com")
#> Punycoder Domain Validation Results
#> ==================================
#> 
#> Domain: test.com 
#> Valid:  TRUE
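
To see where a result like "xn--caf-dma" comes from, the RFC 3492 encoding procedure can be sketched in base R. This is an illustration of the published algorithm for a single label, not punycoder's actual implementation; `encode_label_sketch()` and its helpers are hypothetical names. It skips the case mapping, normalization, and length checks that a full IDNA conversion involves, and it leaves all-ASCII labels unchanged on the assumption that only non-ASCII labels need the `xn--` form.

```r
# Punycode parameters from RFC 3492, section 5
BASE <- 36; TMIN <- 1; TMAX <- 26; SKEW <- 38; DAMP <- 700
INITIAL_BIAS <- 72; INITIAL_N <- 128

# Map a digit value 0..35 to its character: a-z, then 0-9
digit_char <- function(d) {
  substr("abcdefghijklmnopqrstuvwxyz0123456789", d + 1, d + 1)
}

# Bias adaptation, RFC 3492 section 6.1
adapt_bias <- function(delta, num_points, first_time) {
  delta <- if (first_time) delta %/% DAMP else delta %/% 2
  delta <- delta + delta %/% num_points
  k <- 0
  while (delta > ((BASE - TMIN) * TMAX) %/% 2) {
    delta <- delta %/% (BASE - TMIN)
    k <- k + BASE
  }
  k + ((BASE - TMIN + 1) * delta) %/% (delta + SKEW)
}

# Hypothetical helper: encode one label (no dots), RFC 3492 section 6.3
encode_label_sketch <- function(label) {
  cps <- utf8ToInt(enc2utf8(label))
  basic <- cps[cps < INITIAL_N]
  if (length(basic) == length(cps)) return(label)  # pure ASCII: unchanged

  # Basic code points are copied first, followed by the "-" delimiter
  out <- intToUtf8(basic, multiple = TRUE)
  if (length(basic) > 0) out <- c(out, "-")

  n <- INITIAL_N; delta <- 0; bias <- INITIAL_BIAS
  b <- length(basic); h <- b
  while (h < length(cps)) {
    m <- min(cps[cps >= n])           # next code point to insert
    delta <- delta + (m - n) * (h + 1)
    n <- m
    for (cp in cps) {
      if (cp < n) delta <- delta + 1
      if (cp == n) {
        # Emit delta as a variable-length integer
        q <- delta
        k <- BASE
        repeat {
          thr <- if (k <= bias) TMIN else if (k >= bias + TMAX) TMAX else k - bias
          if (q < thr) break
          out <- c(out, digit_char(thr + (q - thr) %% (BASE - thr)))
          q <- (q - thr) %/% (BASE - thr)
          k <- k + BASE
        }
        out <- c(out, digit_char(q))
        bias <- adapt_bias(delta, h + 1, h == b)
        delta <- 0
        h <- h + 1
      }
    }
    delta <- delta + 1
    n <- n + 1
  }
  paste0("xn--", paste(out, collapse = ""))
}

encode_label_sketch("caf\u00e9")
#> [1] "xn--caf-dma"
```

A whole domain would be handled by splitting on dots and encoding each label separately, which is why "café.com" keeps its ".com" suffix unchanged.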

Key Features

Use Cases

Web Scraping

Process international websites with Unicode domain names:

international_urls <- c(
  "https://café.paris.fr/menu",
  "https://москва.рф/news",
  "https://北京.中国/info"
)

# Convert for HTTP requests
ascii_urls <- url_encode(international_urls)
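
Only the host part of a URL is a domain name, so a conversion like this has to isolate the host and leave the scheme, path, and query alone. The sketch below shows one way to do that split in base R. `map_url_host()` is a hypothetical helper, not punycoder's API, and it assumes simple URLs without userinfo (`user:pass@`) or bracketed IPv6 hosts.

```r
# Hypothetical helper (not part of punycoder): apply `f` to the host of each
# URL and leave the scheme, port, path, query, and fragment untouched.
map_url_host <- function(urls, f) {
  # group 1: scheme plus "://", group 2: host, group 3: everything after it
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*://)([^/?#:]*)(.*)$"
  has_host <- grepl(pattern, urls)
  out <- urls
  scheme <- sub(pattern, "\\1", urls[has_host])
  host   <- sub(pattern, "\\2", urls[has_host])
  rest   <- sub(pattern, "\\3", urls[has_host])
  out[has_host] <- paste0(scheme, f(host), rest)
  out  # inputs that do not look like URLs are returned unchanged
}

# tolower() stands in for a host converter here
map_url_host("https://Example.COM:8080/Menu?x=1", tolower)
#> [1] "https://example.com:8080/Menu?x=1"
```

Any function that maps a character vector of hosts to a vector of the same length can be passed as `f`.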

Data Analysis

Clean and standardize URL datasets:

# Identify international domains
is_idn(c("café.com", "example.com", "москва.рф"))

# Validate domain names
validate_domain(c("valid.com", "invalid..domain"))
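
For a rough idea of what such checks look at, here is a base R sketch of three simple tests: non-ASCII characters, labels carrying the `xn--` prefix, and empty labels such as the one in "invalid..domain". The helper names are hypothetical and the rules are deliberately minimal; punycoder's own `is_idn()` and `validate_domain()` may apply different or stricter criteria.

```r
# Hypothetical helpers (not part of punycoder)

# TRUE where a string contains at least one code point above ASCII (127)
has_non_ascii <- function(x) {
  vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# TRUE where any dot-separated label starts with the "xn--" prefix
has_ace_label <- function(x) {
  grepl("(^|\\.)xn--", x, ignore.case = TRUE)
}

# TRUE where a domain has an empty label (leading dot or two dots in a row)
has_empty_label <- function(x) {
  startsWith(x, ".") | grepl("..", x, fixed = TRUE)
}

domains <- c("caf\u00e9.com", "example.com", "xn--caf-dma.com", "invalid..domain")
data.frame(
  domain      = domains,
  non_ascii   = has_non_ascii(domains),
  ace_label   = has_ace_label(domains),
  empty_label = has_empty_label(domains)
)
```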

Current State

punycoder currently provides:

Acknowledgments

Contributing

We welcome contributions. See CONTRIBUTING.md for the current development workflow.

License

MIT