crystract provides a suite of functions to parse
Crystallographic Information Files (.cif), extracting
essential data such as chemical formulas, unit cell parameters, atomic
coordinates, and symmetry operations. It also includes tools to
calculate interatomic distances, identify bonded pairs using various
algorithms (Minimum Distance, Brunner’s, Hoppe’s, Voronoi, CrystalNN),
determine nearest neighbor counts, and calculate bond angles. All data
is extracted into nested data.tables, which can then be
exported as an R Data Structure (RDS) or folders of .csv files. The
package is designed to facilitate the preparation of crystallographic
data for further analysis, including machine learning applications in
materials science.
Note on Repository Structure
The crystract package is located within the packages/crystract/ subdirectory of the PrabhuLab/ml-crystals GitHub repository. You must use the subdir argument during installation, as shown below.
- Uses data.table for fast and robust extraction of metadata, unit cell parameters, atomic coordinates, and symmetry operations.
- Supports the minimum_distance (default), brunner, econ (Hoppe’s), voronoi, and crystal_nn bonding methods.
- The analyze_cif_files() function is designed to process hundreds of files in a single run, and results can be exported to a structured directory of CSV files with export_analysis_to_csv().

The following diagram illustrates the primary data pipeline in crystract, from raw CIF input to final CSV export.

To assist researchers in configuring crystract for their
specific datasets, we provide the following decision trees for selecting
atomic radii and choosing the most appropriate bonding algorithm.
When invoking
analyze_cif_files(..., bonding_algorithms = c(...)), we
recommend choosing your target algorithm based on the chemical makeup of
your structure and the electronegativity differences (\(\Delta EN\)) of the expected bonds.
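As a small aid for this choice, the base-R sketch below computes \(\Delta EN = |\chi_A - \chi_B|\) for every pair of elements in a structure. It is not part of crystract: delta_en_matrix() is a hypothetical helper, and the Pauling electronegativities are hard-coded only for the two elements of the demo structure (Sr and Si), so you would substitute a full table for your own compounds. It only computes the quantity; which algorithm to pick for a given \(\Delta EN\) remains governed by the recommendations above.

```r
# Pauling electronegativities, hard-coded for illustration (Sr and Si only)
pauling_en <- c(Sr = 0.95, Si = 1.90)

# Matrix of absolute electronegativity differences |chi_A - chi_B|
# for every pair of the given elements (hypothetical helper)
delta_en_matrix <- function(elements, en = pauling_en) {
  chi <- en[elements]
  outer(chi, chi, function(a, b) abs(a - b))
}

delta_en_matrix(c("Sr", "Si"))
```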

When using functions such as filter_ghost_distances(), or algorithms that rely on distance cutoffs, crystract
uses internal logic to select the most appropriate atomic radius.
You can override this by supplying your own custom radii dictionary
via set_radii_data().
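The precedence described here (custom radii first, built-in defaults otherwise) can be pictured with the hypothetical helpers below. pick_radius() and radius_cutoff() are illustrative names, not crystract functions, and crystract's real selection logic may weigh additional factors. The default values are approximate covalent radii for Si and Sr; the custom Sr value is made up purely to show the override taking effect.

```r
# Hypothetical lookup: a user-supplied table (in the spirit of
# set_radii_data()) takes precedence over the built-in defaults.
pick_radius <- function(element, defaults, custom = NULL) {
  if (!is.null(custom) && element %in% names(custom)) {
    return(unname(custom[[element]]))
  }
  if (element %in% names(defaults)) {
    return(unname(defaults[[element]]))
  }
  stop("No radius available for element: ", element)
}

# Sum of the two selected radii, used here as a simple distance cutoff (Å)
radius_cutoff <- function(el1, el2, defaults, custom = NULL) {
  pick_radius(el1, defaults, custom) + pick_radius(el2, defaults, custom)
}

defaults <- c(Si = 1.11, Sr = 1.95)  # approximate covalent radii (Å)
custom <- c(Sr = 2.15)               # hypothetical user override (Å)
radius_cutoff("Si", "Sr", defaults)          # both radii from the defaults
radius_cutoff("Si", "Sr", defaults, custom)  # Sr radius taken from `custom`
```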

The analyze_single_cif() function (and its batch counterpart,
analyze_cif_files()) provides a complete, one-step
workflow. Here we run it on an example crystal structure bundled with
the package, showing the exact data outputs you can
expect.
```r
library(crystract)
library(data.table)

# 1. Locate the built-in demo CIF file (strontium silicide, Sr2Si)
cif_path <- system.file("extdata", "1590946.cif", package = "crystract")

# 2. Analyze the file.
# This single function handles parsing, supercell expansion, geometric
# calculations, bonding detection, and error propagation.
analysis_results <- analyze_single_cif(
  cif_path,
  bonding_algorithms = c("minimum_distance", "crystal_nn")
)
```

The returned object is a single-row data.table
containing both high-level metadata and list-columns that store the
detailed extracted measurements.
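Because the measurements live in list-columns, each nested table is reached by extracting the first (and only) element of the column. The base-R sketch below mimics this with a stand-in data frame rather than a real crystract result: the column name bonded_pairs_minimum_distance is an assumption based on the bonded_pairs_* naming in the column reference further down, and the distances are copied from the Minimum Distance table in this section. With a real result, the same pattern would read analysis_results$bonded_pairs_minimum_distance[[1]].

```r
# Stand-in nested table (values copied from the Minimum Distance output)
pairs <- data.frame(
  Atom1 = "Si1",
  Atom2 = c("Sr1_1_0_0_0", "Sr1_2_0_0_0", "Sr1_4_0_-1_-1"),
  Distance = c(3.163544, 3.245310, 3.184477)
)

# Stand-in for the single-row result: metadata plus one list-column.
# I() keeps the nested table as a single list element.
result_row <- data.frame(
  database_code = "depnum_ccdc_archive CCDC 1590946",
  chemical_formula = "Si1 Sr2",
  bonded_pairs_minimum_distance = I(list(pairs))
)

# [[1]] unwraps the nested table stored for this (only) row
bonds <- result_row$bonded_pairs_minimum_distance[[1]]
bonds[order(bonds$Distance), ]
```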
| database_code | chemical_formula | space_group_name |
|---|---|---|
| depnum_ccdc_archive CCDC 1590946 | Si1 Sr2 | P n m a |
High-Level Crystal Information
The parameters defining the size and shape of the unit cell are parsed along with their experimental uncertainties (when the CIF provides them).
| _cell_length_a | _cell_length_b | _cell_length_c | _cell_angle_alpha | _cell_angle_beta | _cell_angle_gamma |
|---|---|---|---|---|---|
| 8.11 | 5.15 | 9.54 | 90 | 90 | 90 |
Extracted Unit Cell Parameters (Å and Degrees)
crystract identifies bonded pairs using your chosen
algorithms. Below are the first rows of the output from the Minimum
Distance algorithm. The DistanceError column carries the propagated
experimental uncertainty of each distance (0 for every row shown here).
| Atom1 | Atom2 | Distance | DistanceError | Weight |
|---|---|---|---|---|
| Si1 | Sr1_1_0_0_0 | 3.163544 | 0 | 1.0000000 |
| Si1 | Sr1_2_0_0_0 | 3.245310 | 0 | 0.9748050 |
| Si1 | Sr1_4_0_-1_-1 | 3.184477 | 0 | 0.9934267 |
| Si1 | Sr1_4_0_0_-1 | 3.184477 | 0 | 0.9934267 |
| Si1 | Sr2_1_0_0_-1 | 3.261366 | 0 | 0.9700058 |
| Si1 | Sr2_3_-1_-1_0 | 3.465249 | 0 | 0.9129342 |
Predicted Bonded Pairs (Minimum Distance Method)
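The Weight values in the table above are consistent with a simple rule: the shortest contact found for the central atom (3.163544 Å, the row with weight 1) divided by each contact's length. This is our reading of the numbers shown, not a documented crystract formula; the base-R sketch below, with the hypothetical helper min_distance_weight(), reproduces the Weight column from the Distance column to within rounding.

```r
# Si1 distances from the Minimum Distance table above (Å)
d <- c(3.163544, 3.245310, 3.184477, 3.184477, 3.261366, 3.465249)

# Assumed weighting: shortest contact divided by each contact length,
# so the nearest neighbour gets 1 and longer contacts get less
min_distance_weight <- function(distances) {
  min(distances) / distances
}

round(min_distance_weight(d), 7)
```

Under this reading, contacts at exactly the minimum distance receive weight 1, and longer contacts are down-weighted in proportion to how much farther away they are.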
Using the metric tensor, crystract evaluates every connected triplet (a central atom and two of its bonded neighbors) to calculate bond angles across periodic boundaries. The first rows of the result are shown below.
| CentralAtom | Neighbor1 | Neighbor2 | Angle | AngleError |
|---|---|---|---|---|
| Si1 | Sr1_1_0_0_0 | Sr1_2_0_0_0 | 109.37260 | 0 |
| Si1 | Sr1_1_0_0_0 | Sr1_4_0_-1_-1 | 125.55190 | 0 |
| Si1 | Sr1_1_0_0_0 | Sr1_4_0_0_-1 | 125.55190 | 0 |
| Si1 | Sr1_1_0_0_0 | Sr2_1_0_0_-1 | 129.28796 | 0 |
| Si1 | Sr1_1_0_0_0 | Sr2_3_-1_-1_0 | 69.08689 | 0 |
| Si1 | Sr1_1_0_0_0 | Sr2_3_-1_0_0 | 69.08689 | 0 |
Calculated Interatomic Angles
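For readers who want the geometry behind the distance and angle tables, the sketch below implements the standard metric-tensor formulas in base R. With \(G\) built from the cell parameters, a fractional vector \(\mathbf{u}\) has length \(\sqrt{\mathbf{u}^{T} G \mathbf{u}}\), and two vectors \(\mathbf{u}, \mathbf{v}\) from the central atom meet at an angle \(\theta\) with \(\cos\theta = \mathbf{u}^{T} G \mathbf{v} / \sqrt{(\mathbf{u}^{T} G \mathbf{u})(\mathbf{v}^{T} G \mathbf{v})}\). The helper names are hypothetical, and this illustrates the textbook formulas rather than crystract's source code.

```r
# Metric tensor G from cell lengths (Å) and angles (degrees)
metric_tensor <- function(a, b, c, alpha, beta, gamma) {
  ca <- cos(alpha * pi / 180)
  cb <- cos(beta * pi / 180)
  cg <- cos(gamma * pi / 180)
  matrix(c(a * a,      a * b * cg, a * c * cb,
           a * b * cg, b * b,      b * c * ca,
           a * c * cb, b * c * ca, c * c),
         nrow = 3, byrow = TRUE)
}

# Length (Å) of a fractional-coordinate difference vector u
frac_distance <- function(u, G) {
  sqrt(drop(t(u) %*% G %*% u))
}

# Angle (degrees) at `center` formed with neighbours n1 and n2; all three
# positions are fractional, with periodic translations already applied
bond_angle <- function(center, n1, n2, G) {
  u <- n1 - center
  v <- n2 - center
  cos_theta <- drop(t(u) %*% G %*% v) / (frac_distance(u, G) * frac_distance(v, G))
  # Clamp to [-1, 1] to guard acos() against rounding error
  acos(max(-1, min(1, cos_theta))) * 180 / pi
}

G <- metric_tensor(8.11, 5.15, 9.54, 90, 90, 90)  # demo cell from above
frac_distance(c(0.5, 0, 0), G)                     # 4.055 (half of a)
```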
Here is a comprehensive overview of the columns generated in the Master Analysis Object and its nested tables.
| Column Name | Data Type | Description |
|---|---|---|
| file_name | Character | The name of the processed CIF file. |
| database_code | Character | The unique identifier from the source database. |
| chemical_formula | Character | The chemical sum formula extracted from the CIF. |
| structure_type | Character | The name of the structure type. |
| space_group_name | Character | Hermann-Mauguin space group symbol. |
| space_group_number | Character | International Tables space group number. |
| unit_cell_metrics | List (DT) | Nested table containing lattice parameters. |
| atomic_coordinates | List (DT) | Nested table of primary asymmetric atoms. |
| symmetry_operations | List (DT) | Nested table of symmetry operators. |
| transformed_coords | List (DT) | Nested table of the full unit cell atoms. |
| expanded_coords | List (DT) | Nested table of the supercell (3x3x3) atoms. |
| distances | List (DT) | Nested table of all calculated interatomic distances. |
| bonded_pairs_* | List (DT) | Nested table of bonds detected via requested methods (e.g. _minimum_distance). |
| neighbor_counts_* | List (DT) | Nested table of coordination numbers for requested methods. |
| bond_angles_* | List (DT) | Nested table of calculated bond angles for requested methods. |
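The supercell atom labels in the outputs above (e.g. Sr1_4_0_-1_-1) appear to combine the atom label, a symmetry-operation index, and an integer cell translation along a, b, and c. Assuming that reading, the base-R sketch below builds the 3x3x3 block listed as expanded_coords by copying every unit-cell atom with translations of -1, 0, and +1 along each axis. expand_supercell() and its SymOp input column are hypothetical, and the coordinates in the usage lines are made up.

```r
# Copy unit-cell atoms into a 3x3x3 block of cells (translations -1..1)
# and label each copy as <Label>_<SymOp>_<tx>_<ty>_<tz>
expand_supercell <- function(cell_atoms) {
  shifts <- expand.grid(tx = -1:1, ty = -1:1, tz = -1:1)
  out <- do.call(rbind, lapply(seq_len(nrow(shifts)), function(i) {
    s <- shifts[i, ]
    data.frame(
      Label = paste(cell_atoms$Label, cell_atoms$SymOp, s$tx, s$ty, s$tz, sep = "_"),
      x = cell_atoms$x + s$tx,
      y = cell_atoms$y + s$ty,
      z = cell_atoms$z + s$tz
    )
  }))
  rownames(out) <- NULL
  out
}

# Hypothetical unit-cell atoms (fractional coordinates)
cell_atoms <- data.frame(Label = c("Sr1", "Si1"), SymOp = c(4, 1),
                         x = c(0.1, 0.2), y = c(0.25, 0.25), z = c(0.3, 0.6))
supercell <- expand_supercell(cell_atoms)
nrow(supercell)  # 27 copies of each of the 2 atoms
```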
Columns of the nested atomic_coordinates table:

| Column Name | Data Type | Description |
|---|---|---|
| Label | Character | Unique atom label (e.g., “Fe1”). |
| WyckoffSymbol | Character | The Wyckoff letter (e.g., “c”). |
| WyckoffMultiplicity | Numeric | The site multiplicity (e.g., 4). |
| Occupancy | Numeric | Site occupancy factor (0.0 to 1.0). |
| x_a, y_b, z_c | Numeric | Fractional coordinates along axes \(a, b, c\). |
| *_error | Numeric | Standard uncertainties for coordinates. |
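The WyckoffMultiplicity column connects atomic_coordinates to transformed_coords: applying the space group's symmetry operations to one asymmetric-unit atom yields that many distinct positions in the unit cell. The base-R sketch below illustrates this with the standard-setting general positions of Pnma (No. 62), the demo structure's space group. apply_symops() is a hypothetical helper, not crystract's implementation, and the atom position is made up: it sits on a special position of the form (x, 1/4, z), which collapses the eight operator images to four.

```r
# General-position operators of Pnma (No. 62), standard setting,
# written as CIF-style "x,y,z" strings
pnma_ops <- c("x,y,z", "-x+1/2,-y,z+1/2", "-x,y+1/2,-z", "x+1/2,-y+1/2,-z+1/2",
              "-x,-y,-z", "x+1/2,y,-z+1/2", "x,-y+1/2,z", "-x+1/2,y+1/2,z+1/2")

# Apply each operator to one fractional position and keep distinct results.
# eval(parse()) is only acceptable here because the strings are trusted.
apply_symops <- function(xyz, ops, digits = 6) {
  coords <- t(vapply(ops, function(op) {
    parts <- trimws(strsplit(op, ",", fixed = TRUE)[[1]])
    vals <- vapply(parts, function(expr) {
      eval(parse(text = expr), list(x = xyz[1], y = xyz[2], z = xyz[3]))
    }, numeric(1))
    # Wrap into [0, 1), round away floating-point noise, and wrap again
    # so a value that rounds up to 1 becomes 0
    round(unname(vals) %% 1, digits) %% 1
  }, numeric(3)))
  coords <- unique(unname(coords))
  colnames(coords) <- c("x", "y", "z")
  coords
}

site <- c(0.1, 0.25, 0.3)  # hypothetical atom at (x, 1/4, z)
apply_symops(site, pnma_ops)
```

A general position such as (0.1, 0.2, 0.3) yields all eight images instead, matching a multiplicity of 8.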
Columns of the nested interatomic-distance and bonded-pair tables:

| Column Name | Data Type | Description |
|---|---|---|
| Atom1 | Character | Label of the central atom (from the asymmetric unit). |
| Atom2 | Character | Label of the neighbor atom (from the expanded supercell). |
| Distance | Numeric | Calculated Euclidean distance in Angstroms (Å). |
| DistanceError | Numeric | Propagated standard uncertainty of the distance. |
| DeltaX, DeltaY, DeltaZ | Numeric | Difference in fractional coordinates (\(x_1 - x_2\)). |
| Weight | Numeric | Calculated bond weight/strength depending on the algorithm. |
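To make DistanceError concrete, the sketch below applies standard first-order propagation of uncorrelated uncertainties to a distance computed from DeltaX, DeltaY, and DeltaZ in an orthogonal cell (all angles 90°, as in the demo structure). It is an illustrative assumption, not crystract's code: distance_with_error() is hypothetical, it ignores covariances, and the uncertainty of each fractional difference would itself come from combining both atoms' coordinate uncertainties, for example \(\sqrt{\sigma_1^2 + \sigma_2^2}\).

```r
# First-order propagation of uncorrelated uncertainties into an interatomic
# distance for an orthogonal cell (alpha = beta = gamma = 90 degrees).
# cell: c(a, b, c) in Å; delta: fractional differences c(dx, dy, dz).
# Assumes the two atoms are distinct (d > 0).
distance_with_error <- function(cell, delta, cell_err = c(0, 0, 0),
                                delta_err = c(0, 0, 0)) {
  d <- sqrt(sum((cell * delta)^2))
  # Partial derivatives of d with respect to each cell length and delta
  dd_dcell <- cell * delta^2 / d
  dd_ddelta <- cell^2 * delta / d
  sigma <- sqrt(sum((dd_dcell * cell_err)^2) + sum((dd_ddelta * delta_err)^2))
  c(Distance = d, DistanceError = sigma)
}

# Hypothetical example: 2 x 2 x 2 Å cell, atoms 0.5 apart along a,
# with an uncertainty of 0.01 on the fractional difference along a
distance_with_error(c(2, 2, 2), c(0.5, 0, 0), delta_err = c(0.01, 0, 0))
# Distance 1, DistanceError 0.02
```

With all input uncertainties at 0, the propagated error is also 0, which is what the demo output above reports.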
crystract is offered under a dual-license model to
accommodate a variety of use cases:
For Open-Source Projects: The package is
licensed under the GNU General Public License v3.0
(GPL-3.0). If you are developing other open-source software,
you are free to use, modify, and distribute crystract under
the terms of the GPL-3.0.
For Commercial Use: If you wish to use
crystract in a commercial product, for commercial services,
or for any other commercial purpose, you must obtain a separate
commercial license. Please contact the package maintainer to arrange the
terms.
Installing crystract involves a few steps, as it is
currently hosted on GitHub. We use the remotes package to
facilitate installation directly from the repository.
Open R or RStudio and run the following commands:
```r
# First, ensure you have the remotes package
install.packages("remotes")

# Install crystract from the GitHub repository
remotes::install_github("PrabhuLab/ml-crystals", subdir = "packages/crystract", build_vignettes = TRUE)
```

To make sure the package was installed correctly, load it into your R session:

```r
library(crystract)
```

If this command runs without any errors, the installation was successful.
For a detailed, step-by-step guide explaining each function, the crystallographic principles, and the formulas used for calculations, please see the package vignette.
You can access it with the following command after you have successfully installed the package:
```r
# This command opens the detailed package guide
vignette("crystract")
```

We welcome and appreciate all forms of community engagement. To ensure a smooth and productive collaboration, we have established guidelines for contributing, reporting issues, and seeking support.
All participants in this project are expected to abide by our Code of Conduct. Please read it to understand the standards of behavior we expect.
For detailed instructions on how to contribute to the software, report bugs, or suggest new features, please review our Contributing Guidelines.
Author and Maintainer: Don Ngo
(dngo@carnegiescience.edu)