A research compendium is a self-contained collection of data, code, and documentation that accompanies a research project. By structuring a project as an R package, you gain:

- declared metadata and dependencies (DESCRIPTION),
- function documentation (roxygen2, vignettes),
- automated tests (testthat).

SCIproj automates the creation of such a compendium, adding
opinionated defaults for reproducible workflows (targets),
dependency snapshots (renv), and FAIR-compliant metadata
(CITATION.cff).
Install SCIproj from GitHub:
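A minimal sketch of a GitHub installation, assuming the remotes package is used for the install. The repository path below is a placeholder, not the confirmed location of SCIproj; the guard only keeps the snippet from running with the placeholder in place.

```r
# Placeholder: replace <owner> with the GitHub account that hosts SCIproj.
repo <- "<owner>/SCIproj"

if (grepl("^<", repo)) {
  # Nothing is installed while the placeholder is still in place
  message("Set `repo` to the real 'owner/SCIproj' path before installing.")
} else {
  # Install remotes on demand, then install the package from GitHub
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  remotes::install_github(repo)
}
```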
Create a new project with a single call to create_proj(). With the
default arguments, this creates a fully scaffolded research compendium
with renv and targets enabled. The following call sets several options
explicitly:

    library(SCIproj)

    create_proj("~/projects/baltic_cod",
      add_license = "MIT",
      license_holder = "Jane Doe",
      orcid = "0000-0001-2345-6789",
      use_docker = TRUE,
      use_git = TRUE
    )

Directory names with underscores or hyphens are fine: the R package
name in DESCRIPTION is automatically sanitized (e.g.,
baltic_cod becomes baltic.cod).
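
The sanitizing rule can be illustrated with a few lines of base R. This is a sketch under the assumption that every run of invalid characters is replaced by a single dot; `sanitize_pkg_name()` is a hypothetical helper, not the function SCIproj uses internally.

```r
# Hypothetical helper (not SCIproj's own code): derive a valid R package
# name from a directory name.
sanitize_pkg_name <- function(dir_name) {
  # Replace each run of characters other than letters, digits, or dots with one dot
  name <- gsub("[^A-Za-z0-9.]+", ".", dir_name)
  # Package names must start with a letter and must not end with a dot
  name <- sub("^[^A-Za-z]+", "", name)
  name <- sub("[.]+$", "", name)
  name
}

# basename() extracts the directory name from the project path
sanitize_pkg_name(basename("~/projects/baltic_cod"))
sanitize_pkg_name("north-sea_herring")
```

Under this assumption the two calls return "baltic.cod" and "north.sea.herring".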
After creation, the project directory looks like this:

    your-project/
    ├── DESCRIPTION            # Project metadata, dependencies, and author info (with ORCID).
    ├── README.Rmd             # Top-level project description.
    ├── your-project.Rproj     # RStudio project file.
    ├── CITATION.cff           # Machine-readable citation metadata for FAIR compliance.
    ├── CONTRIBUTING.md        # Contribution guidelines.
    ├── LICENSE.md             # Full license text (here: MIT).
    ├── NAMESPACE              # Auto-generated by roxygen2 (do not edit by hand).
    │
    ├── data-raw/              # Raw data files and pre-processing scripts.
    │   ├── clean_data.R       # Script template for data cleaning.
    │   ├── DATA_SOURCES.md    # Data provenance: source, license, DOI, download date.
    │   └── ...
    │
    ├── data/                  # Cleaned datasets stored as .rda files.
    │
    ├── R/                     # Custom R functions and dataset documentation.
    │   ├── function_ex.R      # Template for custom functions.
    │   ├── data.R             # Template for dataset documentation.
    │   └── ...
    │
    ├── analyses/              # R scripts or R Markdown/Quarto documents for analyses.
    │   ├── figures/           # Generated plots.
    │   └── ...
    │
    ├── docs/                  # Publication-ready documents (article, report, presentation).
    ├── trash/                 # Temporary files that can be safely deleted.
    │
    ├── _targets.R             # Pipeline definition for reproducible workflow.
    ├── renv/                  # renv library and settings.
    ├── renv.lock              # Lockfile for reproducible package versions.
    └── Dockerfile             # Container definition for full reproducibility.

| Directory / File | Purpose |
|---|---|
| R/ | Reusable R functions (documented with roxygen2) |
| data/ | Cleaned, analysis-ready datasets (.rda format) |
| data-raw/ | Raw data files and the script that cleans them |
| analyses/ | Analysis scripts, R Markdown reports, figures |
| docs/ | Manuscripts, presentations, supplementary material |
| trash/ | Temporary files not under version control |
| _targets.R | Pipeline definition for targets |
| CITATION.cff | Machine-readable citation metadata |
| CONTRIBUTING.md | Guidelines for collaborators |

SCIproj encourages FAIR (Findable, Accessible, Interoperable, Reusable) research practices through several built-in features:

- CITATION.cff: A Citation File Format file is created automatically. It includes the project title, author name, version, release date, and optionally a license and ORCID iD. Services like GitHub and Zenodo can parse this file to generate proper citations.
- DATA_SOURCES.md: When data_raw = TRUE (the default), a DATA_SOURCES.md template is placed in data-raw/. Use it to document the provenance of every dataset: source, URL, DOI, license, download date, and file names.
- ORCID: Pass your ORCID iD via the orcid parameter to embed it in CITATION.cff, making your authorship unambiguously machine-readable.

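
To give a sense of what such a citation file contains, the sketch below writes a minimal CITATION.cff to a temporary directory with base R. The field names follow the Citation File Format 1.2.0 schema; the title, version, and message are hypothetical values, and the file SCIproj generates may differ in fields and ordering.

```r
# Illustrative values only; author and ORCID iD are taken from the example call above.
cff_lines <- c(
  "cff-version: 1.2.0",
  "message: \"If you use this compendium, please cite it as below.\"",
  "title: \"baltic.cod\"",
  "authors:",
  "  - family-names: \"Doe\"",
  "    given-names: \"Jane\"",
  # CFF expects the ORCID iD as a full URL
  "    orcid: \"https://orcid.org/0000-0001-2345-6789\"",
  "version: \"0.0.1\"",
  paste0("date-released: ", format(Sys.Date(), "%Y-%m-%d")),
  "license: MIT"
)

# Write to a temporary location so nothing in a real project is overwritten
cff_path <- file.path(tempdir(), "CITATION.cff")
writeLines(cff_lines, cff_path)
cat(readLines(cff_path), sep = "\n")
```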
By default (use_targets = TRUE), SCIproj adds a
_targets.R pipeline template. The targets package
provides:

- Caching: results are stored in the _targets/ data store, so targets that are still up to date are skipped on later runs.
- Visualization: tar_visnetwork() shows the pipeline as a graph.

A typical workflow:

    # 1. Define targets in _targets.R
    # 2. Inspect the pipeline
    targets::tar_manifest()
    targets::tar_visnetwork()
    # 3. Run the pipeline
    targets::tar_make()
    # 4. Read a result
    targets::tar_read(my_result)

Edit _targets.R to define your data-loading, analysis,
and reporting steps. Each step is a target that depends on upstream
targets and R functions in R/.
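
A small _targets.R might look like the sketch below. It is not the template SCIproj ships: the file path, the `summarise_catch()` function, and the target names are hypothetical, and the function is defined inline only to keep the example self-contained (in a compendium it would live in R/).

```r
# _targets.R (illustrative sketch; file, function, and target names are hypothetical)
library(targets)

# Hypothetical analysis function: total catch per year
summarise_catch <- function(df) {
  aggregate(catch ~ year, data = df, FUN = sum)
}

list(
  # Track the raw file so that changes to it invalidate downstream targets
  tar_target(raw_file, "data-raw/catch.csv", format = "file"),
  # Load the data; depends on raw_file
  tar_target(raw_data, read.csv(raw_file)),
  # Summarise; this is the target read back with tar_read(my_result)
  tar_target(my_result, summarise_catch(raw_data))
)
```

Because my_result depends on raw_data, which depends on raw_file, editing the CSV or `summarise_catch()` marks the affected targets as outdated, and the next tar_make() reruns only those.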
By default (use_renv = TRUE), SCIproj initializes renv with the
"explicit" snapshot type. This means renv discovers
dependencies from DESCRIPTION rather than scanning all R
files, which is the recommended approach for package-based
compendia.
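
To make the idea of "dependencies declared in DESCRIPTION" concrete, the sketch below lists them with base R. It is not renv's implementation; `declared_deps()` and the DESCRIPTION contents are hypothetical and written to a temporary directory.

```r
# Write a small, hypothetical DESCRIPTION to a temporary directory
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: baltic.cod",
  "Version: 0.0.1",
  "Imports:",
  "    dplyr,",
  "    ggplot2 (>= 3.4.0),",
  "    targets"
), desc_path)

# Hypothetical helper: package names found in the dependency fields
declared_deps <- function(path, fields = c("Depends", "Imports", "Suggests")) {
  desc <- read.dcf(path, fields = fields)
  # Fields that are absent come back as NA and are dropped here
  entries <- unlist(strsplit(desc[!is.na(desc)], ","))
  # Strip whitespace and version requirements such as "(>= 3.4.0)"
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  sort(unique(pkgs[nzchar(pkgs) & pkgs != "R"]))
}

declared_deps(desc_path)
```

A package that is used in a script but missing from DESCRIPTION would not appear in this list, which is why new dependencies should be added to DESCRIPTION before calling renv::snapshot().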
Key commands:

    renv::status()    # check if lockfile is in sync
    renv::snapshot()  # update the lockfile after adding packages
    renv::restore()   # reinstall packages from the lockfile

The renv.lock file should be committed to version
control so collaborators can reproduce your exact package versions.
Set use_docker = TRUE to add a Dockerfile
and .dockerignore. The Dockerfile provides a template for
building a container that reproduces your computational environment,
independent of the host system.
Set create_github_repo = TRUE to create a GitHub
repository (requires a configured GITHUB_PAT). Add
ci = "gh-actions" to include a GitHub Actions workflow for
automated R CMD check on push.
Choose from "MIT", "GPL",
"AGPL", "LGPL", "Apache",
"CCBY", or "CC0" via the
add_license parameter. The selected license is applied to
DESCRIPTION and recorded in CITATION.cff.
Set testthat = TRUE to add testing infrastructure
(tests/testthat.R and tests/testthat/).
Writing tests for your analysis functions helps catch regressions
early.
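
A test file for an analysis function could look like the following sketch. `catch_per_effort()` is a hypothetical function used only for illustration; in a compendium it would live in R/ and the tests in a file under tests/testthat/.

```r
library(testthat)

# Hypothetical analysis function; would normally be defined in R/
catch_per_effort <- function(catch, effort) {
  stopifnot(all(effort > 0))
  catch / effort
}

# Would typically be saved as tests/testthat/test-catch_per_effort.R
test_that("catch_per_effort divides catch by effort", {
  expect_equal(catch_per_effort(10, 2), 5)
  expect_equal(catch_per_effort(c(4, 9), c(2, 3)), c(2, 3))
})

test_that("catch_per_effort rejects non-positive effort", {
  expect_error(catch_per_effort(10, 0))
})
```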
Set makefile = TRUE to add a makefile.R
script as an alternative to targets for orchestrating your
workflow.
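
The idea behind such a script is to run the analysis steps in a fixed order. The sketch below shows one way to do that with base R; `run_scripts()` and the script names are hypothetical, and the template SCIproj generates may be organised differently.

```r
# Hypothetical helper: source a sequence of scripts in order, sharing one environment
run_scripts <- function(scripts, envir = new.env()) {
  for (script in scripts) {
    if (!file.exists(script)) stop("Script not found: ", script)
    message("Running ", script)
    source(script, local = envir)
  }
  invisible(envir)
}

# Demonstration with two throwaway scripts in a temporary directory
demo_dir <- file.path(tempdir(), "makefile_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("x <- 1:10", file.path(demo_dir, "01_load.R"))
writeLines("x_mean <- mean(x)", file.path(demo_dir, "02_analyse.R"))

result <- run_scripts(file.path(demo_dir, c("01_load.R", "02_analyse.R")))
result$x_mean
```

Unlike targets, a script like this reruns every step each time, which is simpler but does not skip work that is already up to date.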

1. Create the project.
2. Open the .Rproj file in RStudio.
3. Add raw data to data-raw/ and document it in DATA_SOURCES.md.
4. Write cleaning code in data-raw/clean_data.R; save cleaned data to data/ with usethis::use_data().
5. Write analysis functions in R/ and document them with roxygen2.
6. Define the pipeline in _targets.R to connect data, functions, and reports.
7. Run targets::tar_make() to execute the pipeline.
8. Write reports in analyses/ using R Markdown or Quarto, reading results with targets::tar_read().
9. Snapshot dependencies with renv::snapshot() before sharing.
10. Push to GitHub and let CI run R CMD check automatically.
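
The cleaning step in the workflow above could be sketched as follows. The raw data, column names, and `clean_catch()` function are hypothetical stand-ins for a real file in data-raw/, and the result is saved to a temporary directory here so the snippet does not touch a real project.

```r
# Hypothetical raw data standing in for a file in data-raw/
raw <- data.frame(
  Year = c(2020, 2021, 2021, NA),
  Catch_t = c("12.5", "14.1", "14.1", "9.8"),
  stringsAsFactors = FALSE
)

# Hypothetical cleaning function: tidy names, fix types,
# drop incomplete rows and exact duplicates
clean_catch <- function(df) {
  names(df) <- tolower(names(df))
  df$catch_t <- as.numeric(df$catch_t)
  df <- df[complete.cases(df), ]
  df <- unique(df)
  rownames(df) <- NULL
  df
}

catch <- clean_catch(raw)

# In the compendium, run from the project root:
# usethis::use_data(catch, overwrite = TRUE)  # writes data/catch.rda
# Roughly comparable base R call, pointed at a temporary directory:
rda_path <- file.path(tempdir(), "catch.rda")
save(catch, file = rda_path)
```

Keeping the cleaning logic in a function makes it easy to reuse the same step as a target in _targets.R.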