
An R package for the initialization and organization of a scientific project following reproducible research and FAIR principles.
SCIproj is an R package that initializes a project through its function
create_proj() and helps you manage a scientific project as an R package
or a research compendium. This combines structure (where files are
located) with workflow (how analyses are reproduced or replicated).
The package is built on modern reproducibility standards and guidelines such as:
The package has some default settings to ensure reproducibility. These include:
your-project/
├── DESCRIPTION # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd # Top-level project description.
├── your-project.Rproj # RStudio project file.
├── CITATION.cff # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md # Contribution guidelines.
├── LICENSE.md # Full license text (optional, requires add_license).
├── NAMESPACE # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/ # Raw data files and pre-processing scripts.
│ ├── clean_data.R # Script template for data cleaning.
│ ├── DATA_SOURCES.md # Data provenance: source, license, DOI, download date.
│ └── ...
│
├── data/ # Cleaned datasets stored as .rda files.
│
├── R/ # Custom R functions and dataset documentation.
│ ├── function_ex.R # Template for custom functions.
│ ├── data.R # Template for dataset documentation.
│ └── ...
│
├── analyses/ # R scripts or R Markdown/Quarto documents for analyses.
│ ├── figures/ # Generated plots.
│ └── ...
│
├── docs/ # Publication-ready documents (article, report, presentation).
├── trash/ # Temporary files that can be safely deleted.
│
├── _targets.R # Pipeline definition for reproducible workflow (default).
├── renv/ # renv library and settings (default).
├── renv.lock # Lockfile for reproducible package versions (default).
└── Dockerfile # Container definition for full reproducibility (optional).
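
As a quick sanity check, the layout above can be compared against an existing directory. The sketch below uses base R only. The helper check_layout() and its list of expected paths are illustrative assumptions taken from the tree above; they are not functions or settings provided by SCIproj.

```r
# Hypothetical helper: report which parts of the default layout exist in `path`.
# The vector of expected paths is an assumption based on the tree shown above.
check_layout <- function(path = ".") {
  expected <- c(
    "DESCRIPTION", "README.Rmd", "CITATION.cff",
    "data-raw", "data", "R", "analyses", "docs",
    "_targets.R", "renv.lock"
  )
  # file.exists() returns TRUE for both files and directories
  present <- file.exists(file.path(path, expected))
  data.frame(path = expected, present = present, stringsAsFactors = FALSE)
}

# Demo on a throwaway directory that contains only DESCRIPTION and R/
demo_dir <- file.path(tempdir(), "layout-demo")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "DESCRIPTION"))

layout <- check_layout(demo_dir)
layout[!layout$present, ]  # components that are still missing
```

Calling check_layout() without arguments inspects the current working directory, which is usually the project root.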
- The targets pipeline tracks dependencies automatically: only re-run what changed.
- renv ensures the exact same package versions are used everywhere.
- CITATION.cff makes the project machine-readable and citable, DATA_SOURCES.md documents data provenance.
- devtools::load_all() instantly makes all clean datasets and custom functions available.

Install the development version from GitHub:
### Using remotes
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")
### Or better: using the new pak package
# install.packages("pak")
pak::pkg_install("saskiaotto/SCIproj")

library("SCIproj")
create_proj("my_research_project")

This creates a project with renv, targets, CITATION.cff, and
DATA_SOURCES.md by default.
Customize with parameters:
### Full-featured project with GitHub, CI, and ORCID
create_proj("my_research_project",
add_license = "MIT",
license_holder = "Jane Doe",
orcid = "0000-0001-2345-6789",
create_github_repo = TRUE,
ci = "gh-actions"
)
### Minimal project without workflow tools
create_proj("my_research_project",
use_renv = FALSE,
use_targets = FALSE
)

| Parameter | Default | Description |
|---|---|---|
| data_raw | TRUE | Add data-raw/ folder with templates |
| makefile | FALSE | Add makefile.R template |
| testthat | FALSE | Add testthat infrastructure |
| use_pipe | FALSE | Add magrittr pipe (native \|> recommended) |
| add_license | NULL | License type: "MIT", "GPL", "Apache", etc. |
| license_holder | "Your name" | License holder / project author |
| orcid | NULL | ORCID iD for CITATION.cff |
| use_git | TRUE | Initialize local git repo |
| create_github_repo | FALSE | Create GitHub repo (needs GITHUB_PAT) |
| ci | "none" | CI type: "none" or "gh-actions" |
| use_renv | TRUE | Initialize renv for dependency management |
| use_targets | TRUE | Add _targets.R pipeline template |
| use_docker | FALSE | Add Dockerfile template |
| open_proj | FALSE | Open new project in RStudio |

1. Create the project with create_proj().
2. Edit DESCRIPTION with project metadata: title, summary, contributors (with ORCID), license, dependencies.
3. Edit README.Rmd with project details: objectives, timeline, workflow.
4. Document your data provenance in data-raw/DATA_SOURCES.md: source, license, download date, DOI for each dataset.
5. Place original (raw) data in data-raw/. Use clean_data.R (or more scripts) for pre-processing. Store clean datasets with usethis::use_data().
6. Document clean datasets using roxygen in R/ (see template data.R). For details, see Documenting data.
7. Place custom functions in R/ with roxygen documentation. See the documentation chapter in the R Packages book.
8. Write tests for your functions in tests/ (set testthat = TRUE in create_proj()). See Testing basics.
9. Place analysis scripts/notebooks in analyses/. Save plots in analyses/figures/.
10. Place final manuscripts, reports, and presentations in docs/. Use R Markdown, Quarto, or templates from rticles, thesisdown, or Quarto journal extensions.
11. Keep dependencies in sync: usethis::use_package() for DESCRIPTION, renv::snapshot() for the lockfile.
12. Update CITATION.cff when you archive your project or publish.
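
The data-related steps above (a custom function in R/, a cleaning script in data-raw/, and roxygen documentation for the resulting dataset) could look roughly like the sketch below. The dataset name temperature, its columns, the helper clean_temperature(), and the file name R/clean_temperature.R are hypothetical; they do not come from the SCIproj templates. The three parts would normally live in separate files and are shown together only for brevity.

```r
# --- R/clean_temperature.R: custom function with roxygen documentation ---

#' Clean raw temperature records (hypothetical example)
#'
#' @param raw A data frame with columns `site` and `temp`.
#' @return A data frame without missing sites and with numeric `temp`.
#' @export
clean_temperature <- function(raw) {
  out <- raw[!is.na(raw$site), , drop = FALSE]        # drop rows without a site
  out$temp <- suppressWarnings(as.numeric(out$temp))  # non-numeric entries become NA
  out <- out[!is.na(out$temp), , drop = FALSE]        # drop unparseable temperatures
  rownames(out) <- NULL
  out
}

# --- data-raw/clean_data.R: pre-processing script ---

# Inline stand-in for a raw file; a real script would read from data-raw/.
raw <- data.frame(
  site = c("A", "B", "B", NA),
  temp = c("12.5", "13.1", "n/a", "11.0"),
  stringsAsFactors = FALSE
)
temperature <- clean_temperature(raw)

# Store the clean dataset as data/temperature.rda. Guarded so that the sketch
# also runs outside a package project or without usethis installed.
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(temperature, overwrite = TRUE)
}

# --- R/data.R: dataset documentation ---

#' Cleaned temperature measurements (hypothetical example)
#'
#' @format A data frame with one row per valid measurement:
#' \describe{
#'   \item{site}{Site identifier (character).}
#'   \item{temp}{Temperature in degrees Celsius (numeric).}
#' }
#' @source Describe the origin here and in data-raw/DATA_SOURCES.md.
"temperature"
```

Keeping the cleaning logic in a function under R/ rather than inline in the script means it can be documented, tested, and reused by the analyses.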
- Load all functions and datasets: devtools::load_all() or Ctrl/Cmd + Shift + L in RStudio.
- Build the documentation: devtools::document() or Ctrl/Cmd + Shift + D.
- Run the tests: devtools::test() or Ctrl/Cmd + Shift + T.
- Run the pipeline: targets::tar_make() to execute all targets. targets::tar_visnetwork() to visualize dependencies.
- Update the lockfile: renv::snapshot() after installing or updating packages.

For a detailed introduction to targets, see the user manual.
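
To make the pipeline idea more concrete, here is a minimal sketch of what a _targets.R file can look like. It is not the template that create_proj() generates; the file path data-raw/measurements.csv, the column names x and y, and all target names are assumptions made for illustration.

```r
# _targets.R (illustrative sketch; paths, columns and target names are hypothetical)
library(targets)

# The script must end with a list of target objects.
list(
  # Track the raw file itself, so that a changed file invalidates downstream targets.
  tar_target(raw_file, "data-raw/measurements.csv", format = "file"),
  # Read the data; depends on raw_file.
  tar_target(raw_data, read.csv(raw_file)),
  # Fit a simple linear model; depends on raw_data.
  tar_target(model, lm(y ~ x, data = raw_data)),
  # Summarise the model; depends on model.
  tar_target(model_summary, summary(model))
)
```

With such a file in the project root, targets::tar_make() builds the four targets in dependency order, and targets::tar_read(model_summary) retrieves a stored result. If only the model formula is edited afterwards, the next tar_make() call should skip raw_file and raw_data because their inputs are unchanged.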
For maximum reproducibility, consider also using Docker
(use_docker = TRUE). See the Rocker Project for R-specific
Docker images.
When your project is finalized:
- Update CITATION.cff with the DOI.
- Generate codemeta.json with codemetar::write_codemeta() for richer metadata.
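
Adding the DOI to CITATION.cff can be done by hand, but it can also be scripted. The sketch below is one possible base R approach, assuming the DOI is stored under the top-level doi key of the Citation File Format. The helper set_cff_doi() is hypothetical, the DOI is a placeholder, and the demo writes to a temporary file rather than to a real project.

```r
# Hypothetical helper: set or replace the top-level `doi:` key in a CITATION.cff file.
set_cff_doi <- function(path, doi) {
  lines <- readLines(path, warn = FALSE)
  new_line <- sprintf('doi: "%s"', doi)
  is_doi <- grepl("^doi:", lines)          # only matches unindented (top-level) keys
  if (any(is_doi)) {
    lines[which(is_doi)[1]] <- new_line    # replace the existing entry
  } else {
    lines <- c(lines, new_line)            # append a new entry
  }
  writeLines(lines, path)
  invisible(lines)
}

# Demo on a temporary file with minimal, made-up citation metadata
cff <- tempfile(fileext = ".cff")
writeLines(c(
  "cff-version: 1.2.0",
  'message: "If you use this project, please cite it as below."',
  'title: "My research project"',
  "authors:",
  '  - family-names: "Doe"',
  '    given-names: "Jane"'
), cff)
set_cff_doi(cff, "10.5281/zenodo.0000000")  # placeholder DOI, not a real record

# Optional: richer metadata, only if codemetar is installed and the working
# directory is a package-style project.
if (requireNamespace("codemetar", quietly = TRUE) && file.exists("DESCRIPTION")) {
  codemetar::write_codemeta()
}
```

Calling set_cff_doi() again with a different DOI replaces the existing entry instead of adding a second one.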