SlimR: Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation

Overview

SlimR is an R package for cell-type annotation in single-cell and spatial transcriptomics. Existing marker-based annotation methods typically rely on manually tuned thresholds and operate at a single analytical granularity, limiting their adaptability across diverse datasets. SlimR addresses these challenges through three methodological contributions: (1) a context-matching framework that standardizes heterogeneous marker sources via multi-level biological filtering; (2) a dataset-adaptive parameterization strategy that infers optimal annotation hyperparameters from intrinsic data characteristics, eliminating manual calibration; and (3) a dual-granularity scoring architecture that provides both cluster-level probabilistic assignment and per-cell resolution with manifold-aware spatial smoothing for continuous cell states. A unified Feature Significance Score ensures biologically interpretable marker ranking throughout the workflow.

Table of Contents

  1. Preparation
  2. Standardized Markers_list Input
  3. Automated Annotation Workflow
  4. Semi-Automated Annotation Workflow
  5. Citation
  6. License
  7. Contact

1. Preparation

1.1 Installation

Option One: CRAN

install.packages("SlimR")

Option Two: GitHub

devtools::install_github("zhaoqing-wang/SlimR")

Dependencies & optional packages

Required: R (≥ 3.5), cowplot, dplyr, ggplot2, patchwork, pheatmap, readxl, scales, Seurat, tidyr, tools

# tools ships with base R, so it is not installed from CRAN
install.packages(c("cowplot", "dplyr", "ggplot2", "patchwork",
                   "pheatmap", "readxl", "scales", "Seurat",
                   "tidyr"))

Optional: RANN (10–100× faster UMAP spatial smoothing in per-cell annotation)

install.packages("RANN")
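
Before installing, it can help to see which of the packages listed above are already available. The sketch below uses only base R; `missing_packages` is a hypothetical helper written for this example, not a SlimR function. `tools` is left out of the check because it ships with R itself.

```r
# Hypothetical helper (not part of SlimR): return the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("cowplot", "dplyr", "ggplot2", "patchwork",
              "pheatmap", "readxl", "scales", "Seurat", "tidyr")
to_install <- missing_packages(required)

# Report what is missing; uncomment the last line to install it.
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
}
```

The same helper can be pointed at `"RANN"` to check the optional dependency.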

1.2 Prepare Seurat Object

library(SlimR)

# For Seurat objects with multiple layers, join layers first
sce@assays$RNA <- SeuratObject::JoinLayers(sce@assays$RNA)

Important: Ensure your Seurat object has completed standard preprocessing (normalization, scaling, clustering) and batch effect correction.


2. Standardized Markers_list Input

SlimR uses a standardized list format: list names = cell types, first column = marker genes, additional columns = metrics (optional).
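
As an illustration of that format, a Markers_list can be written by hand. This is a minimal sketch that assumes each element is a data frame; the gene choices, the weights, and the column name `weight` are illustrative only (the name `marker` follows the `$marker` access used in Section 3.1).

```r
# Toy Markers_list in the standardized format; genes and weights are illustrative only.
Markers_list_custom <- list(
  "T cell" = data.frame(
    marker = c("CD3D", "CD3E", "CD2"),
    weight = c(1.0, 0.9, 0.7),            # optional metric column (hypothetical name)
    stringsAsFactors = FALSE
  ),
  "B cell" = data.frame(
    marker = c("CD79A", "MS4A1", "CD19"), # metric columns may be omitted
    stringsAsFactors = FALSE
  )
)

# List names are the cell types; the first column holds the marker genes.
names(Markers_list_custom)
lapply(Markers_list_custom, function(df) df[[1]])
```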

2.1 From Cellmarker2 Database

Reference: Hu et al. (2023) doi:10.1093/nar/gkac947

Cellmarker2 <- SlimR::Cellmarker2

Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
  Cellmarker2,
  species = "Human",
  tissue_class = "Intestine",
  tissue_type = NULL,
  cancer_type = NULL,
  cell_type = NULL
)

Important: Specify at least species and tissue_class for accurate annotations.

Optional: Explore database metadata
Cellmarker2_table <- SlimR::Cellmarker2_table
View(Cellmarker2_table)

2.2 From PanglaoDB Database

Reference: Franzén et al. (2019) doi:10.1093/database/baz046

PanglaoDB <- SlimR::PanglaoDB

Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
  PanglaoDB,
  species_input = 'Human',
  organ_input = 'GI tract'
)

Optional: Explore database metadata
PanglaoDB_table <- SlimR::PanglaoDB_table
View(PanglaoDB_table)

2.3 From ScType Database

Reference: Ianevski et al. (2022) doi:10.1038/s41467-022-28803-w

ScType <- SlimR::ScType

Markers_list_ScType <- Markers_filter_ScType(
  ScType,
  tissue_type = "Immune system",
  cell_name = NULL
)

Important: Specify tissue_type for accurate annotations.

Optional: Explore database metadata
ScType_table <- SlimR::ScType_table
View(ScType_table)

2.4 From Seurat Objects

seurat_markers <- Seurat::FindAllMarkers(
    object = sce,
    group.by = "Cell_type",
    only.pos = TRUE)

Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
    sources = "Seurat",
    sort_by = "FSS",
    gene_filter = 20
    )

Tip: sort_by = "FSS" ranks by Feature Significance Score (log2FC × Expression ratio). Use sort_by = "avg_log2FC" for fold-change ranking.
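
To see how FSS ranking can differ from fold-change ranking, here is a small base-R sketch on made-up numbers. It assumes that "expression ratio" means `pct.1 / pct.2` and that `gene_filter` keeps the top n genes per cell type; both are assumptions for illustration, and SlimR's own definitions may differ. `top_by_fss` is a hypothetical helper.

```r
# Toy table shaped like Seurat::FindAllMarkers() output (values are made up).
toy_markers <- data.frame(
  gene       = c("CD3E", "CD2", "IL7R", "MS4A1"),
  cluster    = c("T cell", "T cell", "T cell", "B cell"),
  avg_log2FC = c(2.0, 3.0, 1.5, 2.5),
  pct.1      = c(0.90, 0.60, 0.80, 0.95),
  pct.2      = c(0.10, 0.30, 0.40, 0.05),
  stringsAsFactors = FALSE
)

# ASSUMPTION: "expression ratio" is taken as pct.1 / pct.2; SlimR may define it differently.
toy_markers$FSS <- toy_markers$avg_log2FC * (toy_markers$pct.1 / toy_markers$pct.2)

# Hypothetical helper: rank within each cluster by FSS (descending), keep the top n genes.
top_by_fss <- function(df, n) {
  per_cluster <- split(df, df$cluster)
  lapply(per_cluster, function(d) head(d[order(-d$FSS), ], n))
}

top_by_fss(toy_markers, n = 2)
```

In this toy table CD2 has the largest fold change among the T cell genes, but CD3E ranks first by FSS because it is expressed far more selectively.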

Use presto for ~10× faster marker detection
seurat_markers <- dplyr::filter(
    presto::wilcoxauc(
      X = sce,
      group_by = "Cell_type",
      seurat_assay = "RNA"
      ),
    padj < 0.05, logFC > 0.5
    )

Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
    sources = "presto",
    sort_by = "FSS",
    gene_filter = 20
    )

Install presto: devtools::install_github('immunogenomics/presto')

2.5 From Excel Tables

Format: Each sheet name = cell type, first row = headers, first column = markers, subsequent columns = metrics (optional).

Markers_list_Excel <- Read_excel_markers("D:/Laboratory/Marker_load.xlsx")

If your Excel file lacks column headers, set has_colnames = FALSE.

2.6 Built-in Markers Lists

SlimR includes curated marker lists for specific annotation tasks:

| List | Scope | Reference |
|---|---|---|
| Markers_list_scIBD | Human intestinal cells (IBD) | Nie et al. (2023) doi:10.1038/s43588-023-00464-9 |
| Markers_list_TCellSI | T cell subtypes | Yang et al. (2024) doi:10.1002/imt2.231 |
| Markers_list_PCTIT | Pan-cancer T cell subtypes | Zheng et al. (2021) doi:10.1126/science.abe6474 |
| Markers_list_PCTAM | Pan-cancer macrophage subtypes | Ma et al. (2022) doi:10.1016/j.it.2022.04.008 |

# Example: Load built-in markers
Markers_list <- SlimR::Markers_list_scIBD

Important: Ensure your input Seurat object matches the tissue/cell type scope of the selected marker list.
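
One rough way to check that match is to measure how many of each cell type's markers are present in the dataset at all. The sketch below uses toy inputs and a hypothetical helper, `marker_coverage`; it assumes each list element is a data frame whose first column holds gene symbols. With a real object, `rownames(sce)` would supply the dataset genes.

```r
# Hypothetical helper: fraction of each cell type's markers found among the dataset genes.
marker_coverage <- function(markers_list, dataset_genes) {
  vapply(markers_list, function(df) {
    genes <- unique(as.character(df[[1]]))   # first column = marker genes
    mean(genes %in% dataset_genes)
  }, numeric(1))
}

# Toy inputs (made up); replace toy_genes with rownames(sce) for a real Seurat object.
toy_list <- list(
  "T cell"      = data.frame(marker = c("CD3D", "CD3E", "CD2"), stringsAsFactors = FALSE),
  "Paneth cell" = data.frame(marker = c("DEFA5", "DEFA6"), stringsAsFactors = FALSE)
)
toy_genes <- c("CD3D", "CD3E", "CD2", "EPCAM", "DEFA5")

coverage <- marker_coverage(toy_list, toy_genes)
coverage[coverage < 0.6]   # cell types whose markers are largely absent
```

Low coverage for many cell types usually points to a species or gene-symbol mismatch rather than a biological signal.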


3. Automated Annotation Workflow

SlimR provides two automated approaches: Cluster-Based (one label per cluster, fast) and Per-Cell (individual cell labels, finer resolution). Both share the same parameter calculation step and Markers_list format.

| Feature | Cluster-Based | Per-Cell |
|---|---|---|
| Unit | Cluster | Individual cell |
| Speed | ~10–30 s (50k cells) | ~2–3 min (50k cells) |
| Resolution | Coarse | Fine |
| Best For | Homogeneous clusters | Mixed clusters, rare cell types |
| Spatial Context | Not used | Optional (UMAP smoothing) |

3.1 Calculate Parameter

SlimR uses adaptive machine learning to estimate the min_expression, specificity_weight, and threshold parameters from the dataset itself. This step is optional; skip to Section 3.2 to use the defaults.

SlimR_params <- Parameter_Calculate(
  seurat_obj = sce,
  features = c("CD3E", "CD4", "CD8A"),
  assay = "RNA",
  cluster_col = "seurat_clusters",
  verbose = TRUE
  )

Custom method: use markers from a specific cell type
SlimR_params <- Parameter_Calculate(
  seurat_obj = sce,
  features = unique(Markers_list_Cellmarker2$`B cell`$marker),
  assay = "RNA",
  cluster_col = "seurat_clusters",
  verbose = TRUE
  )

3.2 Cluster-Based Annotation

Three steps: Calculate → Annotate → Verify.

Step 1: Calculate Cell Types

SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    cluster_col = "seurat_clusters",
    assay = "RNA",
    min_expression = 0.1,
    specificity_weight = 3,
    threshold = 0.6,
    compute_AUC = TRUE,
    plot_AUC = TRUE,
    AUC_correction = FALSE,
    colour_low = "navy",
    colour_high = "firebrick3"
    )

If you ran Parameter_Calculate(), use: min_expression = SlimR_params$min_expression, specificity_weight = SlimR_params$specificity_weight, threshold = SlimR_params$threshold.
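
A script that should work with or without the parameter step can resolve the three values up front. This is a sketch under the assumption that Parameter_Calculate() returns a list with at least those three elements; `resolve_params` and `SlimR_params_toy` are hypothetical names used only here.

```r
# Defaults taken from the Celltype_Calculate() call above.
default_params <- list(min_expression = 0.1, specificity_weight = 3, threshold = 0.6)

# Hypothetical helper: overlay adaptive values on the defaults when they are available.
resolve_params <- function(adaptive = NULL, defaults = default_params) {
  if (is.null(adaptive)) return(defaults)
  keep <- intersect(names(defaults), names(adaptive))
  utils::modifyList(defaults, adaptive[keep])
}

# Stand-in for the output of Parameter_Calculate(); values are made up.
SlimR_params_toy <- list(min_expression = 0.15, specificity_weight = 2.5,
                         threshold = 0.55, extra_diagnostics = "ignored")

params <- resolve_params(SlimR_params_toy)
str(params)
```

The resolved values can then be passed as `min_expression = params$min_expression`, and so on, in the Celltype_Calculate() call.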

View results & correct predictions
# View heatmap, predictions, and ROC curves
print(SlimR_anno_result$Heatmap_plot)
View(SlimR_anno_result$Prediction_results)
print(SlimR_anno_result$AUC_plot)   # Requires plot_AUC = TRUE

# Manually correct predictions
SlimR_anno_result$Prediction_results$Predicted_cell_type[
  SlimR_anno_result$Prediction_results$cluster_col == 15
] <- "Intestinal stem cell"

# Label low-confidence predictions as Unknown
SlimR_anno_result$Prediction_results$Predicted_cell_type[
  SlimR_anno_result$Prediction_results$AUC <= 0.5
] <- "Unknown"

When correcting, preferably use cell types from the Alternative_cell_types column.

Step 2: Annotate Cell Types

sce <- Celltype_Annotation(seurat_obj = sce,
    cluster_col = "seurat_clusters",
    SlimR_anno_result = SlimR_anno_result,
    plot_UMAP = TRUE,
    annotation_col = "Cell_type_SlimR"
    )

Step 3: Verify Cell Types

Celltype_Verification(seurat_obj = sce,
    SlimR_anno_result = SlimR_anno_result,
    gene_number = 5,
    assay = "RNA",
    colour_low = "white",
    colour_high = "navy",
    annotation_col = "Cell_type_SlimR"
    )

Important: Use matching cluster_col and annotation_col values across all three functions.

3.3 Per-Cell Annotation

Three steps: Calculate → Annotate → Verify. Ideal for heterogeneous clusters, rare cell types, and continuous differentiation states.

Step 1: Calculate Per-Cell Types

SlimR_percell_result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    assay = "RNA",
    method = "weighted",
    min_expression = 0.1,
    use_umap_smoothing = FALSE,
    min_score = "auto",
    min_confidence = 1.2,
    verbose = TRUE
    )

Three scoring methods: "weighted" (default, recommended), "mean" (fast baseline), "AUCell" (rank-based, robust to batch effects).
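
To give a feel for what a "mean"-style baseline does, the sketch below scores three toy cells against two marker sets in base R. It is not SlimR's implementation: the expression values are made up, the label "Unassigned" is a placeholder, and reading min_confidence as a threshold on the ratio of best to second-best score is an assumption.

```r
# Toy normalized expression matrix: genes in rows, cells in columns (values made up).
expr <- matrix(
  c(2.0, 0.1, 0.3,
    1.5, 0.0, 0.4,
    0.1, 1.8, 0.3,
    0.0, 2.2, 0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD3E", "CD3D", "CD79A", "MS4A1"),
                  c("cell1", "cell2", "cell3"))
)
markers <- list("T cell" = c("CD3E", "CD3D"), "B cell" = c("CD79A", "MS4A1"))

# Mean-style score: average expression of each cell type's markers in each cell.
scores <- sapply(markers, function(g) colMeans(expr[g, , drop = FALSE]))

# Best label per cell, plus the ratio of best to second-best score.
best   <- colnames(scores)[max.col(scores, ties.method = "first")]
sorted <- t(apply(scores, 1, sort, decreasing = TRUE))
ratio  <- sorted[, 1] / pmax(sorted[, 2], 1e-9)   # guard against division by zero

# ASSUMPTION: min_confidence is read here as a threshold on that ratio.
labels <- ifelse(ratio >= 1.2, best, "Unassigned")
data.frame(cell = rownames(scores), label = labels, ratio = round(ratio, 2))
```

Here cell3 expresses both marker sets at similar levels, so its ratio falls below 1.2 and it is left unassigned, while cell1 and cell2 are labeled with a wide margin.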

UMAP spatial smoothing & parameter tuning
# Enable UMAP smoothing for noise reduction
SlimR_percell_result <- Celltype_Calculate_PerCell(
    seurat_obj = sce,
    gene_list = Markers_list,
    species = "Human",
    method = "weighted",
    use_umap_smoothing = TRUE,
    k_neighbors = 20,
    smoothing_weight = 0.3
    )

Install RANN for 10–100× faster k-NN: install.packages("RANN")
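
The idea behind neighbour-based smoothing can be sketched in a few lines of base R. The blend used here, `(1 - weight) * own score + weight * mean of the k nearest neighbours`, is an assumption about how k_neighbors and smoothing_weight interact, not a description of SlimR's internals; `smooth_scores` is a hypothetical helper and all values are made up.

```r
# Toy 2-D embedding (stand-in for UMAP coordinates) and per-cell scores.
embedding <- rbind(c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),   # tight group A
                   c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1))   # tight group B
scores <- c(1.0, 0.9, 0.1, 0.2, 0.1, 0.2)   # third cell looks like a noisy outlier

# Hypothetical helper: blend each cell's score with the mean score of its k nearest neighbours.
smooth_scores <- function(scores, embedding, k, weight) {
  d <- as.matrix(dist(embedding))
  diag(d) <- Inf                                   # a cell is not its own neighbour
  vapply(seq_along(scores), function(i) {
    nn <- order(d[i, ])[seq_len(k)]                # brute-force k-NN, fine for toy data
    (1 - weight) * scores[i] + weight * mean(scores[nn])
  }, numeric(1))
}

smoothed <- smooth_scores(scores, embedding, k = 2, weight = 0.3)
round(smoothed, 3)
```

The outlier in group A is pulled toward its neighbours while group B barely changes. The full distance matrix used here grows quadratically with cell count, which is why a dedicated nearest-neighbour search such as `RANN::nn2()` matters on real datasets.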

| Scenario | min_score | min_confidence |
|---|---|---|
| Few cell types (<15) | "auto" | 1.2 (default) |
| Many cell types (>30) | "auto" | 1.1–1.15 |
| Strict annotation | "auto" | 1.3–1.5 |
| Liberal annotation | "auto" | 1.0 (disable) |

Step 2: Annotate Per-Cell Types

sce <- Celltype_Annotation_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = SlimR_percell_result,
    plot_UMAP = TRUE,
    annotation_col = "Cell_type_PerCell_SlimR",
    plot_confidence = TRUE
    )

Step 3: Verify Per-Cell Types

Celltype_Verification_PerCell(
    seurat_obj = sce,
    SlimR_percell_result = SlimR_percell_result,
    gene_number = 5,
    assay = "RNA",
    colour_low = "white",
    colour_high = "navy",
    annotation_col = "Cell_type_PerCell_SlimR",
    min_cells = 10
    )

Important: Use matching annotation_col values in Celltype_Annotation_PerCell() and Celltype_Verification_PerCell().
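
If both workflows have been run, comparing their labels shows which clusters are internally mixed. The sketch below works on a toy data frame standing in for `sce@meta.data`; the column names follow the annotation_col values used above, and the cell labels are made up.

```r
# Toy metadata standing in for sce@meta.data after both workflows (values made up).
meta <- data.frame(
  seurat_clusters         = c("0", "0", "0", "1", "1", "1"),
  Cell_type_SlimR         = c("T cell", "T cell", "T cell", "B cell", "B cell", "B cell"),
  Cell_type_PerCell_SlimR = c("T cell", "T cell", "NK cell", "B cell", "B cell", "B cell"),
  stringsAsFactors = FALSE
)

# Cross-tabulate cluster-level labels against per-cell labels.
table(cluster_level = meta$Cell_type_SlimR, per_cell = meta$Cell_type_PerCell_SlimR)

# Per-cluster agreement: share of cells whose per-cell label matches the cluster label.
agree <- meta$Cell_type_SlimR == meta$Cell_type_PerCell_SlimR
agreement <- tapply(agree, meta$seurat_clusters, mean)
agreement
```

Clusters with low agreement are candidates for sub-clustering or for relying on the per-cell labels.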


4. Semi-Automated Annotation Workflow

For expert-guided manual annotation using visualizations:

4.1 Annotation Heat Map

Celltype_Annotation_Heatmap(
  seurat_obj = sce,
  gene_list = Markers_list,
  species = "Human",
  cluster_col = "seurat_clusters",
  min_expression = 0.1,
  specificity_weight = 3,
  colour_low = "navy",
  colour_high = "firebrick3"
)

Note: This function is now incorporated into Celltype_Calculate(). Use Celltype_Calculate() instead for automated workflows.

4.2 Annotation Feature Plots

Generates, for each cell type, an expression dot plot paired with a heat map of the marker metrics:

Celltype_Annotation_Features(
  seurat_obj = sce,
  cluster_col = "seurat_clusters",
  gene_list = Markers_list,
  gene_list_type = "Cellmarker2",
  species = "Human",
  save_path = "./SlimR/Celltype_Annotation_Features/",
  colour_low = "white",
  colour_high = "navy",
  colour_low_mertic = "white",
  colour_high_mertic = "navy"
  )

Set gene_list_type to "Cellmarker2", "PanglaoDB", "Seurat", or "Excel" to match your marker source.

4.3 Annotation Combined Plots

Generates per-cell-type box plots of marker expression levels:

Celltype_Annotation_Combined(
  seurat_obj = sce,
  gene_list = Markers_list, 
  species = "Human",
  cluster_col = "seurat_clusters",
  assay = "RNA",
  save_path = "./SlimR/Celltype_Annotation_Combined/",
  colour_low = "white",
  colour_high = "navy"
)

5. Citation

Wang Z (2026). SlimR: Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation.
https://github.com/zhaoqing-wang/SlimR

6. License

MIT

7. Contact

Author: Zhaoqing Wang (ORCID) | Email: zhaoqingwang@mail.sdu.edu.cn | Issues: SlimR Issues