
SlimR is an R package for cell-type annotation in single-cell and spatial transcriptomics. Existing marker-based annotation methods typically rely on manually tuned thresholds and operate at a single analytical granularity, limiting their adaptability across diverse datasets. SlimR addresses these challenges through three methodological contributions: (1) a context-matching framework that standardizes heterogeneous marker sources via multi-level biological filtering; (2) a dataset-adaptive parameterization strategy that infers optimal annotation hyperparameters from intrinsic data characteristics, eliminating manual calibration; and (3) a dual-granularity scoring architecture that provides both cluster-level probabilistic assignment and per-cell resolution with manifold-aware spatial smoothing for continuous cell states. A unified Feature Significance Score ensures biologically interpretable marker ranking throughout the workflow.
install.packages("SlimR")
devtools::install_github("zhaoqing-wang/SlimR")
Required: R (≥ 3.5), cowplot, dplyr, ggplot2, patchwork, pheatmap, readxl, scales, Seurat, tidyr, tools
# tools ships with base R, so it is not installed from CRAN
install.packages(c("cowplot", "dplyr", "ggplot2", "patchwork",
"pheatmap", "readxl", "scales", "Seurat",
"tidyr"))
Optional: RANN (10–100× faster UMAP spatial smoothing in per-cell annotation)
install.packages("RANN")
library(SlimR)
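Before installing, it can help to see which of the packages listed above are actually missing. The following is a minimal base-R sketch; the helper name `missing_packages` is hypothetical and not part of SlimR, and the install call is left commented out so that nothing is installed by accident.

```r
# Packages named in this README ("tools" is part of base R and is omitted).
required <- c("cowplot", "dplyr", "ggplot2", "patchwork", "pheatmap",
              "readxl", "scales", "Seurat", "tidyr")
optional <- "RANN"

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(c(required, optional))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All listed packages are already installed.")
}
```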
# For Seurat objects with multiple layers, join layers first
sce@assays$RNA <- SeuratObject::JoinLayers(sce@assays$RNA)
Important: Ensure your Seurat object has completed standard preprocessing (normalization, scaling, clustering) and batch effect correction.
SlimR uses a standardized list format: list names = cell types, first column = marker genes, additional columns = metrics (optional).
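As an illustration of this layout, here is a hand-built list in base R. The cell types, genes, and the `score` metric column are invented for the example, and the object name `Markers_list_custom` is hypothetical; only the overall shape (named list, marker genes in the first column) comes from the description above.

```r
# Named list: one data frame per cell type.
# First column = marker genes; further columns = optional metrics.
Markers_list_custom <- list(
  "T cell" = data.frame(
    marker = c("CD3D", "CD3E", "CD2"),
    score  = c(0.9, 0.8, 0.7),      # made-up metric, optional
    stringsAsFactors = FALSE
  ),
  "B cell" = data.frame(
    marker = c("CD79A", "MS4A1"),   # metrics may be omitted entirely
    stringsAsFactors = FALSE
  )
)

# Structural checks: named list, data frames, character first column.
stopifnot(
  !is.null(names(Markers_list_custom)),
  all(vapply(Markers_list_custom, is.data.frame, logical(1))),
  all(vapply(Markers_list_custom,
             function(df) is.character(df[[1]]), logical(1)))
)

# Marker genes per cell type, taken from the first column.
lapply(Markers_list_custom, function(df) df[[1]])
```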
Reference: Hu et al. (2023) doi:10.1093/nar/gkac947
Cellmarker2 <- SlimR::Cellmarker2
Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
Cellmarker2,
species = "Human",
tissue_class = "Intestine",
tissue_type = NULL,
cancer_type = NULL,
cell_type = NULL
)
Important: Specify at least species and tissue_class for accurate annotations.
Cellmarker2_table <- SlimR::Cellmarker2_table
View(Cellmarker2_table)
Reference: Franzén et al. (2019) doi:10.1093/database/baz046
PanglaoDB <- SlimR::PanglaoDB
Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
PanglaoDB,
species_input = 'Human',
organ_input = 'GI tract'
)
PanglaoDB_table <- SlimR::PanglaoDB_table
View(PanglaoDB_table)
Reference: Ianevski et al. (2022) doi:10.1038/s41467-022-28803-w
ScType <- SlimR::ScType
Markers_list_ScType <- Markers_filter_ScType(
ScType,
tissue_type = "Immune system",
cell_name = NULL
)
Important: Specify tissue_type for accurate annotations.
ScType_table <- SlimR::ScType_table
View(ScType_table)
seurat_markers <- Seurat::FindAllMarkers(
object = sce,
group.by = "Cell_type",
only.pos = TRUE)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "Seurat",
sort_by = "FSS",
gene_filter = 20
)
Tip: sort_by = "FSS" ranks by Feature Significance Score (log2FC × Expression ratio). Use sort_by = "avg_log2FC" for fold-change ranking.
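To make the difference between the two rankings concrete, the sketch below computes a score of the form log2FC × expression ratio on a toy table. The source does not spell out how the expression ratio is defined, so this example assumes pct.1 / pct.2 (the detection fractions reported by Seurat::FindAllMarkers()); SlimR's own definition may differ. All values are invented.

```r
# Toy marker table using column names from Seurat::FindAllMarkers() output.
toy_markers <- data.frame(
  gene       = c("CD3E", "CD79A", "LYZ"),
  cluster    = c("T cell", "B cell", "Monocyte"),
  avg_log2FC = c(3.5, 3.0, 1.5),
  pct.1      = c(0.90, 0.80, 0.95),  # fraction of cells expressing, in-group
  pct.2      = c(0.10, 0.05, 0.50),  # fraction of cells expressing, other cells
  stringsAsFactors = FALSE
)

# ASSUMPTION: expression ratio = pct.1 / pct.2 (not confirmed by the source).
toy_markers$expr_ratio <- toy_markers$pct.1 / toy_markers$pct.2
toy_markers$FSS <- toy_markers$avg_log2FC * toy_markers$expr_ratio

# Ranking by fold change alone versus ranking by the combined score.
by_fc  <- toy_markers$gene[order(toy_markers$avg_log2FC, decreasing = TRUE)]
by_fss <- toy_markers$gene[order(toy_markers$FSS, decreasing = TRUE)]
print(by_fc)
print(by_fss)
```

Under this assumption, a gene with a slightly lower fold change but much more specific expression (CD79A here) moves ahead of a gene with a higher fold change.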
seurat_markers <- dplyr::filter(
presto::wilcoxauc(
X = sce,
group_by = "Cell_type",
seurat_assay = "RNA"
),
padj < 0.05, logFC > 0.5
)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "presto",
sort_by = "FSS",
gene_filter = 20
)
Install presto:
devtools::install_github('immunogenomics/presto')
Format: Each sheet name = cell type, first row = headers, first column = markers, subsequent columns = metrics (optional).
Markers_list_Excel <- Read_excel_markers("D:/Laboratory/Marker_load.xlsx")
If your Excel file lacks column headers, set has_colnames = FALSE.
SlimR includes curated marker lists for specific annotation tasks:
| List | Scope | Reference |
|---|---|---|
| Markers_list_scIBD | Human intestinal cells (IBD) | Nie et al. (2023) doi:10.1038/s43588-023-00464-9 |
| Markers_list_TCellSI | T cell subtypes | Yang et al. (2024) doi:10.1002/imt2.231 |
| Markers_list_PCTIT | Pan-cancer T cell subtypes | L. Zheng et al. (2021) doi:10.1126/science.abe6474 |
| Markers_list_PCTAM | Pan-cancer macrophage subtypes | Ruo-Yu Ma et al. (2022) doi:10.1016/j.it.2022.04.008 |
# Example: Load built-in markers
Markers_list <- SlimR::Markers_list_scIBD
Important: Ensure your input Seurat object matches the tissue/cell type scope of the selected marker list.
SlimR provides two automated approaches: Cluster-Based (one label per cluster, fast) and Per-Cell (individual cell labels, finer resolution). Both share the same parameter calculation step and Markers_list format.
| Feature | Cluster-Based | Per-Cell |
|---|---|---|
| Unit | Cluster | Individual cell |
| Speed | ~10–30s (50k cells) | ~2–3min (50k cells) |
| Resolution | Coarse | Fine |
| Best For | Homogeneous clusters | Mixed clusters, rare cell types |
| Spatial Context | Not used | Optional (UMAP smoothing) |
SlimR uses adaptive machine learning to determine optimal min_expression, specificity_weight, and threshold parameters. This step is optional; skip to Section 3.2 to use defaults.
SlimR_params <- Parameter_Calculate(
seurat_obj = sce,
features = c("CD3E", "CD4", "CD8A"),
assay = "RNA",
cluster_col = "seurat_clusters",
verbose = TRUE
)
SlimR_params <- Parameter_Calculate(
seurat_obj = sce,
features = unique(Markers_list_Cellmarker2$`B cell`$marker),
assay = "RNA",
cluster_col = "seurat_clusters",
verbose = TRUE
)
Three steps: Calculate → Annotate → Verify.
Step 1: Calculate Cell Types
SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
threshold = 0.6,
compute_AUC = TRUE,
plot_AUC = TRUE,
AUC_correction = FALSE,
colour_low = "navy",
colour_high = "firebrick3"
)
If you ran Parameter_Calculate(), use: min_expression = SlimR_params$min_expression, specificity_weight = SlimR_params$specificity_weight, threshold = SlimR_params$threshold.
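One way to keep a script working whether or not the optional parameter step was run is to fall back to the documented defaults. The sketch below uses only base R; `resolve_params` is a hypothetical helper, and it assumes the calculated parameters can be treated as a named list with the three elements shown above.

```r
# Defaults as used in the Celltype_Calculate() example in this README.
default_params <- list(
  min_expression     = 0.1,
  specificity_weight = 3,
  threshold          = 0.6
)

# Hypothetical helper: overlay calculated values (if any) on the defaults,
# ignoring any elements that are not one of the three known parameters.
resolve_params <- function(calculated = NULL) {
  if (is.null(calculated)) {
    return(default_params)
  }
  keep <- intersect(names(default_params), names(calculated))
  utils::modifyList(default_params, calculated[keep])
}

# Without Parameter_Calculate(): plain defaults.
str(resolve_params())

# With (invented) calculated values standing in for SlimR_params.
str(resolve_params(list(min_expression = 0.05, threshold = 0.7)))
```

The resolved values can then be passed on as `min_expression = params$min_expression` and so on.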
# View heatmap, predictions, and ROC curves
print(SlimR_anno_result$Heatmap_plot)
View(SlimR_anno_result$Prediction_results)
print(SlimR_anno_result$AUC_plot) # Requires plot_AUC = TRUE
# Manually correct predictions
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$cluster_col == 15
] <- "Intestinal stem cell"
# Label low-confidence predictions as Unknown
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$AUC <= 0.5
] <- "Unknown"
When correcting, preferably use cell types from the Alternative_cell_types column.
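The correction pattern can be tried on a small stand-in table first. The data frame below is invented; its column names follow the snippets above, and the extra `Original_prediction` column is a hypothetical addition so that manual changes stay traceable.

```r
# Toy stand-in for SlimR_anno_result$Prediction_results (values invented).
pred <- data.frame(
  cluster_col            = c(0, 1, 15),
  Predicted_cell_type    = c("T cell", "B cell", "Goblet cell"),
  AUC                    = c(0.92, 0.45, 0.71),
  Alternative_cell_types = c("NK cell", "Plasma cell", "Intestinal stem cell"),
  stringsAsFactors = FALSE
)

# Keep a copy of the automatic calls before editing them.
pred$Original_prediction <- pred$Predicted_cell_type

# Manual correction for cluster 15, using a label listed as an alternative.
pred$Predicted_cell_type[pred$cluster_col == 15] <- "Intestinal stem cell"

# Predictions with AUC <= 0.5 are relabelled as "Unknown".
pred$Predicted_cell_type[pred$AUC <= 0.5] <- "Unknown"

pred[, c("cluster_col", "Original_prediction", "Predicted_cell_type", "AUC")]
```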
Step 2: Annotate Cell Types
sce <- Celltype_Annotation(seurat_obj = sce,
cluster_col = "seurat_clusters",
SlimR_anno_result = SlimR_anno_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_SlimR"
)
Step 3: Verify Cell Types
Celltype_Verification(seurat_obj = sce,
SlimR_anno_result = SlimR_anno_result,
gene_number = 5,
assay = "RNA",
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_SlimR"
)
Important: Use matching cluster_col and annotation_col values across all three functions.
Three steps: Calculate → Annotate → Verify. Ideal for heterogeneous clusters, rare cell types, and continuous differentiation states.
Step 1: Calculate Per-Cell Types
SlimR_percell_result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
assay = "RNA",
method = "weighted",
min_expression = 0.1,
use_umap_smoothing = FALSE,
min_score = "auto",
min_confidence = 1.2,
verbose = TRUE
)
Three scoring methods: "weighted" (default, recommended), "mean" (fast baseline), "AUCell" (rank-based, robust to batch effects).
# Enable UMAP smoothing for noise reduction
SlimR_percell_result <- Celltype_Calculate_PerCell(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
method = "weighted",
use_umap_smoothing = TRUE,
k_neighbors = 20,
smoothing_weight = 0.3
)
Install RANN for 10–100× faster k-NN:
install.packages("RANN")
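To give a feel for what k_neighbors and smoothing_weight control, here is a generic k-nearest-neighbour smoothing sketch in base R. It is not SlimR's implementation: it assumes each cell's score is blended with the mean score of its nearest neighbours in the embedding, and the function name `smooth_scores` and the toy data are hypothetical.

```r
# Toy 2-D embedding for six cells (two tight groups) and one score per cell.
umap <- cbind(
  UMAP_1 = c(0, 0.1, 0.2, 5, 5.1, 5.2),
  UMAP_2 = c(0, 0.1, 0.0, 5, 5.1, 5.0)
)
score <- c(1.0, 0.9, 0.1, 0.2, 0.1, 0.9)

# ASSUMED scheme: smoothed = (1 - w) * own score + w * mean(neighbour scores).
smooth_scores <- function(coords, scores, k_neighbors = 2,
                          smoothing_weight = 0.3) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  diag(d) <- Inf                # a cell is not its own neighbour
  neighbour_mean <- vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k_neighbors)]  # indices of the k nearest cells
    mean(scores[nn])
  }, numeric(1))
  (1 - smoothing_weight) * scores + smoothing_weight * neighbour_mean
}

smoothed <- smooth_scores(umap, score, k_neighbors = 2, smoothing_weight = 0.3)
round(smoothed, 3)
```

In this toy case the third cell, an outlier with a low score among high-scoring neighbours, is pulled upward, while a weight of 0 would leave every score unchanged. The full distance matrix used here scales quadratically with cell number, which is why a dedicated k-NN library matters on real datasets.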
| Scenario | min_score | min_confidence |
|---|---|---|
| Few cell types (<15) | "auto" | 1.2 (default) |
| Many cell types (>30) | "auto" | 1.1–1.15 |
| Strict annotation | "auto" | 1.3–1.5 |
| Liberal annotation | "auto" | 1.0 (disable) |
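The table above is easier to reason about with a concrete confidence measure in mind. The sketch below assumes confidence is the ratio of the best to the second-best cell-type score for each cell, which is consistent with 1.0 disabling the filter but is not confirmed by the source. The function name `assign_with_confidence`, the "Unassigned" label, and the scores are all hypothetical.

```r
# Toy score matrix: rows = cells, columns = candidate cell types.
scores <- matrix(
  c(0.90, 0.30, 0.10,
    0.50, 0.48, 0.10,
    0.20, 0.25, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell_1", "cell_2", "cell_3"),
                  c("T cell", "B cell", "Monocyte"))
)

# ASSUMED rule: confidence = best score / second-best score per cell;
# cells below min_confidence get a placeholder label.
assign_with_confidence <- function(score_matrix, min_confidence = 1.2) {
  sorted <- t(apply(score_matrix, 1, sort, decreasing = TRUE))
  confidence <- sorted[, 1] / sorted[, 2]
  label <- colnames(score_matrix)[max.col(score_matrix, ties.method = "first")]
  label[confidence < min_confidence] <- "Unassigned"
  data.frame(label = label, confidence = confidence, stringsAsFactors = FALSE)
}

assign_with_confidence(scores, min_confidence = 1.2)  # cell_2 is ambiguous
assign_with_confidence(scores, min_confidence = 1.0)  # every cell keeps a label
```

Under this reading, raising min_confidence trades coverage for certainty, and lowering it matters most when many similar cell types compete for the same cells.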
Step 2: Annotate Per-Cell Types
sce <- Celltype_Annotation_PerCell(
seurat_obj = sce,
SlimR_percell_result = SlimR_percell_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_PerCell_SlimR",
plot_confidence = TRUE
)
Step 3: Verify Per-Cell Types
Celltype_Verification_PerCell(
seurat_obj = sce,
SlimR_percell_result = SlimR_percell_result,
gene_number = 5,
assay = "RNA",
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_PerCell_SlimR",
min_cells = 10
)
Important: Use matching annotation_col values in Celltype_Annotation_PerCell() and Celltype_Verification_PerCell().
For expert-guided manual annotation using visualizations:
Celltype_Annotation_Heatmap(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
min_expression = 0.1,
specificity_weight = 3,
colour_low = "navy",
colour_high = "firebrick3"
)
Note: This function is now incorporated into Celltype_Calculate(). Use Celltype_Calculate() instead for automated workflows.
Generates a per-cell-type expression dot plot with a metric heat map:
Celltype_Annotation_Features(
seurat_obj = sce,
cluster_col = "seurat_clusters",
gene_list = Markers_list,
gene_list_type = "Cellmarker2",
species = "Human",
save_path = "./SlimR/Celltype_Annotation_Features/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Set gene_list_type to "Cellmarker2", "PanglaoDB", "Seurat", or "Excel" to match your marker source.
Generates per-cell-type box plots of marker expression levels:
Celltype_Annotation_Combined(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_Annotation_Combined/",
colour_low = "white",
colour_high = "navy"
)
Wang Z (2026). SlimR: Adaptive Machine Learning-Powered, Context-Matching Tool for Single-Cell and Spatial Transcriptomics Annotation.
https://github.com/zhaoqing-wang/SlimR
Author: Zhaoqing Wang (ORCID) | Email: zhaoqingwang@mail.sdu.edu.cn | Issues: SlimR Issues