Deconvolution with Iterative Completion for Estimating Cellular Proportions from RNA-seq Data
Bulk RNA-seq deconvolution infers the proportions of distinct cell types from a mixed gene expression profile. Most existing methods assume that the reference signature matrix is complete – i.e., that every cell population present in the bulk sample is represented. In practice this assumption rarely holds, leading to biased estimates.
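To make the bias concrete, the following sketch simulates a three-cell-type mixture and deconvolves it twice with ordinary least squares: once with the complete reference and once with one signature removed. It uses base R and toy data only. The matrices, the least-squares solver, and the renormalization step are assumptions chosen for illustration and are not dicepro's algorithm.

```r
set.seed(1)

n_genes <- 50
# Toy signature matrix W (genes x 3 cell types); values are arbitrary
W <- matrix(rexp(n_genes * 3), nrow = n_genes,
            dimnames = list(NULL, c("A", "B", "C")))
# True proportions for a single sample (they sum to 1)
h_true <- c(A = 0.5, B = 0.3, C = 0.2)
# Noise-free bulk profile: linear mixture of the signatures
b <- as.vector(W %*% h_true)

# Least-squares fit, clipped at zero and renormalized to sum to 1
deconvolve <- function(W_ref, b) {
  h <- as.numeric(qr.solve(W_ref, b))  # least-squares solution
  h <- pmax(h, 0)
  h / sum(h)
}

h_full    <- deconvolve(W, b)                 # complete reference
h_missing <- deconvolve(W[, c("A", "B")], b)  # cell type C absent

# Largest absolute error on the cell types present in each reference
err_full    <- max(abs(h_full - unname(h_true)))
err_missing <- max(abs(h_missing - unname(h_true[c("A", "B")])))
```

With the complete reference the true proportions are recovered up to numerical precision, whereas dropping cell type C forces its 20% share onto A and B, so `err_missing` is far larger than `err_full`.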
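dicepro addresses this limitation through an iterative joint optimization that simultaneously estimates the cell-type proportions and completes the reference matrix with signatures for the cell populations it does not represent.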
Hyper-parameters \((\lambda, \gamma, p')\) controlling the NMF regularization are selected automatically via a Pareto-frontier + knee-point procedure, so no manual tuning is required.
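The idea of a Pareto frontier followed by a knee-point choice can be sketched in base R. The example assumes two objectives that are both minimized and defines the knee as the frontier point farthest from the straight line joining the frontier's two extremes, which is one common definition. The column names `loss` and `constraint` and the toy values are hypothetical, and dicepro's actual procedure may differ.

```r
# Toy trials: one row per hyper-parameter configuration, two objectives
# to minimize (hypothetical column names and values)
trials <- data.frame(
  loss       = c(1.0, 0.8, 0.6, 0.5, 0.45, 0.9, 0.7),
  constraint = c(0.10, 0.12, 0.20, 0.50, 0.90, 0.30, 0.60)
)

# A point is dominated if another point is no worse on both objectives
# and strictly better on at least one
is_dominated <- function(i, df) {
  any(df$loss <= df$loss[i] & df$constraint <= df$constraint[i] &
        (df$loss < df$loss[i] | df$constraint < df$constraint[i]))
}
on_front <- !vapply(seq_len(nrow(trials)), is_dominated, logical(1),
                    df = trials)
front <- trials[on_front, ]
front <- front[order(front$loss), ]

# Min-max normalize so both objectives share a comparable scale
norm01 <- function(x) (x - min(x)) / (max(x) - min(x))
x <- norm01(front$loss)
y <- norm01(front$constraint)

# Perpendicular distance of each frontier point to the chord joining
# the first and last frontier points
n <- nrow(front)
dist_to_chord <- abs((y[n] - y[1]) * x - (x[n] - x[1]) * y +
                       x[n] * y[1] - y[n] * x[1]) /
  sqrt((y[n] - y[1])^2 + (x[n] - x[1])^2)

knee <- front[which.max(dist_to_chord), ]
```

In this toy set, five of the seven configurations are non-dominated, and the knee is the one with `loss = 0.6` and `constraint = 0.2`, where further reducing one objective starts to cost disproportionately on the other.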
The example datasets BlueCode (34-cell-type reference) and CellMixtures (12 experimentally mixed bulk samples) are included.
output_path/report/.
# install.packages("remotes")
remotes::install_github("kalidouBA/dicepro")
library(dicepro)
set.seed(2101)
# 1. Simulate reference, proportions, and noisy bulk
sim <- simulation(
  scenario = "hierarchical",
  nSample = 30,
  nGenes = 200,
  nCellsType = 10,
  sigma_bio = 0.07,
  sigma_tech = 0.07
)
# 2. Run dicepro
out <- dicepro(
  reference = as.matrix(sim$W)[, -c(1,5,10)],
  bulk = as.matrix(sim$B),
  methodDeconv = "FARDEEP",
  bulkName = "SimBulk",
  refName = "SimRef",
  hp_max_evals = 100L,
  hspaceTechniqueChoose = "all",
  output_path = tempdir()
)
# 3. Inspect results
class(out)
out$hyperparameters # best lambda / gamma
head(out$H) # estimated proportions
out$plot # interactive Pareto plot
out$plot_hyperopt # hyper-parameter scatter matrix
library(dicepro)
data(BlueCode) # 34-cell-type reference (G x 34)
data(CellMixtures) # 12 mixed bulk samples (G x 12)
out <- dicepro(
  reference = BlueCode,
  bulk = CellMixtures,
  methodDeconv = "FARDEEP",
  bulkName = "CellMixtures",
  refName = "BlueCode",
  hp_max_evals = 100L,
  hspaceTechniqueChoose = "all",
  output_path = tempdir()
)
head(out$H)
CIBERSORTx (methodDeconv = "CSx") requires Docker and a personal token.
Step 1 – Install Docker Desktop
Download Docker Desktop from https://www.docker.com/products/docker-desktop/, open it, log in, then pull the CIBERSORTx image from a terminal:
docker pull cibersortx/fractions
Step 2 – Obtain a token
Request a token at https://cibersortx.stanford.edu/ (you will first need to register). Tokens are tied to your account and expire periodically; request a new one when the existing token has expired.
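Because the token is a credential, one option is to keep it out of scripts and read it from environment variables instead. The sketch below relies only on base R's `Sys.getenv()`; the variable names `CIBERSORTX_EMAIL` and `CIBERSORTX_TOKEN` and the helper `read_credential()` are hypothetical and not part of dicepro.

```r
# Hypothetical helper: read a required credential from the environment
# and fail early with a clear message if it is missing
read_credential <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  value
}

# For demonstration only; in practice these would be defined outside the
# script, for example in ~/.Renviron
Sys.setenv(CIBERSORTX_EMAIL = "your@email.com",
           CIBERSORTX_TOKEN = "your_token_here")

csx_email <- read_credential("CIBERSORTX_EMAIL")
csx_token <- read_credential("CIBERSORTX_TOKEN")
```

The two values can then be supplied to the `cibersortx_email` and `cibersortx_token` arguments of `dicepro()` in place of literal strings.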
Step 3 – Run dicepro with CIBERSORTx
out <- dicepro(
  reference = BlueCode,
  bulk = CellMixtures,
  methodDeconv = "CSx",
  cibersortx_email = "your@email.com",
  cibersortx_token = "your_token_here",
  bulkName = "CellMixtures",
  refName = "BlueCode",
  output_path = tempdir()
)
Other supported deconvolution backends can be listed with ?running_method.
dicepro() returns an S3 object of class "dicepro" with the following elements:
| Element | Description |
|---|---|
| $hyperparameters | Best \(\lambda\) and \(\gamma\) found by the search |
| $metrics | Loss and constraint value at the optimum |
| $trials | data.frame of all evaluated hyper-parameter configurations |
| $W | Optimized reference matrix (including unknown cell types) |
| $H | Estimated cell-type proportions (samples x cell types) |
| $plot | Pareto frontier |
| $plot_hyperopt | Hyper-parameter scatter matrix (ggplot2) |
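As a sketch of how these elements might be used downstream, the example below builds a small mock list with the documented layout of $H (samples x cell types) and summarizes the dominant cell type per sample. The mock values, sample names, and cell-type names are invented for illustration; with a real result, `out$H` would take the place of `mock$H`.

```r
# Mock object mimicking the documented structure (all values are made up)
mock <- structure(
  list(
    H = matrix(
      c(0.6, 0.3, 0.1,
        0.2, 0.5, 0.3),
      nrow = 2, byrow = TRUE,
      dimnames = list(c("S1", "S2"), c("Bcell", "Tcell", "Unknown"))
    )
  ),
  class = "dicepro"
)

H <- mock$H

# Dominant cell type per sample (first one wins in case of ties)
dominant <- colnames(H)[max.col(H, ties.method = "first")]
names(dominant) <- rownames(H)

# Per-sample totals, useful for checking whether rows are normalized
row_totals <- rowSums(H)
```

Checking `row_totals` before comparing samples is a cheap way to find out whether the proportions are already normalized to 1 per sample.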
BlueCode is a 13299 genes x 34 cell types reference signature matrix derived from sorted bulk RNA-seq profiles spanning five major tissue compartments:
- Immune (9): e.g., B cells, T cells, monocytes, macrophages, NK cells
- Stromal (8): e.g., fibroblasts, MSC-like cells
- Endothelial (3)
- Epithelial (5)
- Muscle (9): e.g., smooth muscle, myocytes
data(BlueCode)
dim(BlueCode)
colnames(BlueCode)
CellMixtures is a 31422 genes x 12 samples bulk RNA-seq matrix of experimentally constructed cell mixtures (samples A–L), paired with BlueCode for benchmarking.
data(CellMixtures)
dim(CellMixtures)
colnames(CellMixtures)
See ?BlueCode and ?CellMixtures for full documentation.
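The two datasets cover different numbers of genes (13299 versus 31422), so any direct comparison has to be restricted to the genes they share. The sketch below shows one way to do that, assuming row names hold gene identifiers. It uses toy matrices, and `align_genes()` is a hypothetical helper; whether dicepro() performs this step internally is not stated here.

```r
# Hypothetical helper: keep only the genes (rows) present in both
# matrices, in the same order
align_genes <- function(reference, bulk) {
  common <- intersect(rownames(reference), rownames(bulk))
  if (length(common) == 0) stop("No shared gene identifiers")
  list(reference = reference[common, , drop = FALSE],
       bulk      = bulk[common, , drop = FALSE])
}

# Toy matrices standing in for a reference and a bulk dataset
ref <- matrix(1:6, nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("typeA", "typeB")))
blk <- matrix(1:8, nrow = 4,
              dimnames = list(c("g3", "g1", "g4", "g5"), c("s1", "s2")))

aligned <- align_genes(ref, blk)
shared_genes <- rownames(aligned$reference)
```

Here only `g1` and `g3` occur in both matrices, so both aligned matrices end up with those two rows in that order.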
Two vignettes provide step-by-step walkthroughs, one on simulated data and one on real data:
vignette("vignette-simulation", package = "dicepro")
vignette("vignette-real-data", package = "dicepro")
If you use dicepro in your research, please cite: When Less Is Not More: dicepro Mitigates the Impact of Incomplete Reference Matrices on Cellular Frequency Deconvolution. Bioinformatics. doi:10.64898/2026.06.17.732876