Abstract
Cell-type deconvolution has been instrumental in analyzing spatial transcriptomics (ST) data to reveal underlying tissue heterogeneity. Although reference-based methods have been widely explored, their practical limitations, particularly the need for matched single-cell RNA-seq data sets, highlight the value of robust reference-free methods. Existing reference-free approaches, such as STdeconvolve, overlook spatial information, despite the well-established observation that spatially adjacent spots often share similar cellular compositions. Motivated by this, we propose SpatialCD, a spatially informed reference-free deconvolution method that extends Latent Dirichlet Allocation (LDA) with spatial regularization to encourage neighboring spots to exhibit similar cell-type structures. Across simulated and real data sets, including MERFISH-derived simulations, mouse olfactory bulb (MOB), 10× Visium, and DBiT-seq data, SpatialCD consistently outperforms existing reference-free methods in estimating cell-type proportions and gene expression profiles. It recovers more accurate transcriptional patterns and reveals biologically coherent spatial organization across normal and diseased tissues, including subtle anatomical layers and region-specific tumor-associated cell populations. This work advances statistical tools for spatial transcriptomics and enriches the methodological toolkit for complex spatial gene expression analysis.