RECOMBINE identifies recurrent composite markers of cell types and states

Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
Genome Research, Vol 36, Issue 5 (current issue)

Abstract

Biological function is mediated by the hierarchical organization of cell types and states within tissue ecosystems. Identifying interpretable composite marker sets that both define and distinguish hierarchical cell identities is essential for decoding biological complexity yet remains a major challenge. Here, we present RECOMBINE, an algorithm that identifies recurrent composite marker sets to define hierarchical cell identities. Validation using both simulated and biological data sets demonstrates that RECOMBINE is robust to hyperparameter variation and data sparsity, and achieves higher accuracy in identifying discriminant markers compared with existing approaches. As a partition-free framework, RECOMBINE is particularly powerful for data sets characterized by continuous cell-state transitions, in which defining discrete boundaries is inappropriate. This capability is demonstrated by its application to zebrafish development, revealing gradual transcriptional transitions across embryonic stages, and to the mouse cerebellum, in which it uncovers transcriptional variation shaped by spatial gradients. When applied to single-cell data and validated with spatial transcriptomic data from the mouse visual cortex, RECOMBINE identifies key cell-type markers and generates a robust gene panel for targeted spatial profiling. It also uncovers markers of CD8+ T cell states, including GZMK+HAVCR2 effector memory cells associated with anti-PD-1 therapy response. Finally, using data from the Tabula Sapiens project, RECOMBINE identifies composite marker sets across a broad range of human tissues. Together, these results highlight RECOMBINE as a robust, data-driven framework for optimized marker selection, enabling the discovery and validation of hierarchical cell identities across diverse tissue contexts.
