Label-free selection of marker genes in single-cell and spatial transcriptomics with geneCover

    • 1 Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218, USA;
    • 2 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA;
    • 3 Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland 21218, USA;
    • 4 Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA;
    • 5 Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland 21218, USA;
    • 6 Center for Imaging Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
Published November 12, 2025. Vol 35 Issue 12, pp. 2744-2755. https://doi.org/10.1101/gr.280539.125

Abstract

The selection of marker gene panels is critical for capturing the cellular and spatial heterogeneity in the expanding atlases of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. Most current approaches to marker gene selection operate in a label-based framework, which is inherently limited by its dependence on predefined cell type labels or clustering results. In contrast, existing label-free methods often struggle to identify genes that characterize rare cell types or subtle spatial patterns, and they frequently fail to scale efficiently to large data sets. Here, we introduce geneCover, a label-free combinatorial method that selects an optimal panel of minimally redundant marker genes based on gene-gene correlations. Our method demonstrates excellent scalability to large data sets and identifies marker gene panels that capture distinct correlation structures across the transcriptome. This allows geneCover to effectively distinguish cell states across a variety of tissues, including states associated with rare or otherwise difficult-to-identify cell types. We evaluate geneCover on several scRNA-seq and spatial transcriptomics data sets and compare it with other label-free algorithms to highlight its utility and potential in diverse biological contexts.
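The abstract frames marker selection as a combinatorial covering problem over gene-gene correlations. The sketch below illustrates that idea under stated assumptions; it is not the authors' geneCover implementation or API. It assumes that a gene "covers" every gene whose absolute Pearson correlation with it meets a threshold, and it picks a panel with the classic greedy set-cover heuristic. The threshold value, the greedy solver, and the helper names `correlation_cover_sets` and `greedy_gene_cover` are all hypothetical.

```python
import numpy as np


def correlation_cover_sets(expr: np.ndarray, threshold: float) -> list:
    """Return, for each gene i, the set of genes j with |corr(i, j)| >= threshold.

    expr is a cells x genes matrix, assumed already normalized (hypothetical helper).
    """
    # Genes are columns, so rowvar=False gives a genes x genes Pearson matrix.
    corr = np.corrcoef(expr, rowvar=False)
    # Constant genes produce NaN correlations; treat them as uncorrelated.
    corr = np.nan_to_num(corr)
    covers = []
    for i in range(corr.shape[0]):
        members = set(np.flatnonzero(np.abs(corr[i]) >= threshold).tolist())
        members.add(i)  # every gene covers itself
        covers.append(members)
    return covers


def greedy_gene_cover(expr: np.ndarray, threshold: float = 0.5, max_genes=None) -> list:
    """Greedy set cover: repeatedly pick the gene that covers the most
    still-uncovered genes (hypothetical helper, not the geneCover API)."""
    covers = correlation_cover_sets(expr, threshold)
    uncovered = set(range(expr.shape[1]))
    panel = []
    while uncovered and (max_genes is None or len(panel) < max_genes):
        # Any uncovered gene covers at least itself, so each step makes progress.
        # max() returns the first gene among ties, so results are deterministic.
        best = max(range(len(covers)), key=lambda g: len(covers[g] & uncovered))
        panel.append(best)
        uncovered -= covers[best]
    return panel


# Toy data: genes 0-2 share one signal, genes 3-4 another, gene 5 is independent.
rng = np.random.default_rng(0)
n_cells = 200
a, b = rng.normal(size=n_cells), rng.normal(size=n_cells)


def noisy(signal):
    return signal + 0.1 * rng.normal(size=n_cells)


expr = np.column_stack(
    [noisy(a), noisy(a), noisy(a), noisy(b), noisy(b), rng.normal(size=n_cells)]
)
panel = greedy_gene_cover(expr, threshold=0.8)
print(panel)  # [0, 3, 5]
```

Each correlated module is represented by a single gene, which is the "minimally redundant" property the abstract describes. The greedy heuristic only guarantees a cover within a logarithmic factor of the smallest one; obtaining a truly optimal panel would require an exact solver such as integer programming, and the actual geneCover formulation may differ from this simplified threshold rule.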
