Label-free selection of marker genes in single-cell and spatial transcriptomics with geneCover

  1. Laurent Younes1,6
  1. Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  2. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA;
  3. Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland 21218, USA;
  4. Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  5. Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  6. Center for Imaging Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  • Corresponding author: awang87{at}jhu.edu
    Abstract

    The selection of marker gene panels is critical for capturing the cellular and spatial heterogeneity in the expanding atlases of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. Most current approaches to marker gene selection operate in a label-based framework, which is inherently limited by its dependency on predefined cell type labels or clustering results. Existing label-free methods avoid this dependency, but they often struggle to identify genes that characterize rare cell types or subtle spatial patterns, and they frequently fail to scale efficiently to large data sets. Here, we introduce geneCover, a label-free combinatorial method that selects an optimal panel of minimally redundant marker genes based on gene-gene correlations. Our method scales well to large data sets and identifies marker gene panels that capture distinct correlation structures across the transcriptome. This allows geneCover to effectively distinguish cell states across a variety of tissues, including states associated with rare or otherwise difficult-to-identify cell types. We evaluate the performance of geneCover on a range of scRNA-seq and spatial transcriptomics data sets, comparing it to other label-free algorithms to highlight its utility and potential in diverse biological contexts.
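
    The idea of choosing a minimally redundant panel from gene-gene correlations can be illustrated with a small sketch. The code below is not the geneCover implementation; it substitutes a simple greedy set-cover heuristic, and the function name `greedy_correlation_cover`, the use of absolute Pearson correlation, and the `threshold` parameter are all assumptions made for illustration. A gene is treated as "covering" every gene whose absolute correlation with it meets the threshold, and genes are picked until every gene is covered.

```python
import numpy as np


def greedy_correlation_cover(expr, threshold=0.5):
    """Greedy set-cover sketch over a thresholded gene-gene correlation matrix.

    expr: array of shape (n_cells, n_genes).
    threshold: hypothetical cutoff on |Pearson correlation|, at most 1.0.
    Returns a list of selected gene (column) indices.
    """
    # Absolute Pearson correlation between genes (columns of expr).
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    # Zero-variance genes yield NaN correlations; treat them as uncorrelated.
    corr = np.nan_to_num(corr, nan=0.0)
    # Every gene covers itself, which guarantees the loop terminates.
    np.fill_diagonal(corr, 1.0)

    covers = corr >= threshold          # covers[i, j]: gene i covers gene j
    uncovered = np.ones(covers.shape[0], dtype=bool)
    selected = []
    while uncovered.any():
        # Number of still-uncovered genes each candidate would cover.
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        selected.append(best)
        uncovered &= ~covers[best]
    return selected


if __name__ == "__main__":
    # Toy data (assumed): two independent signals, each measured by noisy genes.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=200), rng.normal(size=200)
    noise = lambda: 0.1 * rng.normal(size=200)
    expr = np.column_stack([a, a + noise(), a + noise(), b, b + noise()])
    print(greedy_correlation_cover(expr, threshold=0.5))
```

    On the toy data, each group of strongly correlated genes should be represented by a single selected gene. A greedy heuristic does not guarantee a minimum-size cover, so it should be read only as an illustration of the covering idea.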

    Footnotes

    • Received February 15, 2025.
    • Accepted August 25, 2025.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
