Robust and efficient annotation of cell states through gene signature scoring

  1. Valentina Boeva1,4,6,7
  1. ETH Zurich, Department of Computer Science, Institute for Machine Learning, 8092 Zurich, Switzerland;
  2. Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany;
  3. Hector Fellow Academy, 76131 Karlsruhe, Germany;
  4. SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
  5. University Hospital Zurich, Department of Thoracic Surgery, 8092 Zurich, Switzerland;
  6. ETH AI Center, ETH Zürich, 8092 Zurich, Switzerland;
  7. Institut Cochin, Inserm U1016, CNRS UMR 8104, Université Paris Cité, 75014 Paris, France
  8. These authors contributed equally to this work.

  • Present addresses: (9) Medical University of Vienna, Institute of Artificial Intelligence, Center for Medical Data Science (CEDAS), 1090 Vienna, Austria; (10) CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; (11) Eric and Wendy Schmidt Center, The Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA

  • Corresponding author: valentina.boeva{at}inf.ethz.ch
  • Abstract

    Gene signature scoring is integral to single-cell RNA sequencing (scRNA-seq) data analysis, particularly for unsupervised cellular state annotation based on maximum signature score values. However, this application requires robust and comparable score distributions across diverse signatures and experimental conditions. Our systematic evaluation of established scoring methodologies—Seurat, SCANPY, UCell, and JASMINE—across nine healthy and cancer scRNA-seq data sets demonstrates their insufficiency in fulfilling this requirement. To address this limitation, we present Adjusted Neighborhood Scoring (ANS), a deterministic algorithm with enhanced control gene selection that significantly improves score stability and cross-signature comparability, achieving cell-state annotation accuracy comparable to supervised methods. We demonstrate the practical utility of ANS by developing and validating a gene signature to differentiate cancer-associated fibroblasts from malignant cells undergoing epithelial-to-mesenchymal transition. Overall, ANS provides a robust and reliable gene signature scoring framework, significantly improving the accuracy of score-based annotation of cell types and states in single-cell studies.
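
    As a rough illustration of the score-based annotation described above, the following minimal sketch scores each cell as the mean expression of the signature genes minus the mean expression of a set of control genes, then assigns each cell the label of its highest-scoring signature. This is a generic toy example and not the ANS algorithm: the abstract does not specify how ANS selects control genes, so the control set here is simply supplied by hand. The function names, gene names, and expression values are all hypothetical.

```python
import numpy as np


def signature_score(expr, gene_names, signature, control):
    """Per-cell score: mean signature expression minus mean control expression.

    expr is a (cells x genes) array; gene_names gives the column order.
    The control genes are passed in explicitly (an assumption of this sketch).
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    sig_cols = [index[g] for g in signature if g in index]
    ctl_cols = [index[g] for g in control if g in index]
    return expr[:, sig_cols].mean(axis=1) - expr[:, ctl_cols].mean(axis=1)


def annotate_by_max_score(scores):
    """Assign each cell the label whose signature score is highest."""
    names = list(scores)
    matrix = np.column_stack([scores[name] for name in names])
    return [names[j] for j in matrix.argmax(axis=1)]


# Toy data: 4 cells x 6 genes (hypothetical values).
gene_names = ["A1", "A2", "B1", "B2", "C1", "C2"]
expr = np.array([
    [5.0, 4.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 6.0, 5.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 0.0, 2.0, 2.0],
    [1.0, 0.0, 3.0, 4.0, 1.0, 1.0],
])
control = ["C1", "C2"]

scores = {
    "state_A": signature_score(expr, gene_names, ["A1", "A2"], control),
    "state_B": signature_score(expr, gene_names, ["B1", "B2"], control),
}
labels = annotate_by_max_score(scores)
print(labels)
```

    Taking the maximum across signatures is only meaningful when the score distributions of different signatures are on a comparable scale, which is the requirement the abstract highlights.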

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280926.125.

    • Freely available online through the Genome Research Open Access option.

    • Received May 14, 2025.
    • Accepted January 16, 2026.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
