Quantifying pathological progression from single-cell transcriptomic data with scPSS

  1. Md Abul Hassan Samee (affiliation 4)
  1. Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh;
  2. Department of Computer Science, Virginia Tech, Virginia 24061, USA;
  3. Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh;
  4. Department of Integrative Physiology, Baylor College of Medicine, Houston, Texas 77030, USA
  • Corresponding authors: msrahman{at}cse.buet.ac.bd, samee{at}bcm.edu
  • Abstract

    The surge in single-cell data sets and reference atlases has enabled the comparison of cell states across conditions, yet a gap persists in quantifying pathological shifts from healthy cell states. To address this gap, we introduce single-cell Pathological Shift Scoring (scPSS), which provides a statistical measure of how far a “query” cell from a diseased sample has shifted away from a reference group of healthy cells. In scPSS, a cell's pathological shift score is its distance to its k-th nearest reference cell, where distances between cells are Euclidean distances in the space of the top n principal components of gene expression. The distribution of shift scores of the reference cells forms a null model, which allows a P-value to be assigned to each query cell's shift score, quantifying the statistical significance of its departure from the reference cell group. This makes our method both simple and statistically rigorous. The key strength of scPSS is its applicability in a “semisupervised” setting, where only healthy reference cells are known and no disease-labeled data are provided for model training. Because existing methods do not support cell-level measurement of pathological progression in this setting, we adapt state-of-the-art supervised pathological prediction and contrastive models for benchmarking. Comparative evaluations against these adapted models demonstrate our method's superiority in accuracy and efficiency. Additionally, we show that cell-level pathological scores from scPSS can be aggregated to predict health conditions at the individual level.
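
The scoring procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the PCA is fitted on the reference cells alone, each reference cell's null score excludes its own zero self-distance, the P-value is an empirical right-tail fraction with a +1 correction, and the function names, the defaults `n_components=10` and `k=5`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_pca(reference, n_components):
    """Fit PCA on reference cells (assumption: reference only, not query)."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def kth_nn_distance(points, reference, k, exclude_self=False):
    """Euclidean distance from each point to its k-th nearest reference cell."""
    diffs = points[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    dists.sort(axis=1)
    # When reference cells are scored against themselves, column 0 holds
    # the zero self-distance, so the k-th true neighbor sits at column k.
    col = k if exclude_self else k - 1
    return dists[:, col]


def shift_scores_and_pvalues(reference_expr, query_expr, n_components=10, k=5):
    """Return (query shift scores, empirical P-values) under the stated assumptions."""
    mean, comps = fit_pca(reference_expr, n_components)
    ref_pc = (reference_expr - mean) @ comps.T
    qry_pc = (query_expr - mean) @ comps.T

    # Null model: shift scores of the reference cells themselves.
    null_scores = kth_nn_distance(ref_pc, ref_pc, k, exclude_self=True)
    query_scores = kth_nn_distance(qry_pc, ref_pc, k)

    # Empirical right-tail P-value (assumed form): share of null scores that
    # are at least as large as the query score, with a +1 correction.
    n_ge = (null_scores[None, :] >= query_scores[:, None]).sum(axis=1)
    pvals = (n_ge + 1) / (null_scores.size + 1)
    return query_scores, pvals


# Synthetic stand-in for normalized expression matrices (cells x genes).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 20))
query_healthy = rng.normal(size=(50, 20))
query_shifted = rng.normal(loc=5.0, size=(50, 20))

query_scores_healthy, p_healthy = shift_scores_and_pvalues(reference, query_healthy)
query_scores_shifted, p_shifted = shift_scores_and_pvalues(reference, query_shifted)

# One possible per-sample summary (an assumption, not the paper's rule):
# the fraction of query cells called shifted at P < 0.05.
frac_shifted = float((p_shifted < 0.05).mean())
```

The brute-force distance matrix keeps the sketch short but scales quadratically with cell count, so a real data set would need a nearest-neighbor index instead.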

    Footnotes

    • Received January 11, 2025.
    • Accepted November 14, 2025.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
