spRefine denoises and imputes spatial transcriptomics with a reference-free framework powered by genomic language model

  1. Hongyu Zhao1,3
  1. 1 Yale University;
  2. 2 Northeastern University, Broad Institute
  • * Corresponding author; email: hongyu.zhao{at}yale.edu
  • Abstract

    The analysis of spatial transcriptomics is hindered by high noise levels and missing gene measurements, challenges that are further compounded by the higher cost of spatial data compared to traditional single-cell data. To overcome these challenges, we introduce spRefine, a deep learning framework that leverages genomic language models to jointly denoise and impute spatial transcriptomic data. Our results demonstrate that spRefine yields more robust cell- and spot-level representations after denoising and imputation, substantially improving data integration. In addition, spRefine serves as a strong framework for model pretraining and the discovery of novel biological signals, as highlighted by multiple downstream applications across datasets of varying scales. Notably, spRefine enhances the accuracy of spatial aging clock estimations and uncovers new aging-related relationships associated with key biological processes, such as neuronal function loss, offering new insights into analyzing aging effects with spatial transcriptomics.

    • Received June 2, 2025.
    • Accepted January 21, 2026.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International license), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    ACCEPTED MANUSCRIPT

    This Article

    1. Genome Res. gr.281001.125 Published by Cold Spring Harbor Laboratory Press
