Kernel-bounded clustering for spatial transcriptomics enables scalable discovery of complex spatial domains

  Qiuran Zhao 1,2

  1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China;
  2 School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
  3 These authors contributed equally to this work.

  • Corresponding authors: tingkm{at}nju.edu.cn, zhangj_ai{at}nju.edu.cn
  • Abstract

    Spatial transcriptomics is a collection of technologies that enable the characterization of gene expression profiles together with spatial information in tissue samples. Existing methods for clustering spatial transcriptomics data have focused primarily on data transformation techniques that represent the data suitably for subsequent clustering analysis, often with an existing clustering algorithm. These methods have difficulty handling complex data characteristics, namely clusters of varying densities, sizes, and shapes (in the transformed space on which clustering is performed), and they have high computational complexity. The result is unsatisfactory clustering outcomes and slow execution times, even with GPUs. Rather than focusing on data transformation techniques, we propose a new clustering algorithm called kernel-bounded clustering (KBC). It has two unique features: (1) it is the first clustering algorithm that employs a distributional kernel to recruit members of a cluster, enabling clusters of varying densities, sizes, and shapes to be discovered, and (2) it is a linear-time clustering algorithm that significantly speeds up clustering analysis, enabling researchers to handle large-scale spatial transcriptomics data sets effectively. We show that (1) KBC works well with a simple data transformation technique called the Weisfeiler–Lehman scheme, and (2) the combination of KBC and the Weisfeiler–Lehman scheme produces good clustering outcomes while being faster and easier to use than many methods that employ existing clustering algorithms and data transformation techniques.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278983.124.

    • Freely available online through the Genome Research Open Access option.

    • Received January 17, 2024.
    • Accepted December 19, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
