Enabling efficient and robust analysis of tandem repeats in genomic data using wavefront-based string decomposer

  Guojun Li 1,3
  1 Shandong University;
  2 The Hong Kong University of Science and Technology
  * Corresponding author; email: guojunsdu{at}gmail.com

  Abstract

    Tandem repeat (TR) analysis is crucial for understanding genome structure and variation. However, string decomposition, a key challenge in TR analysis, remains computationally demanding. In this study, we introduce the Wavefront-based String Decomposer (WSD), a novel algorithm that improves both the efficiency and the accuracy of TR decomposition. By integrating wavefront techniques, WSD significantly reduces computational and memory costs. Additionally, two adaptive strategies minimize parameter sensitivity and further improve efficiency. Through extensive experiments, we demonstrate that WSD outperforms current state-of-the-art (SOTA) methods, achieving an average speedup of ~2.33× and reducing memory usage by two orders of magnitude when analyzing human TRs.

    • Received August 22, 2025.
    • Accepted March 29, 2026.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

    ACCEPTED MANUSCRIPT

    Genome Res. gr.281346.125 Published by Cold Spring Harbor Laboratory Press
