Fast sequence alignment for centromere with RaMA
Abstract
The release of the first draft of the human pangenome has revolutionized genomic research by enabling access to complex regions like centromeres, which are composed of extra-long tandem repeats (ETRs). However, a significant gap remains: current methodologies are inadequate for producing sequence alignments that effectively capture genetic events within ETRs, highlighting a pressing need for improved alignment tools. Inspired by UniAligner, we developed Rare Match Aligner (RaMA), which uses rare matches as anchors and a 2-piece affine gap cost to generate complete pairwise alignments that better capture genetic evolution. RaMA also employs parallel computing and the wavefront algorithm to accelerate anchor discovery and sequence alignment, achieving up to 13.66 times faster processing and using only 11% of UniAligner's memory. Downstream analysis of simulated data and of the CHM13 and CHM1 Higher Order Repeat (HOR) arrays demonstrates that RaMA achieves more accurate alignment, effectively capturing true HOR structures. RaMA also introduces two methods for defining reliable alignment regions, further refining and enhancing the accuracy of centromeric alignment statistics.
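As an illustration of the 2-piece affine gap cost mentioned above, the following minimal sketch scores a gap as the cheaper of two affine functions, so that short gaps are governed by one open/extend pair and long gaps by another. The function name and all four penalty values are hypothetical placeholders chosen for the example; they are not RaMA's defaults and this is not RaMA's implementation.

```python
def two_piece_affine_gap_cost(length, open1=4, extend1=2, open2=24, extend2=1):
    """Cost of a single gap of `length` under a 2-piece affine model.

    The penalty values are illustrative assumptions, not RaMA's parameters.
    With open1 < open2 and extend1 > extend2, the first piece is cheaper
    for short gaps and the second piece is cheaper for long gaps.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no penalty
    short_piece = open1 + extend1 * length  # low opening cost, steep extension
    long_piece = open2 + extend2 * length   # high opening cost, gentle extension
    return min(short_piece, long_piece)


if __name__ == "__main__":
    for gap_length in (1, 5, 20, 100):
        print(gap_length, two_piece_affine_gap_cost(gap_length))
```

With these placeholder values the two pieces cross at a gap length of 20, so longer gaps, such as those spanning whole repeat units, are penalized more gently per base than they would be under a single affine cost.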
- Received July 8, 2024.
- Accepted February 6, 2025.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.