TY  - JOUR
A1  - Al-Abri, Rashid
A1  - Gürsoy, Gamze
T1  - Estimating the size of long tandem repeat expansions from short reads with ScatTR
Y1  - 2025/12/01
JF  - Genome Research
JO  - Genome Research
SP  - 2701
EP  - 2713
DO  - 10.1101/gr.280563.125
VL  - 35
IS  - 12
UR  - http://genome.cshlp.org/content/35/12/2701.abstract
N2  - Tandem repeats (TRs) are sequences of DNA in which ≥2 bp are repeated back-to-back at specific locations in the genome. TR expansions, in which the number of repeat units exceeds the normal range, have been implicated in more than 50 conditions. However, accurately measuring the copy number of TRs is challenging, especially when their expansions are larger than the fragment sizes used in standard short-read genome sequencing. Here, we introduce ScatTR, a novel computational method that leverages a maximum likelihood framework to estimate the copy number of large TR expansions from short-read sequencing data. ScatTR calculates the likelihood of different alignments between sequencing reads and reference sequences that represent various TR lengths and employs a Monte Carlo technique to find the best match. In simulated data, ScatTR outperforms state-of-the-art methods, particularly for TRs with longer motifs and those with lengths that greatly exceed typical sequencing fragment sizes. When applied to data from the 1000 Genomes Project, ScatTR detects potential large TR expansions that other methods missed, highlighting its ability to better characterize genome-wide TR variation.
ER  - 