Rashid Al-Abri; Gamze Gürsoy

Figure 1.

Overview of the ScatTR method. (A) Reads likely originating from the TR locus are collected from reference-aligned WGS data, including both mapped and unmapped reads, to form the “bag of reads.” The expected read depth and insert size distributions are also extracted and used as parameters in the likelihood function. TR copy number estimation is done with bootstrapping. In each bootstrap iteration, ScatTR uses golden section search (GSS) to find the copy number that minimizes a cost function, and the result is used to update a distribution of copy number estimates. After multiple iterations, this distribution is used to report a final estimate and a 95% confidence interval. (B) For a given copy number, ScatTR evaluates the cost by finding the best alignment of the reads to a decoy reference with the given number of repeat units. Starting from an initial alignment, it uses simulated annealing: Monte Carlo moves propose updates to the alignment with the aim of reducing the cost, and each change is accepted according to a probability function. This process continues until convergence, yielding the best alignment for the given TR copy number. The cost of that best alignment is the cost of the copy number, which is what GSS minimizes in (A).
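
The outer loop in panel (A) can be illustrated with a minimal Python sketch. It is not ScatTR's implementation: `toy_cost` is a hypothetical stand-in for the alignment-based cost of panel (B), the `observations` list and the choice to resample it with replacement are assumptions made for illustration, and all function names and parameters are invented here. The sketch only shows how golden section search over copy number can be wrapped in bootstrap iterations to produce an estimate and a 95% interval.

```python
import math
import random

INV_PHI = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618


def golden_section_search(cost, lo, hi, tol=0.5):
    """Minimize a unimodal function on [lo, hi]; return the final bracket midpoint."""
    a, b = float(lo), float(hi)
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = cost(c), cost(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right, reuse c as the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = cost(c)
        else:
            # Minimum lies in [c, b]: shrink from the left, reuse d as the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = cost(d)
    return (a + b) / 2


def toy_cost(copy_number, observations):
    """Hypothetical cost (assumption): squared distance to noisy size observations."""
    return sum((copy_number - x) ** 2 for x in observations)


def bootstrap_copy_number(observations, lo, hi, n_boot=200, seed=0):
    """Return (median estimate, (2.5th percentile, 97.5th percentile))."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample with replacement (assumed resampling unit), then minimize.
        sample = rng.choices(observations, k=len(observations))
        best = golden_section_search(lambda c: toy_cost(c, sample), lo, hi)
        estimates.append(round(best))
    estimates.sort()
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return estimates[n_boot // 2], (lower, upper)


if __name__ == "__main__":
    rng = random.Random(1)
    fake_observations = [rng.gauss(120, 5) for _ in range(50)]  # synthetic data
    estimate, interval = bootstrap_copy_number(fake_observations, 0, 500)
    print(estimate, interval)
```

Golden section search assumes the cost is unimodal over the searched range and needs only one new cost evaluation per iteration, which matters when each evaluation is as expensive as the alignment optimization described in panel (B).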

Estimating the size of long tandem repeat expansions from short reads with ScatTR
