SMRT sequencing of a mid-premutation CGG-repeat expansion (approximately 95 CGG repeats). (A) Sequence alignment of representative CCS reads from a library of plasmid-generated FMR1 sequence; note that the original construct was generated from PCR-amplified genomic DNA followed by bacterial clonal selection. Sporadic single-base insertions and deletions result from lower CCS coverage than was obtained for the smaller-repeat library. The upper and lower sets show sample CCS reads from the main peak at ∼280 nucleotides and from the smaller, broad distribution, respectively; horizontal lines indicate the CGG-repeat regions. (B) Expanded view of the transition from flanking sequence into the CGG repeats. An AGG repeat (boxed) is unambiguously recognized in all reads, demonstrating the utility of SMRT sequencing for genotyping polymorphic CGG-repeat interruptions. (C) Frequency distribution of sequence lengths in the top 1000 reads, plotted by region. A major peak is observed for the repeat region (red), with minor peaks generally corresponding to single-repeat units, and a spread of shorter fragments produced by bacterial deletion of CGG repeats. Both the left (green broken line) and right (blue) flanking-sequence regions are uniform in length.
