Figure 2.

SMRT sequencing of short CGG repeats. (A) Sequence alignment of representative reads from a library of plasmid-derived FMR1 sequence with nominally 36 CGG repeats. Three major CGG-repeat size species are observed. Flanking and CGG-repeat regions are delineated by vertical tick marks. (B) Frequency of sequence lengths in the top 1000 reads (by predicted quality), plotted by region as indicated. The three major peaks observed in the repeat region (red) correspond to 34, 35, and 36 repeats, as seen in A. Both the left (green broken line) and right (blue) flanking sequence regions are uniform in length. (C) Accuracy of each region of the insert, measured by alignment to the reference, increases with each successive pass of consensus coverage, saturating after four subreads for the flanking regions. Accuracy of the reads within the CGG-repeat region is improved by using reference sequences corresponding to the individual lengths within the distribution (see Supplemental Fig. S2).
