Figure 2.

DNA polymerase kinetics in SMRT sequencing are a function of the local sequence context of the incorporation site, motivating a conditional random field (CRF) approach to KVE detection. (A) Heatmap of the coefficient of determination (R²), showing how much of the IPD variance at the incorporation site of a SMRT sequencing reaction is explained by local sequence context. The heatmap suggests that the seven bases upstream of and the two bases downstream from the incorporation site are the most informative, and that bases beyond this context provide little additional information about the enzyme kinetics. (B) Scatter plot comparing IPDs in identical sequence contexts between whole-genome amplified E. coli and M. genitalium samples. Each point represents the log of the IPD for a given 10-bp context (the incorporation site plus seven bases upstream and two bases downstream) in E. coli (y-axis) and M. genitalium (x-axis); for ease of viewing, only 2,500 points sampled from the 1,048,576 possible 10-mer contexts are shown. The strong correlation (Pearson's correlation coefficient = 0.91) between IPDs in identical contexts, assayed in completely independent sequencing runs of different species, demonstrates that the context effects are highly consistent between experiments. (C) Graphical representation of the CRF model. The 129inf1 variables represent the hidden modification states for site i, while the 129inf2 variables represent the observed IPD values for site i, which inform the modification status of the site. In this model we consider interactions between the incorporation site, 129inf3, and the two nearest neighboring sites on each side of 129inf4. The edges between the 129inf5 variables indicate that the local sites can interact, with the 129inf6 parameters representing the degree of interaction among the nodes. The 129inf7 parameters represent the exponential rates for the two possible rate classes at each position i (129inf8), while the 129inf9 parameters represent the proportion of molecules in state k at position i (with 129inf10).
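The comparison in panel B can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the string-indexing convention for "upstream" and "downstream", and the choice to average log IPDs per context are all assumptions made for illustration. The sketch groups per-site IPDs by their 10-base context (seven bases before the site, the site itself, two bases after), averages the log IPD within each context, and computes Pearson's correlation over the contexts shared by two samples.

```python
import math
from collections import defaultdict

UP, DOWN = 7, 2                    # context size taken from the caption
N_CONTEXTS = 4 ** (UP + 1 + DOWN)  # number of possible 10-mer contexts


def context_at(seq, i):
    """Return the 10-base context around index i, or None near the ends.

    Assumption: 'upstream' is taken to mean lower string indices.
    """
    if i - UP < 0 or i + DOWN >= len(seq):
        return None
    return seq[i - UP : i + DOWN + 1]


def mean_log_ipd_by_context(seq, ipds):
    """Average log(IPD) over all sites that share the same context."""
    totals = defaultdict(lambda: [0.0, 0])  # context -> [sum of logs, count]
    for i, ipd in enumerate(ipds):
        ctx = context_at(seq, i)
        if ctx is None or ipd <= 0:
            continue  # skip edge sites and non-positive durations
        totals[ctx][0] += math.log(ipd)
        totals[ctx][1] += 1
    return {ctx: s / n for ctx, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def context_correlation(table_a, table_b):
    """Correlate mean log IPDs over contexts present in both samples."""
    shared = sorted(set(table_a) & set(table_b))
    return pearson([table_a[c] for c in shared], [table_b[c] for c in shared])
```

Here `table_a` and `table_b` would be the outputs of `mean_log_ipd_by_context` for two independent samples; `N_CONTEXTS` reproduces the 4^10 = 1,048,576 count quoted in the caption.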

129fig2