Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate
Abstract
DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes, including transcription and telomere maintenance; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability, which are frequently associated with mutations at non-B DNA. Here we present the first genome-wide, simultaneous examination of DNA polymerization kinetics and accuracy in the human genome, sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs markedly between non-B and B-DNA: for instance, it decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. By applying Functional Data Analysis to polymerization kinetics profiles, we predict non-B DNA formation for a novel motif, which we validate experimentally with circular dichroism. We demonstrate that several non-B DNA motifs affect polymerization accuracy (e.g., G-quadruplexes increase sequencing error rates). Moreover, sequencing errors are positively associated with polymerization slowdown, and this relationship is amplified at non-B DNA. Finally, we show that G-quadruplex (G4) motifs that are highly divergent between human and orangutan (or highly diverse in the 1000 Genomes Project data set) exhibit pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. Our results demonstrate how SMRT sequencing data can be used to study polymerization kinetics and accuracy, and they contribute to our understanding of mutagenesis at non-B DNA.
- Received June 29, 2018.
- Accepted October 30, 2018.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International license), as described at http://creativecommons.org/licenses/by-nc/4.0/.