TY  - JOUR
A1  - Jha, Anupama
A1  - Bohaczuk, Stephanie C.
A1  - Mao, Yizi
A1  - Ranchalis, Jane
A1  - Mallory, Benjamin J.
A1  - Min, Alan T.
A1  - Hamm, Morgan O.
A1  - Swanson, Elliott
A1  - Dubocanin, Danilo
A1  - Finkbeiner, Connor
A1  - Li, Tony
A1  - Whittington, Dale
A1  - Noble, William Stafford
A1  - Stergachis, Andrew B.
A1  - Vollger, Mitchell R.
T1  - DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools
Y1  - 2024/11/01
JF  - Genome Research
JO  - Genome Research
SP  - 1976
EP  - 1986
DO  - 10.1101/gr.279095.124
VL  - 34
IS  - 11
UR  - http://genome.cshlp.org/content/34/11/1976.abstract
N2  - Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
ER  -