EnDeep4mC predicts DNA N4-methylcytosine sites using a dual-adaptive feature encoding framework in deep ensembles

  1. Ximei Luo1,2
  1. Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China;
  2. Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, Zhejiang, China;
  3. College of Biomedical Engineering, Sichuan University, Chengdu 610041, Sichuan, China
  • Corresponding author: luoximei{at}uestc.edu.cn
  • Abstract

    DNA N4-methylcytosine (4mC) is a key epigenetic modification that regulates DNA repair and replication. Because experimental detection is limited, efficient computational methods are needed. Although machine learning predictors have been proposed, their performance could be improved through systematic optimization of feature encoding schemes. Here, we propose EnDeep4mC, a dual-adaptive framework that integrates species-specific modeling with ensemble deep learning architectures to optimize these schemes. Evaluated across six species, EnDeep4mC achieves strong prediction performance and significantly outperforms current state-of-the-art predictors. Cross-species validation confirms its robust transferability from animal to microbe groups. Evolutionary analysis further reveals the functional differentiation of 4mC sequences over the course of evolution: prokaryotic 4mC relies on stable patterns, whereas eukaryotes achieve regulatory plasticity through dynamic sequence combinations, which provides experimental evidence for species-adaptive encoding strategies.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280977.125.

    • Freely available online through the Genome Research Open Access option.

    • Received June 8, 2025.
    • Accepted January 8, 2026.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
