Early feature extraction drives model performance in high-resolution chromatin accessibility prediction

  1. Valentina Boeva1,2,4
  1. Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland;
  2. SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland;
  3. Swiss Data Science Center, EPF Lausanne and ETH Zurich, 8092 Zurich, Switzerland;
  4. Institut Cochin, Inserm U1016, CNRS UMR 8104, Université Paris Cité, 75014 Paris, France
  • Corresponding authors: ekaterina.krymova{at}sdsc.ethz.ch, valentina.boeva{at}inf.ethz.ch
  • Abstract

    Fine-grained prediction of chromatin accessibility from DNA sequence is a foundational step in modeling gene expression changes resulting from sequence variants. Yet, few methods operate at the resolution necessary to capture subtle effects of single-nucleotide changes. Furthermore, it remains unclear which architectural components, such as residual connections, normalization strategies, or attention mechanisms, drive performance in these high-resolution predictions. To address these knowledge gaps, we systematically evaluate classic architectural choices and introduce ConvNeXt V2 blocks, originally developed for computer vision, as high-resolution feature extractors in deep learning models for genomic data. Integrated into diverse architectures such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, dilated CNNs, and transformers, ConvNeXt V2 blocks consistently improve performance, leading to similar prediction accuracy across these different model types. This reveals that early feature extraction, rather than downstream architecture, is the primary determinant of prediction accuracy. A comprehensive evaluation of these models on ATAC-seq signal prediction at 4-bp resolution in a cell type–specific manner identifies the ConvNeXt-based dilated CNN as the most robust performer, better preserving the signal’s shape. Our codebase and benchmarks provide practical tools for high-resolution chromatin modeling.
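
    As an illustration of what distinguishes a ConvNeXt V2 block, the sketch below applies Global Response Normalization (GRN), the layer that ConvNeXt V2 adds to the original ConvNeXt block, to a one-dimensional feature map such as one computed along a DNA sequence. It is a minimal pure-Python sketch and is not taken from the authors' codebase: the function name `global_response_norm`, the channels-by-positions list layout, and the adaptation from 2D images to 1D sequences are assumptions made for illustration.

```python
import math


def global_response_norm(x, gamma, beta, eps=1e-6):
    """Apply GRN to a 1D feature map (hypothetical helper, for illustration).

    x     : list of channels, each a list of values over sequence positions
    gamma : per-channel learnable scale (list of floats)
    beta  : per-channel learnable shift (list of floats)

    Steps, following the ConvNeXt V2 formulation adapted to one dimension:
      1. aggregate : L2 norm of each channel across all positions
      2. normalize : divide each norm by the mean norm across channels
      3. calibrate : y = gamma * (x * n) + beta + x   (residual form)
    """
    # 1. One global response value per channel.
    norms = [math.sqrt(sum(v * v for v in channel)) for channel in x]
    # 2. Relative importance of each channel compared with the others.
    mean_norm = sum(norms) / len(norms)
    scales = [g / (mean_norm + eps) for g in norms]
    # 3. Rescale the input and add it back as a residual.
    return [
        [gamma[c] * (v * scales[c]) + beta[c] + v for v in channel]
        for c, channel in enumerate(x)
    ]


# Toy input: 2 channels, 2 sequence positions (values are made up).
features = [[3.0, 4.0], [0.0, 0.0]]
identity = global_response_norm(features, gamma=[0.0, 0.0], beta=[0.0, 0.0])
boosted = global_response_norm(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

    With `gamma` and `beta` set to zero, the layer reduces to the identity, so a block containing it starts out behaving like a block without it. With nonzero `gamma`, channels whose global response is large relative to the others are amplified, which encourages competition between feature channels.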

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.281042.125.

    • Freely available online through the Genome Research Open Access option.

    • Received June 11, 2025.
    • Accepted December 18, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    1. Genome Res. © 2026 Grover et al.; Published by Cold Spring Harbor Laboratory Press
