A scalable computational framework for predicting gene expression from candidate cis-regulatory elements

  De-Shuang Huang (1, 5)
  1. Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo 315201, China;
  2. Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning, 530007, China;
  3. School of Information Engineering, Xuzhou University of Technology, Xuzhou 221018, China;
  4. Faculty of Data Science, City University of Macau, Macau 999078, China;
  5. Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Shanghai 200092, China
  • Corresponding author: dshuang{at}eitech.edu.cn
  Abstract

    Deciphering the relationships between cis-regulatory elements (CREs) and target gene expression is a long-standing unsolved problem in molecular biology, and the dynamics of CREs across cell types make it even more challenging. To address this challenge, we propose a scalable computational framework for predicting gene expression (ScPGE) from discrete candidate CREs (cCREs). ScPGE assembles DNA sequences, transcription factor (TF) binding scores, and epigenomic tracks from discrete cCREs into three-dimensional tensors, and then models the relationships between cCREs and genes by combining convolutional neural networks with transformers. Compared with current state-of-the-art models, ScPGE achieves superior performance in predicting gene expression and, through its attention mechanism, identifies active enhancer–gene interactions with higher accuracy. A comprehensive analysis of ScPGE's predictions reveals a pattern among true positives (TPs): the regulatory effect of cCREs on genes decreases with distance. Guided by this pattern, we design two methods that incorporate chromatin loops into ScPGE to improve its ability to capture distal cCRE–gene interactions. Furthermore, ScPGE accurately discovers crucial TF motifs within prioritized cCREs and reveals the distinct regulatory types of these cCREs.
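The abstract does not specify the exact tensor layout or how attention weights are read out, so the following is a minimal, hypothetical sketch in plain Python, not ScPGE's implementation. It assumes that every cCRE has been resized to a common length, that TF binding scores and epigenomic signals are available per base, and that cCREs are ranked by the attention weight a single gene query assigns to them. The helper names (`one_hot`, `assemble_tensor`, `attention_weights`, `rank_ccres`) are illustrative, and the convolutional and transformer layers themselves are omitted.

```python
import math
from typing import List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """One-hot encode DNA as 4 channels (A, C, G, T); ambiguous bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def assemble_tensor(
    sequences: Sequence[str],
    tf_scores: Sequence[Sequence[Sequence[float]]],
    tracks: Sequence[Sequence[Sequence[float]]],
) -> List[List[List[float]]]:
    """Stack per-cCRE features into a nested list of shape (n_cCREs, length, channels).

    Hypothetical layout: at each base position the channel vector is
    [one-hot DNA (4)] + [TF binding scores] + [epigenomic signals].
    All cCREs are assumed to have been resized to a common length beforehand.
    """
    if not (len(sequences) == len(tf_scores) == len(tracks)):
        raise ValueError("need one TF-score matrix and one track matrix per cCRE")
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all cCRE sequences must have the same length")
    tensor = []
    for seq, tf, tr in zip(sequences, tf_scores, tracks):
        if len(tf) != len(seq) or len(tr) != len(seq):
            raise ValueError("per-position features must match the sequence length")
        tensor.append(
            [dna + list(t) + list(e) for dna, t, e in zip(one_hot(seq), tf, tr)]
        )
    return tensor


def attention_weights(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one gene query over per-cCRE key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_ccres(weights: Sequence[float]) -> List[int]:
    """Return cCRE indices ordered from highest to lowest attention weight."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)


# Toy example: 2 cCREs x 4 bp, 1 TF-score channel, 2 epigenomic tracks.
seqs = ["ACGT", "GGCA"]
tf = [[[0.1], [0.0], [0.9], [0.2]], [[0.0], [0.3], [0.0], [0.0]]]
tr = [[[1.0, 0.5]] * 4, [[0.2, 0.0]] * 4]
x = assemble_tensor(seqs, tf, tr)
print(len(x), len(x[0]), len(x[0][0]))  # 2 4 7

# In a real model the keys would be learned cCRE embeddings (e.g. CNN outputs);
# here they are fixed toy vectors.
w = attention_weights([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(rank_ccres(w))  # [0, 2, 1]
```

Keeping the DNA, TF, and epigenomic channels side by side at each position means a convolution over the length axis would see all three signal types at once; a practical model would use a tensor library rather than nested lists.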

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.281219.125.

    • Freely available online through the Genome Research Open Access option.

    • Received July 21, 2025.
    • Accepted November 26, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
