Balancing gene ontology annotation specificity in protein function prediction based on the protein sequence large graph

  1. Bin Liu1
  1. Beijing Institute of Technology
  • * Corresponding author; email: bliu{at}bliulab.net
  • Abstract

    Accurate protein function prediction is fundamental to advancing drug discovery, precision medicine, and the understanding of complex biological systems. While the Gene Ontology (GO) provides a standardized framework for protein annotation, a critical challenge persists: the imbalance between low-specificity and high-specificity GO terms. This imbalance creates blind spots in our understanding of protein function landscapes, particularly in clinically relevant pathways. We present ProGO-PSL, a novel large graph architecture designed to resolve this imbalance. ProGO-PSL simultaneously leverages explicit domain identifiers from InterPro and implicit evolutionary context from multiple sequence alignments, fusing these complementary data sources within an imbalance learning framework. Our model consistently outperforms state-of-the-art methods by 5-15% across all specificity levels, on both the benchmark dataset and an independent test set, demonstrating robust generalization. Furthermore, ProGO-PSL generates interpretable representations that clarify relationships between low- and high-specificity GO terms, enabling a more complete functional characterization of the proteome. This work accelerates the identification of therapeutic targets in previously uncharacterized biological pathways.

    • Received April 20, 2025.
    • Accepted April 9, 2026.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    ACCEPTED MANUSCRIPT

    1. Genome Res. gr.280816.125 Published by Cold Spring Harbor Laboratory Press
