RT Journal
A1 Shao, Jiangyi
A1 Chen, Shutao
A1 Wang, Ziwen
A1 Chen, Zixu
A1 Liu, Bin
T1 Balancing gene ontology annotation specificity in protein function prediction based on the protein sequence large graph
JF Genome Research
JO Genome Research
YR 2026
FD April 15
DO 10.1101/gr.280816.125
SP gr.280816.125
UL http://genome.cshlp.org/content/early/2026/04/15/gr.280816.125.abstract
AB Accurate protein function prediction is fundamental to advancing drug discovery, precision medicine, and the understanding of complex biological systems. While the Gene Ontology (GO) provides a standardized framework for protein annotation, a critical challenge persists: the imbalance between low-specificity and high-specificity GO terms. This imbalance creates blind spots in our understanding of protein function landscapes, particularly in clinically relevant pathways. We present ProGO-PSL, a novel large graph architecture designed to resolve this imbalance. ProGO-PSL simultaneously leverages explicit domain identifiers from InterPro and implicit evolutionary context from multiple sequence alignments, fusing these complementary data sources within an imbalance learning framework. Our model consistently outperforms state-of-the-art methods by 5-15% across all specificity levels, on both the benchmark dataset and an independent test set, demonstrating robust generalization. Furthermore, ProGO-PSL generates interpretable representations that clarify relationships between low- and high-specificity GO terms, enabling a more complete functional characterization of the proteome. This work accelerates the identification of therapeutic targets in previously uncharacterized biological pathways.