Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction

Shuai Zeng; Duolin Wang; Lei Jiang; Dong Xu

doi:10.1101/gr.279132.124

University of Missouri

* Corresponding author; email: xudong{at}missouri.edu

Abstract

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for SP categories with limited annotated data. We present PEFT-SP, a parameter-efficient fine-tuning (PEFT) framework for SP prediction that makes effective use of pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the evolutionary knowledge of protein sequences captured by PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SPs with small training samples and an overall MCC gain of 6.1%. We also employed two other PEFT methods, prompt tuning and adapter tuning, in ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning also improves on state-of-the-art results, with an MCC gain of up to 28.1% for SPs with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during the training stage, making it possible to adapt larger and more powerful protein models for SP prediction.
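To illustrate why LoRA trains few parameters, the following is a minimal sketch of the general low-rank adaptation idea applied to a single linear layer, not the PEFT-SP implementation. It keeps a weight matrix W frozen and adds a trainable update scaled by alpha / r. The class name `LoRALinear`, the helper functions, the layer sizes, and the values of `r` and `alpha` are all hypothetical choices made for illustration; the random W merely stands in for a pretrained weight.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r, alpha, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pretrained weight (d_out x d_in); frozen during fine-tuning.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A (r x d_in) is random, B (d_out x r) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        """Compute x W^T + scale * (x A^T) B^T for a batch x of rows of length d_in."""
        base = matmul(x, transpose(self.W))
        update = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [[b + self.scale * u for b, u in zip(b_row, u_row)]
                for b_row, u_row in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B, the only parameters that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        """Number of entries in the frozen weight W."""
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=8, d_out=8, r=2, alpha=4)
x = [[1.0] * 8]
y = layer.forward(x)
print(layer.trainable_params(), layer.frozen_params())
```

For a d_out x d_in weight, the two factors hold r * (d_in + d_out) entries, which is much smaller than d_in * d_out when r is small relative to the layer width. This is the general reason low-rank adaptation reduces the number of trained parameters.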

Received February 15, 2024.
Accepted July 15, 2024.

Published by Cold Spring Harbor Laboratory Press

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

