Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction

Table 1.

Benchmark results for SP prediction in Sec/SPI, Sec/SPII, and Tat/SPI

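| SP type | Method | Archaea MCC1 | Archaea MCC2 | Eukarya MCC1 | Gram-negative bacteria MCC1 | Gram-negative bacteria MCC2 | Gram-positive bacteria MCC1 | Gram-positive bacteria MCC2 |
|---|---|---|---|---|---|---|---|---|
| Sec/SPI | PEFT-SP (LoRA) | 0.805 | 0.783 | 0.958 | 0.862 | 0.809 | 0.915 | 0.848 |
| Sec/SPI | SignalP 6.0 retrained | 0.798 | 0.793 | 0.952 | 0.856 | 0.804 | 0.885 | 0.77 |
| Sec/SPI | SignalP 6.0^a | 0.737 | 0.728 | 0.868 | 0.811 | 0.649 | 0.878 | 0.734 |
| Sec/SPI | SignalP 5.0^a | 0.711 | 0.67 | 0.774 | 0.705 | 0.586 | 0.798 | 0.669 |
| Sec/SPI | DEEPSIG^a | n.d. | n.d. | 0.792 | 0.735 | 0.159 | 0.798 | 0.146 |
| Sec/SPI | LipoP^a | 0.775 | 0.619 | 0.347 | 0.744 | 0.471 | 0.879 | 0.442 |
| Sec/SPI | PHILIUS^a | 0.691 | 0.438 | 0.448 | 0.766 | 0.147 | 0.752 | 0.084 |
| Sec/SPI | PHOBIUS^a | 0.796 | 0.551 | 0.531 | 0.766 | 0.153 | 0.716 | 0.08 |
| Sec/SPI | PolyPhobius^a | 0.715 | 0.474 | 0.478 | 0.813 | 0.173 | 0.777 | 0.136 |
| Sec/SPI | PRED-LIPO^a | 0.733 | 0.552 | 0.196 | 0.710 | 0.342 | 0.879 | 0.484 |
| Sec/SPI | PRED-SIGNAL^a | 0.908 | 0.670 | 0.265 | 0.662 | 0.115 | 0.822 | 0.171 |
| Sec/SPI | PRED-TAT^a | 0.781 | 0.655 | 0.340 | 0.736 | 0.209 | 0.839 | 0.238 |
| Sec/SPI | SIGNAL-CF^a | n.d. | n.d. | 0.333 | 0.52 | 0.123 | 0.474 | 0.1 |
| Sec/SPI | Signal-3L 2.0^a | n.d. | n.d. | 0.605 | 0.731 | 0.108 | 0.878 | 0.133 |
| Sec/SPI | SOSUIsignal^a | n.d. | n.d. | 0.368 | 0.639 | 0.123 | 0.702 | 0.107 |
| Sec/SPI | SPEPlip^a | n.d. | n.d. | 0.652 | 0.705 | 0.489 | 0.578 | 0.429 |
| Sec/SPI | SPOCTOPUS^a | 0.732 | 0.448 | 0.506 | 0.849 | 0.165 | 0.879 | 0.134 |
| Sec/SPI | TOPCONS2^a | 0.711 | 0.438 | 0.504 | 0.844 | 0.159 | 0.836 | 0.078 |
| Sec/SPII | PEFT-SP (LoRA) | 0.858 | 0.730 | n.d. | 0.955 | 0.945 | 0.928 | 0.939 |
| Sec/SPII | SignalP 6.0 retrained | 0.885 | 0.825 | n.d. | 0.942 | 0.929 | 0.868 | 0.882 |
| Sec/SPII | SignalP 6.0^a | 0.871 | 0.719 | n.d. | 0.838 | 0.841 | 0.894 | 0.893 |
| Sec/SPII | SignalP 5.0^a | 0.871 | 0.719 | n.d. | 0.884 | 0.874 | 0.883 | 0.866 |
| Sec/SPII | LipoP^a | 0.871 | 0.681 | n.d. | 0.806 | 0.813 | 0.71 | 0.724 |
| Sec/SPII | PRED-LIPO^a | 0.728 | 0.608 | n.d. | 0.615 | 0.655 | 0.762 | 0.743 |
| Sec/SPII | SPEPlip^a | n.d. | n.d. | n.d. | 0.856 | 0.86 | 0.842 | 0.837 |
| Tat/SPI | PEFT-SP (LoRA) | 0.610 | 0.579 | n.d. | 0.975 | 0.961 | 0.845 | 0.85 |
| Tat/SPI | SignalP 6.0 retrained | 0.599 | 0.563 | n.d. | 0.978 | 0.962 | 0.788 | 0.799 |
| Tat/SPI | SignalP 6.0^a | 0.802 | 0.807 | n.d. | 0.946 | 0.934 | 0.788 | 0.806 |
| Tat/SPI | SignalP 5.0^a | 0.807 | 0.763 | n.d. | 0.719 | 0.732 | 0.708 | 0.700 |
| Tat/SPI | PRED-TAT^a | 0.937 | 0.719 | n.d. | 0.945 | 0.869 | 0.823 | 0.643 |
| Tat/SPI | TatP^a | 0.733 | 0.474 | n.d. | 0.730 | 0.591 | 0.568 | 0.411 |
| Tat/SPI | TATFIND^a | 0.937 | 0.662 | n.d. | 0.892 | 0.845 | 0.711 | 0.580 |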
  • The values are the mean MCC1/MCC2 scores across nested cross-validation. The bold values represent the highest MCC1/MCC2 score among the predictors for a particular SP type. (n.d.) The model was not trained on the data.

  • ^a Performance reported in SignalP 6.0.
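
The scores in Table 1 are Matthews correlation coefficients (MCC) averaged over cross-validation test folds. As a reminder of how such a value can be obtained, the following is a minimal sketch, not the authors' evaluation code. The function names `mcc` and `mean_mcc` and the confusion counts are hypothetical, and how the negative set is chosen for MCC1 versus MCC2 is not modeled here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the coefficient is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_mcc(fold_counts):
    """Average the per-fold MCC over a list of (tp, tn, fp, fn) tuples."""
    scores = [mcc(*counts) for counts in fold_counts]
    return sum(scores) / len(scores)


# Hypothetical confusion counts for three test folds (not from the paper).
folds = [(90, 880, 10, 20), (85, 890, 15, 10), (95, 870, 5, 30)]
print(round(mean_mcc(folds), 3))
```

Averaging per-fold scores, rather than pooling all predictions into one confusion matrix, gives each test fold equal weight regardless of its size; which of the two the benchmark uses is an assumption in this sketch.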

This Article

  1. Genome Res. 34: 1445-1454
