Shuai Zeng; Duolin Wang; Lei Jiang; Dong Xu

Table 1.

Benchmark results for SP prediction in Sec/SPI, Sec/SPII, and Tat/SPI

SP Type	Method	Archaea MCC1	Archaea MCC2	Eukarya MCC1	Gram-negative bacteria MCC1	Gram-negative bacteria MCC2	Gram-positive bacteria MCC1	Gram-positive bacteria MCC2
Sec/SPI	PEFT-SP (LoRA)	0.805	0.783	0.958	0.862	0.809	0.915	0.848
	SignalP 6.0 retrained	0.798	0.793	0.952	0.856	0.804	0.885	0.77
	SignalP 6.0^a	0.737	0.728	0.868	0.811	0.649	0.878	0.734
	SignalP 5.0^a	0.711	0.67	0.774	0.705	0.586	0.798	0.669
	DEEPSIG^a	n.d.	n.d.	0.792	0.735	0.159	0.798	0.146
	LipoP^a	0.775	0.619	0.347	0.744	0.471	0.879	0.442
	PHILIUS^a	0.691	0.438	0.448	0.766	0.147	0.752	0.084
	PHOBIUS^a	0.796	0.551	0.531	0.766	0.153	0.716	0.08
	PolyPhobius^a	0.715	0.474	0.478	0.813	0.173	0.777	0.136
	PRED-LIPO^a	0.733	0.552	0.196	0.710	0.342	0.879	0.484
	PRED-SIGNAL^a	0.908	0.670	0.265	0.662	0.115	0.822	0.171
	PRED-TAT^a	0.781	0.655	0.340	0.736	0.209	0.839	0.238
	SIGNAL-CF^a	n.d.	n.d.	0.333	0.52	0.123	0.474	0.1
	Signal-3L 2.0^a	n.d.	n.d.	0.605	0.731	0.108	0.878	0.133
	SOSUIsignal^a	n.d.	n.d.	0.368	0.639	0.123	0.702	0.107
	SPEPlip^a	n.d.	n.d.	0.652	0.705	0.489	0.578	0.429
	SPOCTOPUS^a	0.732	0.448	0.506	0.849	0.165	0.879	0.134
	TOPCONS2^a	0.711	0.438	0.504	0.844	0.159	0.836	0.078
Sec/SPII	PEFT-SP (LoRA)	0.858	0.730	n.d.	0.955	0.945	0.928	0.939
	SignalP 6.0 retrained	0.885	0.825	n.d.	0.942	0.929	0.868	0.882
	SignalP 6.0^a	0.871	0.719	n.d.	0.838	0.841	0.894	0.893
	SignalP 5.0^a	0.871	0.719	n.d.	0.884	0.874	0.883	0.866
	LipoP^a	0.871	0.681	n.d.	0.806	0.813	0.71	0.724
	PRED-LIPO^a	0.728	0.608	n.d.	0.615	0.655	0.762	0.743
	SPEPlip^a	n.d.	n.d.	n.d.	0.856	0.86	0.842	0.837
Tat/SPI	PEFT-SP (LoRA)	0.610	0.579	n.d.	0.975	0.961	0.845	0.85
	SignalP 6.0 retrained	0.599	0.563	n.d.	0.978	0.962	0.788	0.799
	SignalP 6.0^a	0.802	0.807	n.d.	0.946	0.934	0.788	0.806
	SignalP 5.0^a	0.807	0.763	n.d.	0.719	0.732	0.708	0.700
	PRED-TAT^a	0.937	0.719	n.d.	0.945	0.869	0.823	0.643
	TatP^a	0.733	0.474	n.d.	0.730	0.591	0.568	0.411
	TATFIND^a	0.937	0.662	n.d.	0.892	0.845	0.711	0.580

The values are the mean MCC1/MCC2 scores across nested cross-validation. Bold values represent the highest MCC1/MCC2 score among the predictors for a particular SP type. n.d.: the model was not trained on the data.
^aPerformance reported in SignalP 6.0.
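
As a reading aid for the scores above, the following is a minimal sketch of how a Matthews correlation coefficient (MCC) can be computed from binary confusion-matrix counts and then averaged over cross-validation folds. The function name `mcc`, the variable names, and the per-fold counts are hypothetical and chosen only for illustration; they are not the authors' code or data. The sketch does not depend on how the negative set is defined, so the same formula applies whether the counts come from an MCC1 or an MCC2 evaluation.

```python
import math
from statistics import mean


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical (tp, tn, fp, fn) counts for three test folds; not from the paper.
folds = [(90, 880, 10, 20), (85, 890, 8, 17), (92, 875, 12, 21)]

fold_scores = [mcc(*counts) for counts in folds]
mean_mcc = mean(fold_scores)  # analogous to one mean-MCC cell in the table
print(f"per-fold MCC: {[round(s, 3) for s in fold_scores]}")
print(f"mean MCC: {mean_mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is preferred over accuracy when, as here, signal peptide classes are heavily outnumbered by negatives.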

Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction
