Protein domain embeddings for fast and accurate similarity search

Table 2.

Single- and multidomain classification results on 2549 test proteins from the FUpred benchmark

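| Method | Multidomain precision | Multidomain recall | Single-domain precision | Single-domain recall | ACC (all) | MCC (all) |
|---|---|---|---|---|---|---|
| ResPRE-FUpredᵃ | 0.860 | 0.873 | 0.936 | 0.929 | 0.910 | 0.799 |
| ESM2-FUpred | 0.631 | 0.974 | 0.982 | 0.716 | 0.802 | 0.651 |
| ESM2-RecCut | 0.663 | 0.941 | 0.963 | 0.761 | 0.821 | 0.663 |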
  • ᵃ ResPRE-FUpred results are taken from Zheng et al. (2020). In principle, ESM2-FUpred and ESM2-RecCut should generate the same results because they use the same scoring scheme (the FUscore). However, there are certain technical details that we cannot replicate in ESM2-RecCut, so the results differ slightly. “ACC” and “MCC” are the accuracy and the Matthews correlation coefficient, respectively. The results shown here are based on contact map predictions from the ESM-2 t30 model. Refer to Supplemental Table S1 for the results based on contact map predictions from the ESM-2 t33 model, which gave more accurate domain predictions but performed worse on our task of homology detection.

Genome Res. 34: 1434-1444