Protein domain embeddings for fast and accurate similarity search

Figure 7.

AUC1 plots comparing the different methods on the CATH20 database search benchmark, which contains distant homologs. As reported by Schütze et al. (2022), ProtT5 mean embeddings (i.e., knnProtT5) outperform MMseqs2 in sensitive mode (MMseqs2-sens). DCTdomain outperforms ProtT5-Mean by a similar margin, whereas the mean embedding method with ESM-2 performs slightly worse than MMseqs2-sens.

This Article

  1. Genome Res. 34: 1434-1444
