Protein domain embeddings for fast and accurate similarity search

Figure 6.

Comparison of embedding-based similarity between G6PD-containing proteins. (A) Domain-level embedding similarity computed by DCTdomain; (B) whole-protein embedding similarity computed by DCTglobal. Domain-level embeddings work better for computing the similarity between protein pairs with local similarity: the distribution of similarity scores for such pairs (shown in blue) shifts toward that of global homologs (shown in orange) when domain-level embeddings (A) are used instead of whole-protein embeddings (B).

This Article

  1. Genome Res. 34: 1434-1444