Protein domain embeddings for fast and accurate similarity search

Figure 1.

A diagram showing how domain-based embeddings (DCT fingerprints) are inferred. For the example protein with two domains, three DCT fingerprints are derived: one representing the whole protein and one representing each of the two domains. The diagram uses a two-domain protein, the ESM-2 t30 model, and fingerprints of size 480 for demonstration purposes, without loss of generality.
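The caption describes deriving one fixed-size fingerprint for the whole protein plus one per domain. The sketch below illustrates that idea under several assumptions that the caption does not state. It treats a fingerprint as the lowest-frequency block of an orthonormal 2D DCT-II of the per-residue embedding matrix, flattened. It computes each domain fingerprint from the slice of the whole-protein embedding that covers the domain; running the model separately on each domain sequence is an equally plausible reading. It picks a hypothetical 8 x 60 coefficient block to reach 480 values. The helper names and the `domains` (start, end) format are illustrative, and random numbers stand in for real ESM-2 t30 output (640 dimensions per residue). The authors' actual layers, quantization, and scaling steps may differ.

```python
import math
import random

Matrix = list[list[float]]


def dct_basis(size: int, n_coeffs: int) -> Matrix:
    """First `n_coeffs` rows of the orthonormal DCT-II basis for a length-`size` signal."""
    basis = []
    for k in range(n_coeffs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * size))
                      for i in range(size)])
    return basis


def embedding_axis_dct(emb: Matrix, n_cols: int) -> Matrix:
    """Keep the n_cols lowest DCT frequencies of each residue vector (result: L x n_cols)."""
    dim = len(emb[0])
    if dim < n_cols:
        raise ValueError(f"embedding dimension {dim} is smaller than n_cols={n_cols}")
    col_basis = dct_basis(dim, n_cols)
    return [[sum(b * x for b, x in zip(basis_row, residue)) for basis_row in col_basis]
            for residue in emb]


def sequence_axis_fingerprint(partial: Matrix, n_rows: int) -> list[float]:
    """DCT along the residue axis, keep n_rows low frequencies, flatten row-major."""
    length = len(partial)
    if length < n_rows:
        raise ValueError(f"segment of {length} residues is shorter than n_rows={n_rows}")
    row_basis = dct_basis(length, n_rows)
    n_cols = len(partial[0])
    return [sum(row_basis[u][i] * partial[i][v] for i in range(length))
            for u in range(n_rows) for v in range(n_cols)]


def protein_and_domain_fingerprints(emb: Matrix, domains: list[tuple[int, int]],
                                    n_rows: int, n_cols: int) -> list[list[float]]:
    """Return [whole-protein fingerprint, one fingerprint per domain].

    `domains` holds hypothetical half-open (start, end) residue ranges; domain
    fingerprints here are assumed to come from slices of the full embedding.
    """
    # The embedding-axis DCT acts on each residue independently, so compute it once.
    partial = embedding_axis_dct(emb, n_cols)
    fingerprints = [sequence_axis_fingerprint(partial, n_rows)]
    for start, end in domains:
        fingerprints.append(sequence_axis_fingerprint(partial[start:end], n_rows))
    return fingerprints


if __name__ == "__main__":
    random.seed(0)
    # Random stand-in for ESM-2 t30 residue embeddings (640-d); not real model output.
    emb = [[random.gauss(0.0, 1.0) for _ in range(640)] for _ in range(100)]
    prints = protein_and_domain_fingerprints(emb, [(0, 45), (45, 100)], n_rows=8, n_cols=60)
    print([len(p) for p in prints])  # -> [480, 480, 480]
```

Because the embedding-axis transform acts on each residue independently, it is computed once and shared by the whole-protein and domain fingerprints. Only the residue-axis transform depends on segment length, which is also why a 100-residue protein and its 45- and 55-residue domains all map to fingerprints of the same size.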

Genome Res. 34: 1434-1444