Table 1.

Protein benchmarks for homology/similarity detection

| Benchmark^a | # Pairs (homologs) | Homolog definition | Example proteins [domain architecture] |
|---|---|---|---|
| pfam-max50 | 10,450 (5228) | Identical domain architecture; <50 aa between domains | Q9VFJ2 [PF03946, PF00298]; P53875 [PF03946, PF00298] |
| pfam-nomax50 | 71,988 (36,278) | Identical domain architecture; no constraint on the number of amino acids between domains | Q15149 [PF03501, CL0188, CL0188, PF00681]; Q9QXS1 [PF03501, CL0188, CL0188, PF00681] |
| pfam-local | 15,273 (7602) | Share some domains, but not all | P40791 [PF00319, PF12347]; Q8VWM8 [PF00319, PF01486] |
| gene3d-nomax50 | 58,163 (29,109) | Same as pfam-nomax50 but based on CATH domains | P52917 [1.20.58.280, 3.40.50.300]; Q9ZNT0 [1.20.58.280, 3.40.50.300] |
| supfam-nomax50 | 49,365 (24,708) | Same as pfam-nomax50 but based on SCOP domains | Q9T0N8 [56176, 55103]; P46681 [56176, 55103] |

^a Benchmark names such as pfam-max50 and gene3d-nomax50 indicate the domain database used to define homologs. The second column lists the total number of pairs in each benchmark, with the number of homologous pairs in parentheses. All benchmarks consist of full-length proteins. The third column gives each benchmark's definition of homology, and the last column shows example proteins with their domain architectures.
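
The homolog definitions in the table can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `classify_pair` and `max_interdomain_gap` are hypothetical, the comparison assumes that "identical domain architecture" means the same domains in the same order, and the domain coordinates used for the gap check are invented for illustration, since the table does not list them.

```python
from typing import List, Sequence, Tuple


def classify_pair(arch_a: Sequence[str], arch_b: Sequence[str]) -> str:
    """Label a protein pair by comparing ordered domain architectures.

    Assumption: "identical" means the same domains in the same order.
    """
    if list(arch_a) == list(arch_b):
        return "identical"  # pfam-max50 / pfam-nomax50 style homologs
    if set(arch_a) & set(arch_b):
        return "partial"    # pfam-local style: at least one shared domain
    return "none"


def max_interdomain_gap(domains: List[Tuple[int, int]]) -> int:
    """Largest number of residues between consecutive domains.

    `domains` holds hypothetical (start, end) positions, 1-based inclusive.
    """
    ordered = sorted(domains)
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(ordered, ordered[1:])]
    return max(gaps, default=0)


# Example architectures taken from the table.
same = classify_pair(["PF03946", "PF00298"], ["PF03946", "PF00298"])
local = classify_pair(["PF00319", "PF12347"], ["PF00319", "PF01486"])

# Hypothetical coordinates: 30 residues separate the two domains,
# which would satisfy a "<50 aa between domains" constraint.
gap = max_interdomain_gap([(1, 100), (131, 250)])
passes_max50 = gap < 50
```

Under these assumptions, the first pair is labelled as having an identical architecture and the second as sharing only some domains. A pair would additionally need every inter-domain gap below 50 residues to meet the pfam-max50 definition.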