Accounting for Human Polymorphisms Predicted to Affect Protein Function

Table 5.

Comparison of Amino Acid Prediction Tools on nsSNPs to Estimate the Percentage of nsSNPs that Affect Protein Function in an Individual

| Method | nsSNP dataset | % nsSNPs predicted to affect protein function | Dataset for false positive error | False positive error (% neutral substitutions predicted to affect protein function) | % nsSNPs that affect protein function |
| --- | --- | --- | --- | --- | --- |
| Ng and Henikoff | WI-nsSNPs | 19% (22/115) | LacI | 20% | No difference |
| Chasman and Adams (2001) | WI-nsSNPs | 15% (8/53) | LacI | 30% (345/1131) | No extrapolation can be made |
| Sunyaev et al. (2001) | nsSNPs from public databases | 32% (79/245) | Substitutions between human proteins and their orthologues | 9% (41/399) | ∼20% |
| Sunyaev et al. (corrected) | nsSNPs from public databases, biased mutations removed | 19% (41/207) | Substitutions between human proteins and their orthologues | 9% (41/399) | ∼10%, after removing biased nonsynonymous variants |

  • For this estimation, an unbiased set of nsSNPs (detected from normal individuals) should be used and the false positive error subtracted from the percentage of nsSNPs predicted to be damaging.

  • From Table 1.

  • From Table 7b of Chasman and Adams (2001).

  • The databases from which the polymorphisms were obtained also contained disease-causing mutations, so these polymorphisms are not representative of random nsSNPs; see Discussion.

  • From Table 1 of Sunyaev et al. (2001).

  • See Discussion, section “Estimating the Number of Damaging nsSNPs in an Individual.”

  • The false positive error was calculated using the values from the leftmost column of Table 5 of Chasman and Adams (2001); this test set has the highest total prediction accuracy. Of the 1131 substitutions in LacI that have no effect, 345 were predicted to affect function (false positive error).

  • The value (41/399) used to calculate the false positive error was taken from Table 1 of Sunyaev et al. (2001). Because proteins with contaminating variants were removed, this value should be readjusted.

  • See Discussion for why the 9% false positive error is a lower limit and why the 10% is therefore an overestimate of the percentage of nsSNPs that affect protein function.
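
The subtraction described in the first footnote can be illustrated with a minimal Python sketch. It uses the rounded percentages exactly as reported in the table rather than recomputing them from the counts. The function name `estimate_affecting` and the floor at zero (used here to represent "No difference") are assumptions made for illustration and do not come from the article. Chasman and Adams (2001) is left out because the table reports that no extrapolation can be made for that method.

```python
# Rounded percentages as reported in Table 5 (not recomputed from the counts).
reported = {
    "Ng and Henikoff": {"predicted": 19, "false_positive": 20},
    "Sunyaev et al. (2001)": {"predicted": 32, "false_positive": 9},
    "Sunyaev et al. (corrected)": {"predicted": 19, "false_positive": 9},
}


def estimate_affecting(predicted_pct, false_positive_pct):
    """Subtract the false positive error from the predicted percentage.

    Flooring at zero is an illustrative assumption: a negative difference
    is treated as no detectable excess over the false positive rate.
    """
    return max(predicted_pct - false_positive_pct, 0)


estimates = {
    method: estimate_affecting(v["predicted"], v["false_positive"])
    for method, v in reported.items()
}

for method, pct in estimates.items():
    print(f"{method}: {pct}%")
```

Under these assumptions the sketch gives 0% for Ng and Henikoff (reported as "No difference"), 23% for Sunyaev et al. (2001) (reported as ∼20%), and 10% for the corrected Sunyaev et al. set (reported as ∼10%).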

This Article

  1. Genome Res. 12: 436-446
