Comparison of Amino Acid Prediction Tools on nsSNPs to Estimate the Percentage of nsSNPs that Affect Protein Function in an Individual
| | Ng and Henikoff | | Chasman and Adams (2001) | | Sunyaev et al. (2001) | | Sunyaev et al. (corrected) | |
| | Dataset | Prediction | Dataset | Prediction | Dataset | Prediction | Dataset | Prediction |
| % nsSNPs predicted to affect protein function − | WI-nsSNPs | 19% (22/115) | WI-nsSNPs | 15% (8/53) | nsSNPs from public databases | 32% (79/245) | nsSNPs from public databases, biased mutations removed | 19% (41/207) |
| False positive error (% neutral substitutions predicted to affect protein function) = | LacI | 20% | LacI | 30% (345/1131) | Substitutions between human proteins and their orthologues | 9% (41/399) | Substitutions between human proteins and their orthologues | 9% (41/399) |
| % nsSNPs that affect protein function | No difference | | No extrapolation can be made | | ∼20% | | ∼10%, after removing biased nonsynonymous variants | |
- For this estimation, an unbiased set of nsSNPs (detected from normal individuals) should be used, and the false positive error should be subtracted from the percentage of nsSNPs predicted to be damaging.
- From Table 1.
- From Table 7b of Chasman and Adams (2001).
- The databases from which the polymorphisms were obtained also contained disease-causing mutations and so are not representative of random nsSNPs; see Discussion.
- From Table 1 of Sunyaev et al. (2001).
- See Discussion, section “Estimating the Number of Damaging nsSNPs in an Individual.”
- The false positive error was calculated using the values from the leftmost column of Table 5 of Chasman and Adams (2001); this test set has the highest total prediction accuracy. Of 1131 substitutions in LacI that have no effect, 345 were predicted to affect function (false positive error).
- The value (41/399) used to calculate the false positive error was taken from Table 1 of Sunyaev et al. (2001). Because proteins with contaminating variants were removed, however, this value should be readjusted.
- See Discussion for why the 9% false positive error is a lower limit and therefore the 10% is an overestimate of the percentage of nsSNPs that affect protein function.
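The subtraction described in the first footnote (percentage predicted to affect function, minus the false positive error) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `TABLE`, `net_percent`, and `summarize` are hypothetical and not from the paper, and the inputs are the rounded percentages as printed in the table rather than values recomputed from the counts.

```python
# Rounded percentages as printed in the table:
# (% nsSNPs predicted to affect function, % false positive error).
TABLE = {
    "Ng and Henikoff": (19, 20),
    "Chasman and Adams (2001)": (15, 30),
    "Sunyaev et al. (2001)": (32, 9),
    "Sunyaev et al. (corrected)": (19, 9),
}


def net_percent(predicted_pct, false_positive_pct):
    """Hypothetical helper: predicted percentage minus false positive error."""
    return predicted_pct - false_positive_pct


def summarize(table):
    """Return {method: net percentage} for every column pair in the table."""
    return {name: net_percent(p, fp) for name, (p, fp) in table.items()}


if __name__ == "__main__":
    for name, net in summarize(TABLE).items():
        if net <= 0:
            # Predictions do not exceed the false positive error rate.
            print(f"{name}: no excess over the false positive error")
        else:
            print(f"{name}: about {net}% of nsSNPs")
```

With these inputs the differences for the two Sunyaev et al. columns are 23 and 10 percentage points, which the table reports as ∼20% and ∼10%. For the other two columns the difference is not positive, which presumably corresponds to the cells reporting "No difference" and "No extrapolation can be made".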











