Table 5.

Comparison of Amino Acid Prediction Tools on nsSNPs to Estimate the Percentage of nsSNPs that Affect Protein Function in an Individual

	Ng and Henikoff		Chasman and Adams (2001)		Sunyaev et al. (2001)		Sunyaev et al. (corrected)
	Dataset	Prediction	Dataset	Prediction	Dataset	Prediction	Dataset	Prediction
% nsSNPs predicted to affect protein function −	WI-nsSNPs	19% (22/115)	WI-nsSNPs	15% (8/53)	nsSNPs from public databases	32% (79/245)	nsSNPs from public databases, biased mutations removed	19% (41/207)
False positive error (% neutral substitutions predicted to affect protein function) =	LacI	20%	LacI	30% (345/1131)	Substitutions between human proteins and their orthologues	9% (41/399)	Substitutions between human proteins and their orthologues	9% (41/399)
% nsSNPs that affect protein function	No difference		No extrapolation can be made		∼20%		∼10%, after removing biased nonsynonymous variants

For this estimation, an unbiased set of nsSNPs (detected from normal individuals) should be used and the false positive error subtracted from the percentage of nsSNPs predicted to be damaging.
From Table 1.
From Table 7b of Chasman and Adams (2001).
The databases from which the polymorphisms were obtained also contained disease-causing mutations, so they are not representative of random nsSNPs; see Discussion.
From Table 1 of Sunyaev et al. (2001).
See Discussion, section “Estimating the Number of Damaging nsSNPs in an Individual.”
The false positive error was calculated using the values from the leftmost column of Table 5 of Chasman and Adams (2001); this test set has the highest total prediction accuracy. Of the 1131 substitutions in LacI that have no effect, 345 were predicted to affect function (false positive error).
The value (41/399) used to calculate the false positive error was taken from Table 1 of Sunyaev et al. (2001). Because proteins with contaminating variants were removed, this value should be readjusted.
See Discussion for why the 9% false positive error is a lower limit and why the ∼10% figure is therefore an overestimate of the percentage of nsSNPs that affect protein function.
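
The subtraction described in the table notes (percentage of nsSNPs predicted to affect function minus the false positive error) can be illustrated with a minimal sketch. It uses the rounded percentages as printed in the table; the function name and the dictionary layout are hypothetical and serve only as illustration. The Chasman and Adams (2001) column is omitted because the table reports that no extrapolation can be made for it.

```python
def excess_over_false_positive(predicted_pct, false_positive_pct):
    """Percentage points by which the predicted rate exceeds the false positive error."""
    return predicted_pct - false_positive_pct


# (predicted %, false positive %) as printed in the table, rounded to whole percent.
table_rows = {
    "Ng and Henikoff": (19, 20),
    "Sunyaev et al. (2001)": (32, 9),
    "Sunyaev et al. (corrected)": (19, 9),
}

for method, (predicted, false_positive) in table_rows.items():
    excess = excess_over_false_positive(predicted, false_positive)
    # Assumption for this sketch: an excess at or below zero is read as
    # "no difference" between the prediction rate and the false positive error.
    label = f"{excess} percentage points" if excess > 0 else "no difference"
    print(f"{method}: {label}")
```

With these inputs the sketch gives no difference for Ng and Henikoff, 23 percentage points for Sunyaev et al. (2001), which the table reports as ∼20%, and 10 percentage points for the corrected Sunyaev et al. dataset.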

Accounting for Human Polymorphisms Predicted to Affect Protein Function