A Quantitative Evaluation of SAGE

Table 1.

Simulated Results Assuming the Genome Given by Observations

Model →          | Assumed | Unique, no errors | Unique        | Random        | Non-random
A. 9-base sequences
Unique tags      | 15,720  | 7,994 ± 5         | 11,029 ± 6    | 10,930 ± 6    | 10,427 ± 5
% 1–5            | 64.16   | 38.86 ± 0.02      | 53.63 ± 0.02  | 53.33 ± 0.02  | 51.26 ± 0.02
% 6–50           | 31.0373 | 52.21 ± 0.02      | 40.67 ± 0.02  | 40.26 ± 0.02  | 41.88 ± 0.02
% 51–500         | 4.3815  | 8.17 ± 0.01       | 5.77 ± 0.007  | 5.87 ± 0.007  | 6.28 ± 0.007
% 501–5000       | 0.4212  | 0.76 ± 0.003      | 0.54 ± 0.002  | 0.54 ± 0.002  | 0.57 ± 0.002
% Errors novel   | –       | –                 | 94.0 ± 0.01   | 94.2 ± 0.01   | 84.6 ± 0.3
% Unique genes   | –       | 100 ± 0           | 100 ± 0       | 94.2 ± 0.01   | 81.6 ± 0.01
B. 10-base sequences
Unique tags      | 15,720  | 8,003 ± 5         | 11,460 ± 6    | 11,428 ± 6    | 11,268 ± 5
% 1–5            | 64.16   | 38.86 ± 0.02      | 55.44 ± 0.02  | 55.43 ± 0.02  | 54.65 ± 0.02
% 6–50           | 31.0373 | 52.23 ± 0.02      | 38.51 ± 0.02  | 38.50 ± 0.02  | 39.15 ± 0.02
% 51–500         | 4.3815  | 8.16 ± 0.01       | 5.53 ± 0.006  | 5.54 ± 0.006  | 5.68 ± 0.006
% 501–5000       | 0.4212  | 0.75 ± 0.003      | 0.52 ± 0.002  | 0.52 ± 0.002  | 0.52 ± 0.002
% Errors novel   | –       | –                 | 98.5 ± 0.007  | 98.5 ± 0.007  | 95.0 ± 0.01
% Unique genes   | –       | 100 ± 0           | 100 ± 0       | 98.5 ± 0.004  | 94.0 ± 0.008
C. 10-base sequences (five times larger genome)
Unique tags      | 78,600  | 47,086 ± 10       | 64,364 ± 10   | 63,407 ± 10   | 58,573 ± 8
% 1–5            | 64.16   | 43.35 ± 0.01      | 58.24 ± 0.009 | 57.77 ± 0.009 | 53.94 ± 0.009
% 6–50           | 31.0373 | 48.71 ± 0.01      | 36.07 ± 0.01  | 36.46 ± 0.01  | 39.77 ± 0.01
% 51–500         | 4.3815  | 7.23 ± 0.004      | 5.26 ± 0.003  | 5.34 ± 0.003  | 5.80 ± 0.003
% 501–5000       | 0.4212  | 0.71 ± 0.001      | 0.43 ± 0.0009 | 0.44 ± 0.0009 | 0.48 ± 0.001
% Errors novel   | –       | –                 | 92.5 ± 0.007  | 92.8 ± 0.006  | 79.4 ± 0.01
% Unique genes   | –       | 100 ± 0           | 100 ± 0       | 92.8 ± 0.004  | 75.4 ± 0.006
  • Simulated results of SAGE experiments. In all cases, the genome is assumed to be as represented in the column “Assumed.” The columns “Unique, no errors,” “Unique,” “Random,” and “Non-random” correspond, in that order, to the increasingly realistic assumptions about the SAGE process outlined in Methods. The “Unique tags” row gives the assumed or detected number of unique tag sequences; the % rows give the percentage of those unique tags whose copy number falls in each range. “% Errors novel” is the percentage of erroneously sequenced tags that are novel (not present on any other mRNA), and “% Unique genes” is the percentage of actively transcribed genes that have unique tag sequences. A and B, 9- and 10-base tag sequences, respectively, assuming published findings for SAGE experiments. C, 10-base tags assuming a genome with five times as many unique tags and five times as many tags in total. In all cases, the number of unique genes detected is significantly underestimated, as is the fraction of low-copy-number transcripts. Values after ± are standard errors of the mean over 1000 simulations.
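Two of the tag-uniqueness columns can be cross-checked with a short calculation. The following is a minimal Python sketch under simplifying assumptions that are illustrative and not necessarily the simulation described in Methods: each gene's tag is an independent, uniformly random L-base sequence (4^L possible tags), and an erroneous tag behaves like a uniform draw from that tag space. Under these assumptions a gene's tag is unique with probability (1 - 4^-L)^(N-1), and an erroneous tag is novel with probability 1 - N/4^L, where N is the number of unique tags in the "Assumed" column. The function names are hypothetical.

```python
# Back-of-envelope check of tag uniqueness in a finite tag space.
# ASSUMPTION (illustrative): tags are independent, uniformly random
# L-base sequences, and an erroneous tag is a uniform draw from the space.


def fraction_unique_random(n_genes: int, tag_len: int) -> float:
    """Probability that a gene's tag is shared by none of the other genes."""
    space = 4 ** tag_len  # number of distinct possible tags
    return (1.0 - 1.0 / space) ** (n_genes - 1)


def fraction_errors_novel(n_tags: int, tag_len: int) -> float:
    """Probability that an erroneous tag misses all n_tags existing tags."""
    space = 4 ** tag_len
    return 1.0 - n_tags / space


if __name__ == "__main__":
    # (number of unique tags from the "Assumed" column, tag length in bases)
    cases = {
        "A (9 bases)": (15_720, 9),
        "B (10 bases)": (15_720, 10),
        "C (10 bases, 5x genome)": (78_600, 10),
    }
    for label, (n, length) in cases.items():
        print(
            f"{label}: unique genes {100 * fraction_unique_random(n, length):.1f}%, "
            f"errors novel {100 * fraction_errors_novel(n, length):.1f}%"
        )
```

Running it gives 94.2%, 98.5% and 92.8% for unique genes, matching the "Random" column of "% Unique genes" in A, B and C, and 94.0%, 98.5% and 92.5% for novel errors, matching the "Unique" column of "% Errors novel." The agreement suggests, without proving, that these columns are governed mainly by random collisions in a finite tag space, which is why a 10-base tag raises both percentages and the five-times-larger genome in C lowers them.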
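The copy-number rows and the ± values can be derived from raw simulation output roughly as follows. This minimal Python sketch assumes that each simulated experiment yields a mapping from tag sequence to observed copy number, and that the ± figure is the sample standard deviation across replicate runs divided by the square root of the number of runs. The bin edges follow the table's row headings; the function names, the input format and the toy counts are hypothetical.

```python
import math
import statistics
from collections import Counter

# Copy-number ranges used as row headings (inclusive bounds).
BINS = [(1, 5), (6, 50), (51, 500), (501, 5000)]


def copy_number_profile(tag_counts: Counter) -> list[float]:
    """Percentage of distinct tags whose copy number falls in each range.

    Tags outside every range still count in the denominator, so the
    percentages sum to 100 only when every tag fits some bin.
    """
    n_unique = len(tag_counts)
    profile = []
    for low, high in BINS:
        in_bin = sum(1 for count in tag_counts.values() if low <= count <= high)
        profile.append(100.0 * in_bin / n_unique)
    return profile


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean over replicate simulations."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # sample SD / sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical toy output of one simulated experiment (tag -> copies).
    toy = Counter({"AAAAAAAAAA": 2, "CCCCCCCCCC": 7, "GGGGGGGGGG": 60, "TTTTTTTTTT": 1})
    print(copy_number_profile(toy))  # [50.0, 25.0, 25.0, 0.0]
    # Hypothetical "% 1-5" values from three replicate runs.
    print(mean_and_sem([10.0, 12.0, 14.0]))
```

Whether the sample (n - 1) or the population standard deviation was used is not stated in this section; with 1000 runs the two differ by a factor of about 1.0005, which is negligible at the precision reported.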

Genome Res. 10: 1241–1248
