Simulated Results Assuming the Genome Given by Observations
| Model | Assumed | Unique, no errors | Unique | Random | Non-random |
| --- | --- | --- | --- | --- | --- |
| A. 9 Base sequences | | | | | |
| Unique tags | 15,720 | 7,994 ± 5 | 11,029 ± 6 | 10,930 ± 6 | 10,427 ± 5 |
| % 1–5 | 64.16 | 38.86 ± 0.02 | 53.63 ± 0.02 | 53.33 ± 0.02 | 51.26 ± 0.02 |
| % 5–50 | 31.0373 | 52.21 ± 0.02 | 40.67 ± 0.02 | 40.26 ± 0.02 | 41.88 ± 0.02 |
| % 50–500 | 4.3815 | 8.17 ± 0.01 | 5.77 ± 0.007 | 5.87 ± 0.007 | 6.28 ± 0.007 |
| % 500–5000 | 0.4212 | 0.76 ± 0.003 | 0.54 ± 0.002 | 0.54 ± 0.002 | 0.57 ± 0.002 |
| % Errors novel | – | – | 94.0 ± 0.01 | 94.2 ± 0.01 | 84.6 ± 0.3 |
| % Unique genes | – | 100 ± 0 | 100 ± 0 | 94.2 ± 0.01 | 81.6 ± 0.01 |
| B. 10 Base sequences | | | | | |
| Unique tags | 15,720 | 8,003 ± 5 | 11,460 ± 6 | 11,428 ± 6 | 11,268 ± 5 |
| % 1–5 | 64.16 | 38.86 ± 0.02 | 55.44 ± 0.02 | 55.43 ± 0.02 | 54.65 ± 0.02 |
| % 5–50 | 31.0373 | 52.23 ± 0.02 | 38.51 ± 0.02 | 38.50 ± 0.02 | 39.15 ± 0.02 |
| % 50–500 | 4.3815 | 8.16 ± 0.01 | 5.53 ± 0.006 | 5.54 ± 0.006 | 5.68 ± 0.006 |
| % 500–5000 | 0.4212 | 0.75 ± 0.003 | 0.52 ± 0.002 | 0.52 ± 0.002 | 0.52 ± 0.002 |
| % Errors novel | – | – | 98.5 ± 0.007 | 98.5 ± 0.007 | 95.0 ± 0.01 |
| % Unique genes | – | 100 ± 0 | 100 ± 0 | 98.5 ± 0.004 | 94.0 ± 0.008 |
| C. 10 Base sequences (five times larger genome) | | | | | |
| Unique tags | 78,600 | 47,086 ± 10 | 64,364 ± 10 | 63,407 ± 10 | 58,573 ± 8 |
| % 1–5 | 64.16 | 43.35 ± 0.01 | 58.24 ± 0.009 | 57.77 ± 0.009 | 53.94 ± 0.009 |
| % 6–50 | 31.0373 | 48.71 ± 0.01 | 36.07 ± 0.01 | 36.46 ± 0.01 | 39.77 ± 0.01 |
| % 51–500 | 4.3815 | 7.23 ± 0.004 | 5.26 ± 0.003 | 5.34 ± 0.003 | 5.80 ± 0.003 |
| % 501–5000 | 0.4212 | 0.71 ± 0.001 | 0.43 ± 0.0009 | 0.44 ± 0.0009 | 0.48 ± 0.001 |
| % Errors novel | – | – | 92.5 ± 0.007 | 92.8 ± 0.006 | 79.4 ± 0.01 |
| % Unique genes | – | 100 ± 0 | 100 ± 0 | 92.8 ± 0.004 | 75.4 ± 0.006 |
Simulated results of SAGE experiments. In all cases, the genome is assumed to be as represented in the column “Assumed.” The columns “Unique, no errors,” “Unique,” “Random,” and “Non-random” correspond, in this order, to the increasingly realistic assumptions about the SAGE process outlined in Methods. The row “Unique tags” gives the assumed or detected number of unique tag sequences, and the % rows give the distribution of their copy numbers. “% Errors novel” is the percentage of erroneously sequenced tags that are novel (not present on some other mRNA). “% Unique genes” is the percentage of actively transcribed genes that have unique tag sequences. Panels A and B use 9- and 10-base tag sequences, respectively, assuming published findings for SAGE experiments. Panel C uses 10-base tags and assumes a genome with 5 times the number of unique tags and 5 times the number of tags. In all cases, the number of unique genes detected is significantly underestimated, as is the fraction of low copy number transcripts: in panel A, for example, the simulations detect between 7,994 and 11,029 unique tags against 15,720 assumed, and place 38.86% to 53.63% of tags in the 1–5 copy range against 64.16% assumed. Values following ± are standard errors of the mean for 1000 simulations.
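The size of the underestimate can be read off the “Unique tags” rows by dividing each detected count by the assumed count. The following is a minimal sketch of that arithmetic only, not the simulation described in Methods; the dictionary names and the function `detected_fraction` are hypothetical, and the counts are the table means with standard errors omitted. Note that for the models with sequencing errors the detected counts may include novel erroneous tags, so the ratio is a descriptive summary of the table rather than a true recovery rate.

```python
# Mean "Unique tags" counts copied from the table (standard errors omitted).
# All names below are illustrative and do not come from the original paper.
ASSUMED = {"A": 15720, "B": 15720, "C": 78600}

DETECTED = {
    "A": {"Unique, no errors": 7994, "Unique": 11029,
          "Random": 10930, "Non-random": 10427},
    "B": {"Unique, no errors": 8003, "Unique": 11460,
          "Random": 11428, "Non-random": 11268},
    "C": {"Unique, no errors": 47086, "Unique": 64364,
          "Random": 63407, "Non-random": 58573},
}


def detected_fraction(panel: str, model: str) -> float:
    """Detected unique tags as a percentage of the assumed unique tags."""
    return 100.0 * DETECTED[panel][model] / ASSUMED[panel]


if __name__ == "__main__":
    for panel, models in DETECTED.items():
        for model in models:
            pct = detected_fraction(panel, model)
            print(f"Panel {panel}, {model}: {pct:.1f}% of assumed unique tags")
```

Every ratio computed this way falls below 100%, which is the underestimate the legend refers to.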











