Markup | Genome Research

Table 1.

Simulated Results Assuming the Genome Given by Observations

Model ->	Assumed	Unique, no errors	Unique	Random	Non-random
A. 9-base sequences
Unique tags	15,720	7,994 ± 5	11,029 ± 6	10,930 ± 6	10,427 ± 5
% 1–5	64.16	38.86 ± 0.02	53.63 ± 0.02	53.33 ± 0.02	51.26 ± 0.02
% 6–50	31.0373	52.21 ± 0.02	40.67 ± 0.02	40.26 ± 0.02	41.88 ± 0.02
% 51–500	4.3815	8.17 ± 0.01	5.77 ± 0.007	5.87 ± 0.007	6.28 ± 0.007
% 501–5000	0.4212	0.76 ± 0.003	0.54 ± 0.002	0.54 ± 0.002	0.57 ± 0.002
% Errors novel	–	–	94.0 ± 0.01	94.2 ± 0.01	84.6 ± 0.3
% Unique genes	–	100 ± 0	100 ± 0	94.2 ± 0.01	81.6 ± 0.01
B. 10-base sequences
Unique tags	15,720	8,003 ± 5	11,460 ± 6	11,428 ± 6	11,268 ± 5
% 1–5	64.16	38.86 ± 0.02	55.44 ± 0.02	55.43 ± 0.02	54.65 ± 0.02
% 6–50	31.0373	52.23 ± 0.02	38.51 ± 0.02	38.50 ± 0.02	39.15 ± 0.02
% 51–500	4.3815	8.16 ± 0.01	5.53 ± 0.006	5.54 ± 0.006	5.68 ± 0.006
% 501–5000	0.4212	0.75 ± 0.003	0.52 ± 0.002	0.52 ± 0.002	0.52 ± 0.002
% Errors novel	–	–	98.5 ± 0.007	98.5 ± 0.007	95.0 ± 0.01
% Unique genes	–	100 ± 0	100 ± 0	98.5 ± 0.004	94.0 ± 0.008
C. 10-base sequences (five times larger genome)
Unique tags	78,600	47,086 ± 10	64,364 ± 10	63,407 ± 10	58,573 ± 8
% 1–5	64.16	43.35 ± 0.01	58.24 ± 0.009	57.77 ± 0.009	53.94 ± 0.009
% 6–50	31.0373	48.71 ± 0.01	36.07 ± 0.01	36.46 ± 0.01	39.77 ± 0.01
% 51–500	4.3815	7.23 ± 0.004	5.26 ± 0.003	5.34 ± 0.003	5.80 ± 0.003
% 501–5000	0.4212	0.71 ± 0.001	0.43 ± 0.0009	0.44 ± 0.0009	0.48 ± 0.001
% Errors novel	–	–	92.5 ± 0.007	92.8 ± 0.006	79.4 ± 0.01
% Unique genes	–	100 ± 0	100 ± 0	92.8 ± 0.004	75.4 ± 0.006

[i] Simulated results of SAGE experiments. In all cases, the genome is assumed to be as given in the column “Assumed.” The columns “Unique, no errors,” “Unique,” “Random,” and “Non-random” correspond, in that order, to the increasingly realistic assumptions about the SAGE process detailed in Methods. The row “Unique tags” gives the assumed or detected number of unique tag sequences, and the “%” rows give the distribution of their copy numbers. “% Errors novel” is the percentage of erroneously sequenced tags that are novel (not present on some other mRNA). “% Unique genes” is the percentage of actively transcribed genes that have unique tag sequences. Panels A and B use 9- and 10-base tag sequences, respectively, assuming published findings for SAGE experiments. Panel C uses 10-base tags and assumes a genome with 5 times the number of unique tags and 5 times the number of tags. In all cases, the number of unique genes detected is significantly underestimated, as is the fraction of low copy number transcripts. The ± values are standard errors of the mean for 1000 simulations.
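As a rough sanity check on the “% Unique genes” row, the sketch below estimates how often a gene's tag collides with no other gene's tag. It is not the simulation described in Methods. It assumes, purely for illustration, that each gene receives one tag drawn uniformly and independently from the 4^k possible k-base sequences, and that the number of genes equals the “Assumed” number of unique tags (15,720 or 78,600). The function names are hypothetical. Under those assumptions the closed-form expectation is (1 − 1/4^k)^(N−1), which gives about 94.2%, 98.5%, and 92.8% for panels A, B, and C, close to the “Random” column. The Monte Carlo part shows how a mean and a standard error of the mean can be obtained from repeated runs.

```python
import random
import statistics
from collections import Counter


def expected_unique_fraction(n_genes: int, tag_length: int) -> float:
    """Expected fraction of genes whose tag is shared with no other gene.

    Assumption of this sketch: tags are uniform, independent draws from
    the 4**tag_length possible sequences.
    """
    n_tags = 4 ** tag_length
    # A given gene is unique if none of the other n_genes - 1 genes
    # drew the same tag.
    return (1.0 - 1.0 / n_tags) ** (n_genes - 1)


def simulate_unique_fraction(n_genes: int, tag_length: int,
                             rng: random.Random) -> float:
    """One Monte Carlo run: assign random tags, count genes with a unique tag."""
    n_tags = 4 ** tag_length
    # Each tag is encoded as an integer index instead of a base string.
    tags = [rng.randrange(n_tags) for _ in range(n_genes)]
    counts = Counter(tags)
    unique = sum(1 for tag in tags if counts[tag] == 1)
    return unique / n_genes


def mean_and_sem(values):
    """Sample mean and standard error of the mean (needs >= 2 values)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for label, n_genes, k in [("A", 15_720, 9), ("B", 15_720, 10), ("C", 78_600, 10)]:
        # 20 runs keeps this quick; the table reports 1000 simulations.
        runs = [simulate_unique_fraction(n_genes, k, rng) for _ in range(20)]
        mean, sem = mean_and_sem(runs)
        exact = expected_unique_fraction(n_genes, k)
        print(f"{label}: simulated {100 * mean:.2f} ± {100 * sem:.2f} %, "
              f"expected {100 * exact:.2f} %")
```

The “Non-random” column falls well below these values, which is consistent with the idea that real tag sequences are not uniformly distributed, although this sketch does not model that case.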