RT Journal
A1 Stollberg, Jes
A1 Urschitz, Johann
A1 Urban, Zsolt
A1 Boyd, Charles D.
T1 A Quantitative Evaluation of SAGE
JF Genome Research
JO Genome Research
YR 2000
FD August 01
VO 10
IS 8
SP 1241
OP 1248
DO 10.1101/gr.10.8.1241
UL http://genome.cshlp.org/content/10/8/1241.abstract
AB Serial Analysis of Gene Expression (SAGE) is an innovative technique that offers the potential of cataloging both the identity and relative frequencies of mRNA transcripts in a given poly(A+) RNA preparation. Although it is a very effective approach for determining the expression of mRNA populations, there are significant biases in the observed results that are inherent in the experimental process. These are caused by sampling error, sequencing error, nonuniqueness, and nonrandomness of tag sequences. The quantitative information desired from SAGE experiments consists of estimates of the number of genes and the frequency distribution of transcript copy numbers. Of additional concern is the extent to which a given tag sequence can be assumed to be unique to its gene. The present study takes these mathematical biases into account and presents a basis for maximum likelihood estimation of gene number and transcript copy frequencies given a set of experimental results. These estimates of the true state of genomic expression are markedly different from those based directly on the observations from the underlying experiments. It also is shown that while in many cases it is probable that a given tag sequence is unique within the genome, in larger genomes this cannot be safely assumed.