Computational Detection and Location of Transcription Start Sites in Mammalian Genomic DNA

Table 1.

Comparison of Various Promoter- and TSS-Detection Methods on the Chromosome 22–Derived Pseudo Chromosome

Method             Predictions  True positives  False positives  Sensitivity (%)  Selectivity (%)
Eponine                    215             152               57             53.5             73.5
PromoterInspector          278             157              100             55.3             64.0
CpG                        306             187              116             65.8             62.1
TATA-2.6                   540              37              500             13.0              7.4
TATA-6.5                 39869             283            37581             99.6              5.7
  • Sensitivity is defined here as the proportion of annotated mRNA starts that are detected by a given method (within 2 kb). Selectivity is the proportion of predictions that are confirmed by the presence of an annotated mRNA start.

  • PromoterInspector predictions for chromosome 22 (Scherf et al. 2001) were obtained from the authors' web site. Because these referred to an older assembly of the chromosome, they were mapped onto the latest assembly using SSAHA (Ning et al. 2001); 99.4% of the predictions were mapped successfully. CpG islands were extracted from the chromosome 22 annotation repository. This set of CpG island predictions was available to the annotators working on this chromosome, so there is some possibility of bias in favor of this method. TATA-box motifs were detected using the log-odds weight matrix published by Bucher (1990). The cutoff threshold of −6.5, recommended by Fickett and Hatzigeorgiou (1997), gave 84,886 predictions, many more than any other method. We also used the far more stringent threshold of −2.6, which gave 1,196 predictions, more in line with the other methods tested.

  • For Eponine and TATA-box predictions, strand information was ignored (i.e., a prediction on the wrong strand is still counted as correct).

  • PromoterInspector predictions and CpG islands do not provide any information about direction of transcription.
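The definitions of sensitivity and selectivity given in the notes above can be illustrated with a short sketch. It is not the evaluation code used for Table 1. It assumes that predictions and annotated mRNA starts are given as plain integer coordinates on one sequence, that "within 2 kb" means an absolute distance of at most 2000 bp (inclusive), and that strand is ignored for every method. The names `has_neighbor` and `evaluate` and the example coordinates are hypothetical. A true positive is counted per annotated start that is detected and a false positive per prediction with no annotated start nearby, so the two counts need not sum to the number of predictions.

```python
from bisect import bisect_left

WINDOW = 2000  # assumed reading of "within 2 kb": |distance| <= 2000 bp


def has_neighbor(sorted_positions, pos, window=WINDOW):
    """Return True if any value in sorted_positions lies within `window` bp of pos."""
    # First candidate that is not too far to the left of pos.
    i = bisect_left(sorted_positions, pos - window)
    # It is a neighbor only if it is also not too far to the right.
    return i < len(sorted_positions) and sorted_positions[i] <= pos + window


def evaluate(predictions, annotated_starts, window=WINDOW):
    """Compute sensitivity and selectivity as defined in the table notes."""
    preds = sorted(predictions)
    starts = sorted(annotated_starts)

    # True positives: annotated mRNA starts with at least one prediction nearby.
    detected = sum(has_neighbor(preds, s, window) for s in starts)
    # False positives: predictions with no annotated mRNA start nearby.
    unconfirmed = sum(not has_neighbor(starts, p, window) for p in preds)

    sensitivity = detected / len(starts) if starts else 0.0
    selectivity = (len(preds) - unconfirmed) / len(preds) if preds else 0.0
    return {
        "predictions": len(preds),
        "true_positives": detected,
        "false_positives": unconfirmed,
        "sensitivity": sensitivity,
        "selectivity": selectivity,
    }


# Hypothetical coordinates, for illustration only.
example = evaluate(
    predictions=[1000, 1500, 50000, 90000],
    annotated_starts=[1200, 51500, 70000],
)
print(example)
```

Sorting both coordinate lists and using a binary search keeps each lookup logarithmic, which matters for inputs on the scale of the TATA-box predictions. The published selectivity values are consistent with this formula: for Eponine, (215 − 57) / 215 ≈ 73.5%.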

This Article

  1. Genome Res. 12: 458-461
