Predicting Gene Regulatory Elements in Silico on a Genomic Scale

Table 1. Highest Scoring Nontrivial Patterns with (at Most) One Wild-Card Symbol

No. Pattern Score N+ N
A. Regions −100..0
2 AAG.AAACAAA 6.54 37 1
6 A.TAAGAACA 5.79 27 0
8 A.AATAGGA 5.61 43 3
9 AAGAAA.CAAA 5.58 26 0
12 GTAACAA.C 5.36 25 0
13 AAA.AACTTA 5.36 25 0
20 ACAAC.TAA 5.09 39 3
21 AG.AAACAAA 5.06 64 8
23 ACAAACAA.A 4.97 48 5
26 AATAGTA.A 4.92 77 11
32 AATAGTATA 4.77 27 1
34 TCACTAC.T 4.72 22 0
35 CAAACA.ACA 4.72 22 0
37 ACA.ATAGA 4.72 55 7
42 AGAGA.ATA 4.63 54 7
47 AATAAACAA.A 4.59 26 1
50 AAAG.ACAAG 4.57 35 3
52 CTAAGAA.A 4.55 53 7
56 A.AAGGGAAG 4.51 21 0
57 CAAA.TAAC 4.50 48 6
B. Regions −250..−150
14 TTACCCGC 6.22 29 0
58 GT.ACCCG 5.59 54 5
71 T.ACCCGC 5.48 42 3
126 CGGGTA.T 5.06 64 8
141 G.TACCCG 4.97 48 5
165 CGGGTAA.A 4.87 47 5
178 GTTACCCG 4.83 37 3
305 TACAT.TATA 4.43 65 10
353 TTTCTC.TTT 4.32 46 6
372 TTACCCG 4.30 119 23
379 TTTCCTGT.T 4.29 20 0
405 CTCATCTC.T 4.24 24 1
425 TCACGTGA 4.20 28 2
427 T.ATATATTC 4.20 28 2
454 CGGGTAA 4.12 114 23
460 TGTGT.GAT 4.08 19 0
465 ATTACCCG.A 4.08 19 0
474 G.ACATATAT 4.06 23 1
485 TA.GTAAAC 4.05 27 2
500 TTTCTCT.TT 4.03 47 7
  • Matches were allowed only on the W (gene) strand.

  • No.: rank of the pattern when all patterns are listed in order of decreasing score (before trivial patterns were removed).

  • Score: from equation 2.

  • N+: number of upstream regions matching the pattern.

  • N: number of random sequences matching the pattern.
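
The two count columns can be illustrated with a minimal sketch. It assumes that "." is the wild-card symbol and matches any single nucleotide, that a sequence is counted once however many times the pattern occurs in it, and that only the given strand is searched. The function name and the toy sequences are hypothetical and are not data from the study; the score itself is not computed, because equation 2 is not reproduced in this table.

```python
import re

def count_matching_sequences(pattern, sequences):
    """Count the sequences that contain at least one match of the pattern.

    Assumption: '.' is the wild-card symbol and stands for any single
    nucleotide. That is also what '.' means in a regular expression,
    so the pattern can be compiled as-is after a simple alphabet check.
    Only the given strand is searched (no reverse complement).
    """
    if not set(pattern) <= set("ACGT."):
        raise ValueError("pattern may contain only A, C, G, T and '.'")
    regex = re.compile(pattern)
    # Each sequence contributes at most 1, even with several occurrences.
    return sum(1 for seq in sequences if regex.search(seq))

# Toy sequences invented for illustration only.
upstream = ["TTAAGCAAACAAAGG", "CCAAGTAAACAAATT", "GGGGGGGGGG"]
random_seqs = ["ACGTACGTACGT", "TTTTAAGAAAACAAAT"]

n_plus = count_matching_sequences("AAG.AAACAAA", upstream)       # upstream regions
n_random = count_matching_sequences("AAG.AAACAAA", random_seqs)  # random sequences
print(n_plus, n_random)
```

Counting each sequence at most once mirrors the footnotes, which describe the columns as numbers of matching regions or sequences rather than numbers of occurrences.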

  1. Genome Res. 8: 1202-1215