Table 1.

Predicted TF Binding Sites and CpG Islands in the 1031 PPRs

TF definition  [ii] Matrix ID  Hit No. (%)  Preferred region (searched region)  Cutoff value  [iii] Consensus sequence
TATA box       V$TATA_01       329 (32%)    −40 ∼ −23 (−90 ∼ +27)               0.77          STATAAAWRNNNNNN
Initiator      V$CAP_01        872 (85%)    −5 ∼ +6 (−55 ∼ +56)                 0.87          NCANNNNN
GC box         V$GC_01         999 (97%)    −74 ∼ −45 (−124 ∼ +5)               0.78          NRGGGGCGGGGCNK
CAAT box       V$CAAT_01       663 (64%)    −105 ∼ −70 (−155 ∼ −20)             0.78          NNNRRCCAATSA

               Hit No. (%)     Length (bp)  CpG ratio  GC content (%)
CpG island     493 (48%)       >200         0.6        50

[i] The search for TF binding sites was performed using the preferred region of each TF binding motif. For example, because the preferred region of the TATA box is −40 to −23, the region of −90 to +27 was searched. Fifty-base margins were added at both ends of the preferred region because in some cases multiple mRNA start sites were observed.

[ii] A TRANSFAC notation, which starts with an identifier that indicates vertebrates (V$), followed by an acronym for the factor (for more details, see http://transfac.gbf.de/TRANSFAC/doc/site3.html).

[iii] The symbols used in addition to A, C, G, and T are: W = A or T; S = C or G; R = A or G; Y = C or T; K = G or T; M = A or C; B = C, G, or T; D = A, G, or T; H = A, C, or T; V = A, C, or G; N = A, C, G, or T.
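
The two conventions described in the footnotes can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure used to produce Table 1: the hits in the table come from TRANSFAC matrices scored against a cutoff value, which this sketch does not reproduce. The first part derives each searched region by adding the 50-base margin to both ends of the preferred region; the second expands a consensus string into a regular expression using the standard IUPAC nucleotide codes, in which a pyrimidine (C or T) is written Y. All names (`MARGIN`, `PREFERRED`, `searched_region`, `IUPAC`, `consensus_to_regex`, `find_exact_matches`) and the example sequence are hypothetical.

```python
import re

MARGIN = 50  # bases added at each end of the preferred region (footnote [i])

# Preferred regions from Table 1, relative to the mRNA start site.
PREFERRED = {
    "TATA box": (-40, -23),
    "Initiator": (-5, 6),
    "GC box": (-74, -45),
    "CAAT box": (-105, -70),
}

def searched_region(preferred, margin=MARGIN):
    """Widen a (start, end) preferred region by `margin` bases on both sides."""
    start, end = preferred
    return (start - margin, end + margin)

# Standard IUPAC nucleotide codes (footnote [iii]).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(
        IUPAC[s] if len(IUPAC[s]) == 1 else "[" + IUPAC[s] + "]"
        for s in consensus.upper()
    )

def find_exact_matches(consensus, sequence):
    """Return 0-based start offsets of all (possibly overlapping) exact matches.

    Assumption: exact consensus matching only; no matrix scoring or cutoff.
    """
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

for name, region in PREFERRED.items():
    print(name, searched_region(region))
print(consensus_to_regex("STATAAAWR"))
print(find_exact_matches("STATAAAWR", "GGCTATAAAAGG"))
```

The widened regions computed this way agree with the searched regions listed in parentheses in the table. Exact consensus matching is stricter than a weight-matrix search with a cutoff, so it would generally report fewer sites than the hit counts shown.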