Predicted TF Binding Sites and CpG Islands in the 1031 PPRs
| TF definition | Matrix ID | Hit No. (%) | Preferred region (searched region) | Cutoff value | Consensus sequence |
| TATA box | V$TATA_01 | 329 (32%) | −40 ∼ −23 (−90 ∼ +27) | 0.77 | STATAAAWRNNNNNN |
| Initiator | V$CAP_01 | 872 (85%) | −5 ∼ +6 (−55 ∼ +56) | 0.87 | NCANNNNN |
| GC box | V$GC_01 | 999 (97%) | −74 ∼ −45 (−124 ∼ +5) | 0.78 | NRGGGGCGGGGCNK |
| CAAT box | V$CAAT_01 | 663 (64%) | −105 ∼ −70 (−155 ∼ −20) | 0.78 | NNNRRCCAATSA |

| | Hit No. (%) | Length (bp) | CpG ratio | GC content (%) |
| CpG island | 493 (48%) | >200 | 0.6 | 50 |
- TF binding sites were searched within a window derived from the preferred region of each binding motif, extended by 50-base margins at both ends because multiple mRNA start sites were observed in some cases. For example, the preferred region of the TATA box is −40 to −23, so the region from −90 to +27 was searched.
- Matrix IDs are given in TRANSFAC notation, which starts with an identifier that indicates vertebrates (V$), followed by an acronym for the factor (for more details, see http://transfac.gbf.de/TRANSFAC/doc/site3.html).
- The symbols used in addition to A, C, G, and T are: W = A or T; S = C or G; R = A or G; Y = C or T; K = G or T; M = A or C; B = C, G, or T; D = A, G, or T; H = A, C, or T; V = A, C, or G; N = A, C, G, or T.
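The margin arithmetic and the ambiguity symbols described in the notes above can be illustrated with a minimal Python sketch. The function names `searched_region` and `consensus_to_regex` are hypothetical, introduced only for this illustration. The sketch assumes plain integer addition for the 50-base margins, which reproduces the searched regions listed in the table. It also matches a consensus string exactly, which is a simplification: the table reports matrix cutoff values, so the actual search scored candidate sites against a matrix rather than requiring an exact consensus match.

```python
import re

MARGIN = 50  # bases added at each end of the preferred region (from the note above)

# Preferred regions (start, end) copied from the table.
PREFERRED = {
    "TATA box": (-40, -23),
    "Initiator": (-5, 6),
    "GC box": (-74, -45),
    "CAAT box": (-105, -70),
}

# Ambiguity symbols from the note above, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def searched_region(preferred, margin=MARGIN):
    """Extend a preferred region by `margin` bases at both ends."""
    start, end = preferred
    return (start - margin, end + margin)


def consensus_to_regex(consensus):
    """Compile a consensus string into a regex (exact-match simplification)."""
    return re.compile("".join(IUPAC[symbol] for symbol in consensus))


if __name__ == "__main__":
    for name, region in PREFERRED.items():
        print(name, region, "->", searched_region(region))

    caat = consensus_to_regex("NNNRRCCAATSA")
    # True when the 12-mer is consistent with every position of the consensus.
    print(caat.fullmatch("TTTAGCCAATCA") is not None)
```

A regex is used here only because it makes the meaning of each symbol easy to inspect; it does not model the per-position weighting that a matrix cutoff implies.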











