Table 1.

Predicted TF Binding Sites and CpG Islands in the 1031 PPRs

TF definition	Matrix ID	Hit No. (%)	Preferred region (searched region)	Cutoff value	Consensus sequence
TATA box	V$TATA_01	329 (32%)	−40 ∼ −23 (−90 ∼ +27)	0.77	STATAAAWRNNNNNN
Initiator	V$CAP_01	872 (85%)	−5 ∼ +6 (−55 ∼ +56)	0.87	NCANNNNN
GC box	V$GC_01	999 (97%)	−74 ∼ −45 (−124 ∼ +5)	0.78	NRGGGGCGGGGCNK
CAAT box	V$CAAT_01	663 (64%)	−105 ∼ −70 (−155 ∼ −20)	0.78	NNNRRCCAATSA

	Hit No. (%)	Length (bp)	CpG ratio	GC content (%)
CpG island	493 (48%)	>200	0.6	50
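
The CpG island thresholds in the row above can be illustrated with a short sketch. The table does not define "CpG ratio"; the code assumes the commonly used observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G). The function names and the choice to treat 0.6 and 50 as inclusive lower bounds are assumptions for illustration, and the sketch scores a single sequence rather than scanning for islands.

```python
def gc_content(seq):
    """GC content of seq as a percentage (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_ratio(seq):
    """Observed/expected CpG ratio (assumed definition, not stated in the table)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c_count * g_count)


def meets_cpg_thresholds(seq, min_length=200, min_ratio=0.6, min_gc=50.0):
    """True if seq is longer than min_length and reaches both other thresholds."""
    return (
        len(seq) > min_length
        and cpg_ratio(seq) >= min_ratio
        and gc_content(seq) >= min_gc
    )
```
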

The search for each TF binding site was performed in a window built from the preferred region of its binding motif, with 50-base margins added at both ends because in some cases multiple mRNA start sites were observed. For example, because the preferred region of the TATA box is −40 to −23, the region of −90 to +27 was searched.
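
As a quick check of the window arithmetic described in this note, the sketch below recomputes each searched region from its preferred region. The names `PREFERRED`, `MARGIN`, and `searched_region` are illustrative, and the code assumes positions are treated as plain integers relative to the mRNA start site, which reproduces the values in the table.

```python
MARGIN = 50  # bases added at each end of the preferred region

# Preferred regions (start, end) relative to the mRNA start site, from Table 1.
PREFERRED = {
    "TATA box": (-40, -23),
    "Initiator": (-5, 6),
    "GC box": (-74, -45),
    "CAAT box": (-105, -70),
}


def searched_region(preferred, margin=MARGIN):
    """Widen a (start, end) preferred region by margin bases on both sides."""
    start, end = preferred
    return start - margin, end + margin


if __name__ == "__main__":
    for name, region in PREFERRED.items():
        print(name, region, "->", searched_region(region))
```
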
Matrix IDs use TRANSFAC notation, which starts with an identifier that indicates vertebrates (V$), followed by an acronym for the factor (for more details, see http://transfac.gbf.de/TRANSFAC/doc/site3.html).
The symbols used in addition to A, C, G, and T are: W = A or T; S = C or G; R = A or G; Y = C or T; K = G or T; M = A or C; B = C, G, or T; D = A, G, or T; H = A, C, or T; V = A, C, or G; N = A, C, G, or T.
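
To make the ambiguity symbols concrete, the following sketch expands a consensus sequence into a regular expression using the standard IUPAC nucleotide codes (with Y = C or T). It is only an exact-match illustration: the hits in the table were scored against TRANSFAC matrices with a cutoff value, which this code does not attempt to reproduce. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus):
    """Translate a consensus string such as 'NCANNNNN' into a regex pattern."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def find_exact_matches(consensus, sequence):
    """Return 0-based start offsets of all matches, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]
```

For example, `find_exact_matches("NCANNNNN", "GGCATTTTTGG")` reports a single match starting at offset 1.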

Identification and Characterization of the Potential Promoter Regions of 1031 Kinds of Human Genes