Figure 3.

An ATG desert is specifically correlated with the absence of a TATAA box. ATG codon frequencies were analyzed in the region 2 kb upstream and 2 kb downstream from transcription initiation using a data set of 17,718 unique genes. In this analysis, TATAA was defined as the sequence TATAAT occurring within 200 bp of transcription initiation. CGI were identified as sequences with a CpG content >0.55, a length of >500 bp, and an observed/expected ratio >0.65. The major TSS for any given gene is located at position 0 on the X axis. Each point represents the frequency of observed ATG codons within a 100-bp window; 95% confidence intervals are indicated by the triangles. The number of genes represented in each group is given in parentheses. The groups are designated (A) TATAAT, no CGI; (B) No TATAAT, no CGI; (C) TATAAT, CGI; (D) No TATAAT, CGI; (E) Noncoding genes: 75 rRNA, 119 snoRNA, and 1604 tRNA genes.
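The windowed counting described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions. The function names `count_atg_per_window` and `window_stats` are hypothetical. Each input sequence is assumed to span exactly `flank` bp on either side of the TSS. "Frequency" is taken to be the mean ATG count per gene per window. The 95% interval uses a normal approximation (mean ± 1.96 standard errors), since the legend does not state how the intervals were computed.

```python
import math
from statistics import mean, stdev


def count_atg_per_window(seq, window=100):
    """Count ATG trinucleotides starting in each consecutive window of seq."""
    seq = seq.upper()
    counts = [0] * (len(seq) // window)
    pos = seq.find("ATG")
    while pos != -1:
        idx = pos // window
        if idx < len(counts):
            counts[idx] += 1
        pos = seq.find("ATG", pos + 1)
    return counts


def window_stats(seqs, flank=2000, window=100):
    """Return (window_start_relative_to_TSS, mean, ci_low, ci_high) per window.

    Assumes every sequence has length 2 * flank with the TSS at index `flank`.
    The interval is a normal approximation, chosen here for illustration only.
    """
    per_gene = [count_atg_per_window(s, window) for s in seqs]
    n_windows = (2 * flank) // window
    results = []
    for w in range(n_windows):
        values = [counts[w] for counts in per_gene]
        m = mean(values)
        if len(values) > 1:
            half_width = 1.96 * stdev(values) / math.sqrt(len(values))
        else:
            half_width = 0.0  # no spread can be estimated from one gene
        start = w * window - flank  # position 0 corresponds to the TSS
        results.append((start, m, m - half_width, m + half_width))
    return results
```

Running `window_stats` separately on each gene group (for example, genes with and without TATAAT or a CGI) would give one curve per panel, with the window start positions on the X axis.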
