Averaged conservation of different segments of a “prototypical gene.” Conservation statistics were computed over thousands of aligned pairs of regions of various types, aligned at different reference points. At each position we compute the fraction of aligned pairs that have identical bases at that position (green + purple tiers), have mismatched bases (red), have melanogaster bases aligned to deleted bases in pseudoobscura (yellow), or are unaligned in our synteny-filtered BLASTZ alignment (blue). The purple tier shows the fraction of bases that would be expected to match by chance given the base composition at that position in both species. The expected match is <25% because of the inclusion of unaligned and deleted sequences; if these are removed, the baseline is ∼28% because of the slight AT richness of the genome. The vertical panels correspond to different segments of a prototypical gene, indicated on the x-axis. A cartoon of the prototypical gene is shown under the panels. Each panel is labeled with the gene segment followed, in parentheses, by the part of that segment at which the sequences were aligned. For example, CDS (5′-end) represents the start of the coding sequence aligned at the ATG start codon, whereas the coding exon (3′-end) is aligned at the 3′-end of the coding exon, and thus the sequences are not all in phase with each other. (A) RIC, random intergenic controls for CRE analysis; (B) nearby controls in order from −250 bp to +250 bp offset from CREs. The right-most nearby controls are closest to the gene start and therefore lie in a region that is on average more conserved. Some of the nearby controls have a higher match percent (green) as a result; however, CREs have the highest percentage of identical base pairs as a fraction of aligned bases (everything but blue).
(C) 142 cis-regulatory elements of 50 bp or less from the literature; (D) compressed sampling of the 5′-proximal region every 50 bp from 50 to 500 bp; (E) 50 bp proximal to the transcription start site (TS), aligned at TS; (F) genomic span of 5′-UTR, aligned at TS; (G) 5′-UTR span aligned at protein start site (PS); (H) 5′-end of protein-coding region aligned at PS; (I) 3′-end of coding exons aligned at donor site; (J) introns aligned at donor site; (K) introns aligned at acceptor site; (L) 5′-end of internal coding exons aligned at acceptor site; (M) 3′-end of protein-coding region aligned at protein end site (PE); (N) 3′-UTR span aligned at PE; (O) 3′-UTR span aligned at transcript end; (P) 50 bp of 3′-proximal region aligned at transcript end; (Q) compressed sampling of 3′-proximal region every 50 bp from 50 to 500 bp; and (R) genome-wide average.
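The per-position tiers described in this legend can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the figure. The function name `column_stats`, the input layout (a list of equal-length melanogaster/pseudoobscura string pairs already anchored at a common reference point), and the symbols `-` for a deleted pseudoobscura base and `.` for an unaligned position are all assumptions made for illustration. The chance-match estimate is assumed to be the sum over bases of the product of the two species' base frequencies at that position, with frequencies taken over all pairs, so that deleted and unaligned positions lower it as the legend describes.

```python
from collections import Counter

BASES = "ACGT"
DELETED = "-"    # assumed symbol: melanogaster base aligned to a deleted pseudoobscura base
UNALIGNED = "."  # assumed symbol: position not covered by the alignment

def column_stats(pairs):
    """Per-position tier fractions for a list of (mel, pse) aligned strings.

    All strings are assumed to have the same length and to be anchored at
    the same reference point (e.g. the start codon).
    """
    n = len(pairs)
    length = len(pairs[0][0])
    stats = []
    for i in range(length):
        counts = Counter()
        mel_freq = Counter()
        pse_freq = Counter()
        for mel, pse in pairs:
            m, p = mel[i], pse[i]
            mel_freq[m] += 1
            pse_freq[p] += 1
            if p == UNALIGNED:
                counts["unaligned"] += 1   # blue tier
            elif p == DELETED:
                counts["deleted"] += 1     # yellow tier
            elif m == p:
                counts["match"] += 1       # green + purple tiers
            else:
                counts["mismatch"] += 1    # red tier
        row = {k: counts[k] / n
               for k in ("match", "mismatch", "deleted", "unaligned")}
        # Assumed chance-match estimate (purple tier): frequencies are taken
        # over all n pairs, so gaps and unaligned positions reduce it.
        row["expected"] = sum(mel_freq[b] * pse_freq[b] for b in BASES) / (n * n)
        # Identical bases as a fraction of aligned positions (everything but blue).
        aligned = n - counts["unaligned"]
        row["match_of_aligned"] = counts["match"] / aligned if aligned else 0.0
        stats.append(row)
    return stats

# Toy input: four aligned pairs, two positions each.
toy_pairs = [("AC", "AC"), ("AG", "A-"), ("TC", "C."), ("AC", "TC")]
toy_stats = column_stats(toy_pairs)
```

In the toy input, the second position has two identical pairs, one deletion, and one unaligned pair, so the match fraction is 0.5 of all pairs but 2/3 of aligned pairs. The same distinction underlies the comparison between nearby controls and CREs.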
