Word frequency analysis reveals enrichment of dinucleotide repeats on the human X chromosome and [GATA]n in the X escape region

Figure 1.

Distribution of word frequencies in the genome. The x-axis represents the frequency of word pairs in the genome, and the y-axis is the number of word pairs that occur at that frequency. The highest peak is largely populated by complex words that contain no CpGs. Words containing two and one CpGs, respectively, populate the first two peaks. The rarest words in the left tail have three or four CpGs, while the shoulder on the right tail is composed of simple sequence, largely mono- and dinucleotide repeats (see arrow).
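The caption describes a frequency spectrum: count how often each word occurs, then plot how many distinct words share each occurrence count, with the peaks separated by CpG content. The sketch below illustrates that idea only under stated assumptions. It treats a "word" as an overlapping k-letter substring with a caller-chosen `k` (the caption does not give the word length here), skips words containing non-ACGT characters, counts a single strand, and includes only words that are actually observed. The function names are hypothetical and are not the authors' code.

```python
from collections import Counter


def count_words(seq: str, k: int) -> Counter:
    """Count every overlapping k-letter word (k-mer) in seq.

    Assumption: words containing any character other than A, C, G, T
    (e.g. N in assembly gaps) are skipped; only one strand is counted.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= valid:
            counts[word] += 1
    return counts


def frequency_spectrum(counts: Counter) -> Counter:
    """Map each occurrence count (x-axis) to the number of distinct
    words observed that many times (y-axis). Unobserved words are absent."""
    return Counter(counts.values())


def cpg_count(word: str) -> int:
    """Number of CpG dinucleotides (C immediately followed by G) in a word."""
    return sum(1 for i in range(len(word) - 1) if word[i:i + 2] == "CG")


def spectrum_by_cpg(counts: Counter) -> dict:
    """Split the frequency spectrum by how many CpGs each word contains,
    mirroring the caption's grouping of peaks by CpG content."""
    by_cpg: dict = {}
    for word, n in counts.items():
        by_cpg.setdefault(cpg_count(word), Counter())[n] += 1
    return by_cpg


if __name__ == "__main__":
    counts = count_words("ACGTACGTAAAA", 4)
    print(frequency_spectrum(counts))  # Counter({1: 5, 2: 2})
    print(spectrum_by_cpg(counts))
```

Grouping by CpG count is what would separate the peaks in a plot like this one: because CpG is depleted in mammalian genomes, each additional CpG shifts a word toward lower frequencies. On a real genome, repeat-rich words would show up as the high-frequency shoulder the caption points to.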

Genome Res. 16: 477-484