Occurrence and Consequences of Coding Sequence Insertions and Deletions in Mammalian Genomes

Table 3.

Overrepresented Four-Letter Words in the Proximity of Indels (Nucleotides in the Range 1–4 From the Indel)

           All indels      Insertions      Deletions       Slippage-like   Non-slippage-like
Word (a)      Obs     Sim     Obs     Sim     Obs     Sim     Obs     Sim     Obs     Sim
GCGG (3)       25     8.3      15     3.5      10     4.8      11     2.9       6     2.9
GGCG (3)       19     7.1      11     3.1       8     4.0       9     2.4       6     2.4
GCCG (3)       23     8.7      16     3.6       7     5.1      11     3.0       4     3.0
GAGG (2)       62    24.1      33    10.4      29    13.7      31     8.5       9     8.0
CAGC (1)       80    31.6      52    13.6      28    17.9      42    11.2      24    10.5
CGGC (3)       23     9.3      13     3.9      10     5.4      12     3.1       6     3.2
CGCC (3)       20     9.0      14     3.8       6     5.2      12     3.2       5     3.0
ACAA           34    15.4      11     6.5      23     8.9      14     5.3      13     5.3
AGAA (2)       64    30.0      25    13.0      39    17.0      34    10.6      10    10.1
CCTC (2)       44    20.9      21     9.1      23    11.8      22     7.4      10     7.1
CCGC (3)       18     8.5      13     3.6       5     4.9       7     2.9       4     2.9
AGCA (1)       47    22.6      25     9.9      22    12.7      26     8.1      11     7.5
CCCC (2)       35    17.0      18     7.2      17     9.8      14     5.8      12     5.8
GACG (3)       14     6.9      11     2.9       3     4.0       3     2.4       2     2.4
GCAG (1)       56    28.7      37    12.3      19    16.4      28    10.1      11     9.6
CCAC           35    19.1      21     8.3      14    10.8      14     6.7       5     6.4
AAGA (2)       47    27.6      17    11.8      30    15.8      25     9.6       8     9.4
  • Obs indicates observed frequency counts and Sim the mean frequency counts from 1000 simulated data sets. The complete version of this table (256 rows), showing all four-letter words and additional columns detailing standard deviations and p-values, is available as Supplemental Table S2 (http://www.genome.org). This table shows all rows of the complete table containing significant (p < 0.01) differences between observed and simulated data, indicated by values in bold. Rows are sorted by descending fold overrepresentation for all indels.

  • a Several of the nucleotide words are related. These are indicated by superscript values:

  • (1) possible permutation of the trinucleotide CAG;

  • (2) oligo-purine or oligo-pyrimidine tract;

  • (3) high G+C content words containing a CpG dinucleotide.
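
The fold overrepresentation used to sort the rows can be illustrated with a short Python sketch. It assumes, as the table legend does not state the formula explicitly, that fold overrepresentation is the observed count divided by the mean simulated count (Obs/Sim). The values are the "All indels" columns of the table; the names `ALL_INDELS` and `fold_overrepresentation` are hypothetical and chosen only for this illustration.

```python
# "All indels" columns of the table: word -> (Obs, Sim).
ALL_INDELS = {
    "GCGG": (25, 8.3),
    "GGCG": (19, 7.1),
    "GCCG": (23, 8.7),
    "GAGG": (62, 24.1),
    "CAGC": (80, 31.6),
    "CGGC": (23, 9.3),
    "CGCC": (20, 9.0),
    "ACAA": (34, 15.4),
    "AGAA": (64, 30.0),
    "CCTC": (44, 20.9),
    "CCGC": (18, 8.5),
    "AGCA": (47, 22.6),
    "CCCC": (35, 17.0),
    "GACG": (14, 6.9),
    "GCAG": (56, 28.7),
    "CCAC": (35, 19.1),
    "AAGA": (47, 27.6),
}


def fold_overrepresentation(obs, sim):
    """Observed count divided by mean simulated count (assumed definition)."""
    return obs / sim


# Fold overrepresentation for every word, in table order.
folds = {
    word: fold_overrepresentation(obs, sim)
    for word, (obs, sim) in ALL_INDELS.items()
}

for word, fold in folds.items():
    print(f"{word}  {fold:.2f}")
```

Under this assumption GCGG has the largest ratio (25/8.3, about 3.0) and AAGA the smallest (47/27.6, about 1.7), in line with the stated sort order. Because the Sim means are printed to one decimal place, ratios recomputed from the table need not reproduce the row order exactly; CCTC and CCGC, for example, swap places.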

This Article

  1. Genome Res. 14: 555-566
