Overrepresented Four-Letter Words in the Proximity of Indels (Nucleotides in the Range 1–4 From the Indel)
| Word (a) | All indels Obs | All indels Sim | Insertions Obs | Insertions Sim | Deletions Obs | Deletions Sim | Slippage-like Obs | Slippage-like Sim | Non-slippage-like Obs | Non-slippage-like Sim |
|---|---|---|---|---|---|---|---|---|---|---|
| GCGG (3) | 25 | 8.3 | 15 | 3.5 | 10 | 4.8 | 11 | 2.9 | 6 | 2.9 |
| GGCG (3) | 19 | 7.1 | 11 | 3.1 | 8 | 4.0 | 9 | 2.4 | 6 | 2.4 |
| GCCG (3) | 23 | 8.7 | 16 | 3.6 | 7 | 5.1 | 11 | 3.0 | 4 | 3.0 |
| GAGG (2) | 62 | 24.1 | 33 | 10.4 | 29 | 13.7 | 31 | 8.5 | 9 | 8.0 |
| CAGC (1) | 80 | 31.6 | 52 | 13.6 | 28 | 17.9 | 42 | 11.2 | 24 | 10.5 |
| CGGC (3) | 23 | 9.3 | 13 | 3.9 | 10 | 5.4 | 12 | 3.1 | 6 | 3.2 |
| CGCC (3) | 20 | 9.0 | 14 | 3.8 | 6 | 5.2 | 12 | 3.2 | 5 | 3.0 |
| ACAA | 34 | 15.4 | 11 | 6.5 | 23 | 8.9 | 14 | 5.3 | 13 | 5.3 |
| AGAA (2) | 64 | 30.0 | 25 | 13.0 | 39 | 17.0 | 34 | 10.6 | 10 | 10.1 |
| CCTC (2) | 44 | 20.9 | 21 | 9.1 | 23 | 11.8 | 22 | 7.4 | 10 | 7.1 |
| CCGC (3) | 18 | 8.5 | 13 | 3.6 | 5 | 4.9 | 7 | 2.9 | 4 | 2.9 |
| AGCA (1) | 47 | 22.6 | 25 | 9.9 | 22 | 12.7 | 26 | 8.1 | 11 | 7.5 |
| CCCC (2) | 35 | 17.0 | 18 | 7.2 | 17 | 9.8 | 14 | 5.8 | 12 | 5.8 |
| GACG (3) | 14 | 6.9 | 11 | 2.9 | 3 | 4.0 | 3 | 2.4 | 2 | 2.4 |
| GCAG (1) | 56 | 28.7 | 37 | 12.3 | 19 | 16.4 | 28 | 10.1 | 11 | 9.6 |
| CCAC | 35 | 19.1 | 21 | 8.3 | 14 | 10.8 | 14 | 6.7 | 5 | 6.4 |
| AAGA (2) | 47 | 27.6 | 17 | 11.8 | 30 | 15.8 | 25 | 9.6 | 8 | 9.4 |

Obs indicates observed frequency counts and Sim the mean frequency counts from 1000 simulated data sets. The complete version of this table (256 rows), showing all four-letter words and additional columns detailing standard deviations and p-values, is available as Supplemental Table S2 (http://www.genome.org). This table shows all rows of the complete table containing significant (p < 0.01) differences between observed and simulated data, indicated by values in bold. Rows are sorted by descending fold overrepresentation for all indels.

(a) Several of the nucleotide words are related. These are indicated by the number following the word:

(1) possible permutation of the trinucleotide CAG;

(2) oligo-purine or oligo-pyrimidine tract;

(3) high G+C content words containing a CpG dinucleotide.
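
The sort key named in the legend, fold overrepresentation, can be read as the ratio of the observed count to the mean simulated count. The sketch below illustrates that reading using the first five "All indels" rows of the table. The function names are hypothetical, and the empirical p-value helper is an assumed, commonly used procedure; the legend does not state how the p-values in Supplemental Table S2 were computed.

```python
# (word, observed count, mean simulated count) for the "All indels" columns,
# copied from the first five rows of the table.
rows = [
    ("GCGG", 25, 8.3),
    ("GGCG", 19, 7.1),
    ("GCCG", 23, 8.7),
    ("GAGG", 62, 24.1),
    ("CAGC", 80, 31.6),
]


def fold_overrepresentation(observed, simulated_mean):
    """Ratio of the observed count to the mean simulated count."""
    return observed / simulated_mean


def empirical_p_value(observed, simulated_counts):
    """One-sided empirical p-value (assumed procedure, not taken from the paper).

    Share of simulated data sets whose count is at least the observed count,
    with the usual (k + 1) / (n + 1) correction to avoid a p-value of zero.
    """
    k = sum(1 for count in simulated_counts if count >= observed)
    return (k + 1) / (len(simulated_counts) + 1)


# Compute the fold value for each word and rank in descending order,
# which is the ordering the legend describes for the table rows.
folds = [(word, fold_overrepresentation(obs, sim)) for word, obs, sim in rows]
ranked = sorted(folds, key=lambda item: item[1], reverse=True)

for word, fold in ranked:
    print(f"{word}\t{fold:.2f}")
```

Because the per-simulation counts are not part of this table, `empirical_p_value` can only be applied to data such as the full simulation output; it is included to show the shape of the comparison, not to reproduce the published p-values.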











