The Use of MPSS for Whole-Genome Transcriptional Analysis in Arabidopsis

Table 4.

Abundance of Four-Base Words in Expressed Signatures


Rank (by ratio) [a] | Word [b] | N2 (2-step) [c] | N4 (4-step) | N2:N4 Adj. ratio [d]
A. Word use in 17-base signatures for which the 2-step abundance was significantly greater than 4-step abundance. This subset of expressed signatures is described as “Bin 2” in the Methods section.
1 GGCC 1 232 0.0043
2 TTAA 6 846 0.0071
3 TATA 3 415 0.0072
4 TCGA 7 757 0.0092
5 AGCT 11 1120 0.0098
6 CGCG 2 155 0.0129
7 TGCA 13 819 0.0159
8 CATG 15 747 0.0201
9 ATAT 18 762 0.0236
10 GCGC 3 106 0.0283
11 CCGG 13 401 0.0324
12 AATT 33 887 0.0372
13 GTAC 12 317 0.0379
14 ACGT 12 291 0.0412
15 CTAG 30 341 0.0880
16 TTTA 111 788 0.1409
17 TAAA 127 705 0.1801
18 TTGA 202 1034 0.1954
19 TAGA 81 412 0.1966
20 TCAC 138 436 0.3165
21 TCGG 89 249 0.3574
22 TCAA 311 799 0.3892
23 TAAG 171 403 0.4243
24 TGAC 154 352 0.4375
25 TCCG 98 221 0.4434
[see Supplemental Figures S4 to S6 for complete tables]
246 GAAT 484 172 2.8140
247 CGTA 186 65 2.8615
248 GCTT 490 171 2.8655
249 GATA 385 131 2.9389
250 ACAT 488 159 3.0692
251 GCTA 307 85 3.6118
252 ACTC 495 136 3.6397
253 ACTT 600 162 3.7037
254 CCTA 265 70 3.7857
255 ACTA 365 89 4.1011
B. Word use in 17-base signatures for which the 4-step abundance was significantly greater than 2-step abundance.
1 GCGC 189 1 0.0053
2 TTAA 837 10 0.0119
3 AGCT 1359 21 0.0155
4 TATA 407 9 0.0221
5 GGCC 321 9 0.0280
6 TCGA 782 22 0.0281
7 TGCA 886 27 0.0305
8 CCGG 449 19 0.0423
9 CATG 729 31 0.0425
10 ATAT 726 33 0.0455
11 GTAC 405 21 0.0519
12 CGCG 196 11 0.0561
13 ACGT 366 21 0.0574
14 AATT 879 52 0.0592
15 CTAG 396 32 0.0808
16 TAAA 750 111 0.1480
17 TTTA 718 112 0.1560
18 TAGA 546 95 0.1740
19 TTGA 1100 213 0.1936
20 TCAC 502 209 0.4163
21 TGGC 381 161 0.4226
22 AGCC 380 166 0.4368
23 TGAC 414 184 0.4444
24 TTAG 447 200 0.4474
25 GCGG 184 83 0.4511
[see Supplemental Figures S7 to S9 for complete tables]
246 AAAG 443 1151 2.5982
247 GCTT 227 594 2.6167
248 CATT 229 605 2.6419
249 CATA 157 429 2.7325
250 CCTT 179 504 2.8156
251 ACTC 185 523 2.8270
252 ACAT 183 647 3.5355
253 ACTT 205 744 3.6293
254 ACTA 111 444 4.0000
255 CCTA 60 265 4.4167
  • a For brevity, only the first 25 and last 10 rows of 255 four-base words are shown; GATC was not considered because it is rarely observed among expressed signatures. For the complete set of data corresponding to this subset of signatures, the other “bins”, and the 20-base expressed signatures, see Supplemental Figures S4-S9.

  • b Palindromic words are shown in bold; other “bad” words are indicated in italics. Frame 1 (see Fig. 5A) was not considered because only the 16 words initiating with “TC” can be observed in this frame.

  • c “N” indicates the frequency of occurrence of the word among the frames of the expressed signatures for either of the indicated steppers. The frequency of the words was calculated with all expressed signatures considered equally, independent of the expression abundance.

  • d “Adj. ratio” indicates that the ratio was adjusted to account for the different number of frames in the 2- and 4-step reactions for which the word frequencies were counted. For the 17-base expressed signatures, words were counted in 2-step frames 3 and 5 and in 4-step frames 2, 4, and 6 (Fig. 5A). Therefore, frequency counts for the 4-step words were adjusted by 2/3 prior to calculating the ratio.
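
The bookkeeping described in the footnotes can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the default frame counts passed as parameters, and the example counts are hypothetical and chosen only for illustration. It enumerates the 255 four-base words (all 256 except GATC), flags the words that equal their own reverse complement (the sense of "palindromic" assumed here), and scales a raw 4-step count by 2/3 before forming the N2:N4 ratio.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(word):
    """Return the reverse complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_palindromic(word):
    """A word is treated as palindromic if it equals its reverse complement."""
    return word == reverse_complement(word)


def adjusted_ratio(n2, n4_raw, n2_frames=2, n4_frames=3):
    """N2:N4 ratio after scaling the raw 4-step count by n2_frames/n4_frames.

    The defaults (2 and 3) follow footnote d for 17-base signatures.
    Raises ZeroDivisionError if n4_raw is 0.
    """
    n4_adjusted = n4_raw * n2_frames / n4_frames
    return n2 / n4_adjusted


# All four-base words except GATC, which the table does not consider.
words = ["".join(p) for p in product("ACGT", repeat=4)]
words = [w for w in words if w != "GATC"]
palindromes = [w for w in words if is_palindromic(w)]

# Hypothetical raw counts: 10 occurrences in 2-step frames, 30 in 4-step frames.
example = adjusted_ratio(10, 30)
```

With these hypothetical counts, the raw 4-step count of 30 is scaled to 20, so the adjusted ratio is 0.5 rather than the unadjusted 0.33.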

This Article

  1. Genome Res. 14: 1641-1653