Base Composition and Hexamer Frequencies in the Last 30 Positions of Human mRNA 3′ UTRs (UTRDB) and of Contigs Obtained from the 1000 Largest 3′ EST Clusters
| | UTRDB | | | | Contigs from 1000 largest 3′ EST clusters | | | |
| No. of sequences | 2013 | | | | 918 | | | |
| | %A | %U | %G | %C | %A | %U | %G | %C |
| Base composition | 36.4 | 31.5 | 15.7 | 16.5 | 35.2 | 30.8 | 16.9 | 17.2 |
| | 6-mer | N (%) | P | | 6-mer | N (%) | P | |
| Most significant hexamers | AAUAAA | 1187 (59.0) | 0 | | AAUAAA | 442 (48.1) | 1 × 10⁻³⁰¹ | |
| | AUUAAA | 239 (11.9) | 7 × 10⁻¹⁵⁶ | | AUUAAA | 100 (10.9) | 7 × 10⁻⁶⁷ | |
| | AAAAAA | 41 (2.0) | 3 × 10⁻¹¹ | | CUGGGG | 13 (1.4) | 9 × 10⁻¹⁰ | |
| | AGUAAA | 26 (1.3) | 3 × 10⁻¹⁰ | | AAAAAU | 23 (2.5) | 1 × 10⁻⁸ | |
- Contigs: sequences are read here in the coding strand (i.e., the strand opposite the original 3′ EST sequence).
- UTRDB: v. 4.1 (Pesole et al. 1996), file Hum_3utrnr.dat (6600 human sequences), retaining only sequences marked “complete” and longer than 30 nucleotides, and excluding 65 sequences still ending with an EcoRI restriction site.
- No. of sequences (contigs): after removal of contig sequences containing undefined nucleotides.
- Most significant hexamers: based on the P value defined below.
- N: number of sequences containing this hexamer.
- P: probability of observing at least N sequences containing at least one copy of this hexamer in random sequences of the same nucleotide composition.
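
The definitions of N and P given in the footnotes can be illustrated with a minimal Python sketch. It rests on assumptions that the table does not state: bases in a random sequence are drawn independently with the given frequencies, sequences are independent of one another, and the count of sequences containing a hexamer is therefore binomial. The function names (`count_sequences_with`, `occurrence_prob`, `log10_binom_tail`) and the toy sequences are hypothetical. Because the exact null model behind the tabulated P values is not specified here, this sketch is not expected to reproduce them.

```python
from math import exp, lgamma, log, log1p


def count_sequences_with(motif, sequences, tail=30):
    """N: number of sequences whose last `tail` positions contain `motif`."""
    return sum(motif in seq[-tail:] for seq in sequences)


def occurrence_prob(motif, freqs, length=30):
    """Probability that a random sequence of `length` independent bases,
    drawn with frequencies `freqs`, contains `motif` at least once.

    Exact under the independence assumption: a small automaton tracks how
    many leading characters of the motif are currently matched, so
    self-overlapping motifs such as AAAAAA are handled correctly.
    """
    m = len(motif)

    def step(k, base):
        # Longest prefix of the motif that is a suffix of motif[:k] + base.
        s = motif[:k] + base
        for j in range(len(s), 0, -1):
            if s.endswith(motif[:j]):
                return j
        return 0

    # state[k]: probability of having k characters matched, motif not yet seen.
    state = [1.0] + [0.0] * (m - 1)
    for _ in range(length):
        new = [0.0] * m
        for k, pk in enumerate(state):
            for base, f in freqs.items():
                j = step(k, base)
                if j < m:  # j == m means the motif was completed
                    new[j] += pk * f
        state = new
    return 1.0 - sum(state)


def log10_binom_tail(n, k_min, p):
    """log10 of P(X >= k_min) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are summed in log space so that very small tails do not
    underflow to zero in floating point.
    """
    logs = [
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log1p(-p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return (top + log(sum(exp(x - top) for x in logs))) / log(10)


# Toy usage with made-up sequences and a uniform base composition.
toy = ["GCGCAAUAAAGC", "AAUAAAGGGGCC", "GGCCGGCCGGCC"]
uniform = {"A": 0.25, "U": 0.25, "G": 0.25, "C": 0.25}

n_hits = count_sequences_with("AAUAAA", toy)        # N
p_hit = occurrence_prob("AAUAAA", uniform, length=12)
print(n_hits, p_hit, log10_binom_tail(len(toy), n_hits, p_hit))
```

Returning log10 of the tail rather than the tail itself is a design choice: probabilities far below the smallest representable double would otherwise be reported as 0.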











