Table 3.

Base Composition and Hexamer Frequencies in the Last 30 Positions of Human mRNA 3′ UTRs (UTRDB) and of Contigs Obtained from the 1000 Largest 3′ EST Clusters

                               UTRDB                              Contigs from 1000 largest 3′ EST clusters
No. of sequences               2013[ii]                           918[iii]
Base composition[i]            %A    %U    %G    %C               %A    %U    %G    %C
                               36.4  31.5  15.7  16.5             35.2  30.8  16.9  17.2
Most significant hexamers[iv]  6-mer[i]  N[v] (%)     P[vi]       6-mer[i]  N[v] (%)     P[vi]
                               AAUAAA    1187 (59.0)  0           AAUAAA    442 (48.1)   1 × 10⁻³⁰¹
                               AUUAAA    239 (11.9)   7 × 10⁻¹⁵⁶  AUUAAA    100 (10.9)   7 × 10⁻⁶⁷
                               AAAAAA    41 (2.0)     3 × 10⁻¹¹   CUGGGG    13 (1.4)     9 × 10⁻¹⁰
                               AGUAAA    26 (1.3)     3 × 10⁻¹⁰   AAAAAU    23 (2.5)     1 × 10⁻⁸

[i] Contig sequences are read here on the coding strand (i.e., the strand opposite to that of the original 3′ EST sequences).

[ii] UTRDB v. 4.1 (Pesole et al. 1996), file Hum_3utrnr.dat (6600 human sequences), retaining only sequences marked “complete” and longer than 30 nucleotides, and excluding 65 sequences that still end with an EcoRI restriction site.

[iii] After removal of contig sequences containing undefined nucleotides.

[iv] Selected on the basis of P value[vi].

[v] Number of sequences containing this hexamer; the percentage of all sequences is given in parentheses.

[vi] Probability of observing at least N sequences containing at least one copy of this hexamer in random sequences of the same nucleotide composition.
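The footnotes define N and P, but they do not say how the per-sequence probability of containing a hexamer was obtained. The Python sketch below assumes a zero-order model (independent bases drawn with the pooled composition of the last 30 positions), computes that probability exactly with a KMP automaton, and then takes a binomial upper tail over the sequences. The model, the T→U handling, and the function names (`tail`, `base_composition`, `sequences_containing`, `p_contains`, `binomial_upper_tail`) are illustrative assumptions, not the authors' procedure, so the sketch should not be expected to reproduce the P values in Table 3.

```python
import math

WINDOW = 30  # analysed region: the last 30 positions of each sequence
BASES = "AUGC"


def tail(seq, window=WINDOW):
    """Last `window` nucleotides of `seq`, upper-cased, with T written as U."""
    return seq.upper().replace("T", "U")[-window:]


def base_composition(seqs, window=WINDOW):
    """Fractions of A, U, G, C pooled over the tails of all sequences."""
    counts = dict.fromkeys(BASES, 0)
    for s in seqs:
        for b in tail(s, window):
            if b in counts:  # ignore undefined nucleotides such as N
                counts[b] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def sequences_containing(seqs, hexamer, window=WINDOW):
    """N: number of sequences whose tail holds at least one copy of `hexamer`."""
    return sum(1 for s in seqs if hexamer in tail(s, window))


def p_contains(hexamer, freqs, window=WINDOW):
    """P(a random `window`-long sequence with independent bases drawn from
    `freqs` contains `hexamer` at least once). A KMP automaton tracks the
    longest matched prefix, so self-overlapping words (e.g. AAAAAA) are exact."""
    total = sum(freqs.values())
    freqs = {b: f / total for b, f in freqs.items()}  # renormalise rounded %
    m = len(hexamer)
    fail = [0] * m  # KMP failure function
    k = 0
    for i in range(1, m):
        while k and hexamer[i] != hexamer[k]:
            k = fail[k - 1]
        if hexamer[i] == hexamer[k]:
            k += 1
        fail[i] = k

    def step(state, base):
        while state and hexamer[state] != base:
            state = fail[state - 1]
        return state + 1 if hexamer[state] == base else 0

    # dist[s]: probability of no match so far with s hexamer letters matched
    dist = [1.0] + [0.0] * (m - 1)
    for _ in range(window):
        new = [0.0] * m
        for s, prob in enumerate(dist):
            for base, f in freqs.items():
                t = step(s, base)
                if t < m:  # t == m: hexamer seen, mass leaves "no match"
                    new[t] += prob * f
        dist = new
    return max(0.0, 1.0 - sum(dist))


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p); each term is formed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_c = math.lgamma(n + 1)
    return sum(
        math.exp(log_c - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log1p(-p))
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Composition and counts transcribed from the UTRDB column of Table 3.
    utrdb = {"A": 36.4, "U": 31.5, "G": 15.7, "C": 16.5}
    p = p_contains("AAUAAA", utrdb)
    print(f"per-sequence probability of AAUAAA: {p:.4f}")
    print(f"P(N >= 1187 of 2013): {binomial_upper_tail(2013, 1187, p):.3g}")
```

The binomial is taken over sequences rather than over hexamer occurrences, following footnote [vi] ("at least N sequences containing at least one copy"). Tail probabilities below roughly 1e-308 underflow to 0.0 in double precision; reporting them would require keeping the final sum in log space.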