Alternate Polyadenylation in Human mRNAs: A Large-Scale Analysis by EST Clustering

Table 3.

Base Composition and Hexamer Frequencies in the Last 30 Positions of Human mRNA 3′ UTRs (UTRDB) and of Contigs Obtained from the 1000 Largest 3′ EST Clusters

                           UTRDB                                Contigs from 1000 largest 3′ EST clusters
No. of sequences           2013                                 918
                           %A      %U      %G      %C           %A      %U      %G      %C
Base composition           36.4    31.5    15.7    16.5         35.2    30.8    16.9    17.2
                           6-mer   N       (%)     P            6-mer   N       (%)     P
Most significant hexamers  AAUAAA  1187    (59.0)  0            AAUAAA  442     (48.1)  1 × 10^−301
                           AUUAAA  239     (11.9)  7 × 10^−156  AUUAAA  100     (10.9)  7 × 10^−67
                           AAAAAA  41      (2.0)   3 × 10^−11   CUGGGG  13      (1.4)   9 × 10^−10
                           AGUAAA  26      (1.3)   3 × 10^−10   AAAAAU  23      (2.5)   1 × 10^−8
  • Contig sequences are read here in the coding strand (i.e., the strand opposite to the original 3′ EST sequence).

  • UTRDB v. 4.1 (Pesole et al. 1996), file Hum_3utrnr.dat (6600 human sequences), retaining only sequences marked “complete,” longer than 30 nucleotides, and excluding 65 sequences still ending with an EcoRI restriction site.

  • After removal of contig sequences containing undefined nucleotides.

  • Based on P value.

  • N: number of sequences containing this hexamer.

  • P: probability of observing at least N sequences containing at least one copy of this hexamer in random sequences of the same nucleotide composition.
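
The N and P columns defined in the notes above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names (`hexamer_counts`, `p_single`, `log10_binom_tail`) are hypothetical, and the random-sequence model is an assumption. Bases are drawn independently from the table's base composition, hexamer start positions within the 30-nucleotide window are treated as independent, and the tail probability is taken from a binomial distribution. Under these simplifications the result need not match the P values in the table.

```python
import math
from collections import Counter


def hexamer_counts(seqs, k=6, window=30):
    """For each k-mer, count the sequences whose last `window` positions contain it."""
    counts = Counter()
    for seq in seqs:
        tail = seq[-window:]
        # A set ensures each sequence is counted once per hexamer (the N column).
        counts.update({tail[i:i + k] for i in range(len(tail) - k + 1)})
    return counts


def p_single(hexamer, base_freq, window=30):
    """Approximate chance that one random window contains the hexamer at least once.

    Assumption: start positions are treated as independent, which ignores
    the self-overlap of words such as AAAAAA.
    """
    p_site = math.prod(base_freq[b] for b in hexamer)
    n_sites = window - len(hexamer) + 1
    return 1.0 - (1.0 - p_site) ** n_sites


def log10_binom_tail(n, total, p):
    """log10 of P(X >= n) for X ~ Binomial(total, p), assuming 0 < p < 1."""
    if n <= 0:
        return 0.0
    # Work in log space: the tails of interest are far below float range.
    logs = [
        math.lgamma(total + 1) - math.lgamma(k + 1) - math.lgamma(total - k + 1)
        + k * math.log(p) + (total - k) * math.log1p(-p)
        for k in range(n, total + 1)
    ]
    peak = max(logs)
    return (peak + math.log(sum(math.exp(x - peak) for x in logs))) / math.log(10)


# Base composition of the UTRDB set, taken from the table (as fractions).
utrdb_freq = {"A": 0.364, "U": 0.315, "G": 0.157, "C": 0.165}
p = p_single("AAUAAA", utrdb_freq)
# N = 1187 of 2013 UTRDB sequences contain AAUAAA.
print(log10_binom_tail(1187, 2013, p))
```

Returning log10 of the probability rather than the probability itself avoids floating-point underflow, which is one plausible reason a table of this kind would report a value of 0 for the strongest signal.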

This Article

  1. Genome Res. 8: 524-530
