Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting

Table 1.

Motifs Found by Phylogenetic Footprinting

DNA region(a): Metallothionein family 5′ UTR + promoter (590 bp)
Species(b): Human (Ia, Ih, II, IV), rat (I, II, III), mouse (III), hamster (I, II), sheep (Ia, II), rabbit (I), cow (I), frog (a), trout (a), pike, icefish (I, II), carp, loach, urchin (I, II), mussel, C. elegans (I, II)
Motif (length) (position)(c); score (species)(d):
   1. GCTATAAAc (8) (Human II, −103); score 2 (see Figure 1)
   2. CATGCGCAGg (9) (Rat III, −143); score 2
   3. cCGTGTGCAg (8) (Human II, −239); score 9 (*)
      CGTGTGCAggc (8) (Human II, −156); score 9 (*)
   4. TTTGCACACG (10) (Pike, −142); score 4
   5. tGCGCCCGG (8) (Human II, −222); score 5
      TGCACTCG (8) (Human II, −126); score 4
   6. TAACTGATAAA (10) (C. ele. I, −324); score 0
   7. TACACTCAG (9) (Rat III, −207); score 1
   8. TCCCACCAA (9) (Rat III, −497); score 1
   9. CAGGCACCT (9) (Rat III, −284); score 1
  10. TGCACACGG (9) (Human II, −374); score 1
  11. tGTACATTGTga (9) (C. elegans I, −129); score 2
  12. GCTTTAAAA (9) (Pike, −114); score 0
Ref.(e): 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7

DNA region(a): Insulin family 5′ promoter (500 bp)
Species(b): Human, chimp, aotus, pig, rat (I, II), mouse (I, II)
Motif (length) (position)(c); score (species)(d):
   1. gttAAGACTCTAAtgacc (10) (−223); score 0 (Mutated in rodents (I))
   2. tcagcccccaGCCATCTGCC (10) (−122); score 1
   3. CTATAAAGcc (8) (−32); score 0
   4. GGGAAATG (8) (−145); score 0 (Absent from rodents)
Ref.(e): 2.1, 2.2, 2.3, 2.4

DNA region(a): c-myc 5′ promoter (1000 bp)
Species(b): Goldfish, frog, chicken, rat, pig, marmoset, human
Motif (length) (position)(c); score (species)(d):
   1. aGTTTATTC (8) (−611); score 1 (Absent from goldfish)
   2. TTGCTGGG (8) (−570); score 3 (Absent from chicken)
   3. GGCGCGCAGT (10) (−359); score 2 (Chicken + mammals)
   4. CAGCTGTTCCgc (10) (−325); score 2 (Chicken + mammals)
   5. TGTTTACATCc (10) (−173); score 2 (Chicken + mammals)
   6. ccaCCCTCCCC (8) (−105); score 4
   7. AGCAGAGGGCG (10) (−69); score 2 (Chicken + mammals)
   8. GGCGTGGG (8) (−62); score 2 (Absent from goldfish)
   9. ATCTCCGCCCAcc (8) (−26); score 2 (Chicken + mammals)
Ref.(e): 3.1, 3.2, 3.3, 3.4

DNA region(a): c-myc second intron (971 to 1376 bp)
Species(b): Chicken, pig, rat, marmoset, gibbon, human
Motif (length) (position)(c); score (species)(d):
   1. CATTTTAATT (10) (303); score 0 (Mammals)
   2. TGAATGAATT (10) (375); score 0 (Mammals)
   3. tTTTGAACACT (10) (542); score 0 (Mammals)
   4. TAGGGAGTTG (10) (670); score 2
   5. ATTTGCAGCTat (10) (698); score 2
   6. GAAGTGTTCT (10) (725); score 2
   7. TTGGTAAAGT (10) (733); score 0 (Mammals)
   8. GCTTTGCTTTGGGTGTGT (10) (780); score 0 (Mammals)
   9. GCCTCATTAAGTCTTAGGTAAG (10) (795); score 0 (Mammals)
  10. TTCCTTTCTT (10) (1362); score 2
Ref.(e): 4.1

DNA region(a): c-fos 5′ UTR + promoter (800 bp)
Species(b): Tetraodon, chicken, mouse, hamster, pig, human
Motif (length) (position)(c); score (species)(d):
   1. CAGGTGCGAATGTTC (10) (−615); score 0 (Mammals)
   2. TTCCCGCCTCCCCTCCCC (10) (−583); score 0 (Mammals)
   3. GAGTTGGCTGcagcc (10) (−527); score 3 (Tetraodon + mammals)
   4. GTTCCCGTCAATCcct (10) (−504); score 1 (Chicken + mammals)
   5. CACAGGATGTcc (10) (−479); score 4
   6. AGGACATCTG (10) (−462); score 1 (Chicken + mammals)
   7. GTCAGCAGGTTTCCACG (10) (−439); score 0 (Mammals)
   8. TACTCCAACCGC (10) (−159); score 0 (Mammals)
Ref.(e): 5.5, 5.1, 5.2, 5.3, 5.4

DNA region(a): c-fos first intron (376 to 758 bp)
Species(b): Fugu, tetraodon, chicken, pig, mouse, hamster, human
Motif (length) (position)(c); score (species)(d):
   1. GGGTGTGTAAgg (10) (404); score 3 (Absent from fugu)
   2. GTTTCATTGATAAAAAGCGAGTTCATTCT GGAGACTCCGGAGCGGCG (10) (417); score 1 (Absent from fishes)
   3. agcgcagacgtcAGGGATATTTA (10) (472); score 1
Ref.(e): 6.1, 6.1, 6.1

DNA region(a): Growth-hormone 5′ UTR + promoter (380 bp)
Species(b): Salmon, trout, white fish, seriola, lates, tilapia, fugu, grass carp, catfish, chicken, rat, mouse, dog, sheep, goat, human
Motif (length) (position)(c); score (species)(d):
   1. GGGAGGAG (8) (−198); score 3 (Chicken + mammals)
   2. ATTATCCAT (9) (−183); score 1 (Mammals)
   3. TTAGCACAA (9) (−174); score 3 (Human, rodents, chicken)
   4. GTCAGTGG (8) (−162); score 3 (Chicken + mammals)
   5. gcATAAATGTA (9) (−146); score 2 (Chicken + mammals)
   6. GAAACAGGT (9) (−131); score 1 (Human, rodents, salmonida)
   7. cagggTATAAAAAGggc (9) (−97); score 6 (Absent from catfish)
   8. TCATGTTTt (9) (Salmon, −138); score 2 (Fishes, except catfish, trout)
Ref.(e): 7.1, 7.2, 7.3, 7.4, 7.5

DNA region(a): Interleukin-3 5′ UTR + promoter (490 bp)
Species(b): Rat, mouse, cow, sheep, human, macaca
Motif (length) (position)(c); score (species)(d):
   1. TTGAGTACTagaaagt (8) (−228); score 1
   2. GATGAATAATt (8) (−208); score 1
   3. GTCTGTGGTTTtCTATGGAGGTTCCATGT CAGATAAAG (8) (−195); score 0
   4. TCTTCAGAGc (8) (−56); score 1
   5. AGGACCAG (8) (−40); score 1
Ref.(e): 8.1, 8.2

DNA region(a): Histone H1 5′ UTR + promoter (650 bp)
Species(b): Chicken, duck, frog, mouse
Motif (length) (position)(c); score (species)(d):
   1. CAATCACCAC (10) (Mouse, −107); score 3
   2. gAAACAAAAGTtt (10) (Mouse, −427); score 1
Ref.(e): 9.1

  (a) DNA regions considered.

  (b) Species (and isoforms) considered.

  (c) Highly conserved motifs found by FootPrinter. Overlapping motifs reported by FootPrinter have been merged, but every nucleotide of the motifs in this table belongs to at least one solution of the given length whose parsimony score meets our statistical significance threshold (see Methods). Capitalization is only relevant with respect to column d. The sequences and positions reported are those for the human sequences, except where otherwise noted. Negative positions are measured in the 5′ direction from the start codon, and positive positions in the 3′ direction from the 5′ end of the intron. A few conserved regions found by FootPrinter that are of low complexity or otherwise uninteresting are not reported.

  (d) Parsimony score of the capitalized motif in the subset of species listed. The capitalized region is the one with the lowest parsimony score. When no subset is mentioned, the motif is found in all sequences. FootPrinter identified the motifs marked by an asterisk in several subsets of the species, as shown in Fig. 1, but not in the whole set of species. These subsets were merged by hand to produce Fig. 1. The parsimony score given is that for the whole set of species.

  (e) Known functional information about the motif. Unless otherwise noted, the information comes from TRANSFAC (Wingender et al. 1996), with accession number in brackets. Metallothionein: 1.1 TATA-box [R03173], 1.2 MREe [R08295], 1.3 MREb [R08294], 1.4 MREa [R01816], 1.5 MREa [R08293], 1.6 MREd [R08298], and 1.7 MREg [R08296]. Insulin: 2.1 CT-II [R02709], 2.2 IEB1 [R04457], 2.3 TATA-box [GenBank annotation], and 2.4 GG-II [R02711]. C-myc: 3.1 Near NRE [R02571], 3.2 NHE [R01804], 3.3 P1 promoter [R04076], and 3.4 TCE [R04076]. C-myc second intron: 4.1 Part of 3′ splice site. C-fos: 5.1 SIF-E, SIE [R00458, R08485], 5.2 [many factors bind in this region; R00466, R00465, R00464, R01889], 5.3 [many factors bind in this region; R00467, R00463, R04047, R04046, R00462, R00461], 5.4 [part of DSE in SRE; R00467], 5.5 MatInspector (Quandt et al. 1995) hit: MTZ1 (Myeloid zinc factor 1). C-fos first intron: 6.1 (Transcription elongation signals; Mechti et al. 1991), Motif 3 contains a CREB binding site (Lange and Bading 2001). Growth hormone: 7.1 GHF-2 [R02050 in rat], 7.2 dGHF-1 [R00611], 7.3 [R04639 in rat], 7.4 pGHF-1 [R00612], and 7.5 nT3RE [R03959 in rat]. IL-3: 8.1 [R02736], 8.2 [R02682, R05026, R05027]. Histone H1: 9.1 CAAT signal [GenBank annotation].
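
The parsimony scores in the table count the substitutions needed to explain one motif instance per species on the species tree. As a rough illustration of how such a score can be computed for instances that have already been chosen, the sketch below applies the standard Fitch small-parsimony algorithm column by column. It is a minimal sketch and makes several assumptions: the tree is binary and given as nested tuples, the instances all have the same length, and the four-species tree and sequences (`example_tree`, `example_motifs`) are hypothetical and not taken from the table. It does not reproduce FootPrinter's search for the instances themselves (see Methods).

```python
from typing import Dict, Set, Tuple, Union

# Assumed representation: a tree is a leaf name or a pair of subtrees.
Tree = Union[str, tuple]


def fitch_column(tree: Tree, leaf_base: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one motif column: (candidate bases, substitutions)."""
    if isinstance(tree, str):
        return {leaf_base[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, leaf_base)
    right_set, right_cost = fitch_column(right, leaf_base)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one base: no new substitution.
        return shared, left_cost + right_cost
    # The children disagree: one substitution is needed on this split.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, motifs: Dict[str, str]) -> int:
    """Sum the per-column Fitch costs over equal-length motif instances."""
    lengths = {len(seq) for seq in motifs.values()}
    if len(lengths) != 1:
        raise ValueError("all motif instances must have the same length")
    total = 0
    for i in range(lengths.pop()):
        # Case is ignored here; in the table it only marks the scored region.
        column = {species: seq[i].upper() for species, seq in motifs.items()}
        total += fitch_column(tree, column)[1]
    return total


# Hypothetical tree and motif instances, for illustration only.
example_tree = (("human", "chimp"), ("mouse", "rat"))
example_motifs = {
    "human": "CTATAAAG",
    "chimp": "CTATAAAG",
    "mouse": "CTATAAAG",
    "rat": "CTATTAAG",
}

print(parsimony_score(example_tree, example_motifs))  # prints 1
```

A score of 0 therefore means that the instance is identical in every species considered, while larger scores mean more substitutions along the tree.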

This Article

  1. Genome Res. 12: 739-748
