Raising the estimate of functional human sequences

Michael Pheasant; John S. Mattick

doi:10.1101/gr.6406307

ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland 4072, Australia

Abstract

While less than 1.5% of the mammalian genome encodes proteins, it is now evident that the vast majority is transcribed, mainly into non-protein-coding RNAs. This raises the question of what fraction of the genome is functional, i.e., composed of sequences that yield functional products, are required for the expression (regulation or processing) of these products, or are required for chromosome replication and maintenance. Many of the observed noncoding transcripts are differentially expressed, and, while most have not yet been studied, increasing numbers are being shown to be functional and/or trafficked to specific subcellular locations, and to exhibit subtle evidence of selection. On the other hand, analyses of conservation patterns indicate that only ∼5% (3%–8%) of the human genome is under purifying selection for functions common to mammals. However, these estimates rely on the assumption that reference sequences (usually ancient transposon-derived sequences) have evolved neutrally. If this assumption does not hold, the fraction of the genome under evolutionary constraint will have been underestimated. These analyses also do not detect functional sequences that are evolving rapidly and/or have acquired lineage-specific functions. Indeed, many regulatory sequences and known functional noncoding RNAs, including many microRNAs, are not conserved over significant evolutionary distances, and recent evidence from the ENCODE project suggests that many functional elements show no detectable level of sequence constraint. Thus, it is likely that much more than 5% of the genome encodes functional information, and although the upper bound is unknown, it may be considerably higher than currently thought.

Only a tiny fraction of the human genome is currently recognized to encode functional products, mainly mRNAs (∼2.2%) (Frith et al. 2005) plus a limited number of structural and regulatory RNAs, including microRNAs and other non-protein-coding RNAs (Mattick and Makunin 2006). Perplexingly, the currently estimated number of human protein-coding genes (∼20,000) (International Human Genome Sequencing Consortium 2004; Goodstadt and Ponting 2006) is similar to those of the sea urchin (∼23,000) (Sea Urchin Genome Sequencing Consortium 2006) and the nematode worm (∼19,000) (Stein et al. 2003), and substantially less than that of the protist Tetrahymena thermophila (∼27,000) (Eisen et al. 2006), despite enormous differences in their developmental complexity. Thus, it is unclear where the information that programs human development resides and how it is different from that of simpler organisms.

Part of the answer to this conundrum lies in the use of alternative splicing by complex organisms to expand the diversity of their proteomes (Xing and Lee 2006), although this requires a concomitant increase in regulatory information. In contrast to microorganisms, multicellular eukaryotes have extensive intronic and intergenic sequences whose extent broadly increases with developmental complexity (Taft et al. 2007). Thus it is possible that the non-protein-coding sequences in mammalian genomes contain large amounts of regulatory information used to program the complexities of mammalian development, including tetrapod body plan, placental development, and a highly developed brain, particularly in humans (Mattick 2007; Taft et al. 2007). This possibility is made all the more intriguing by the recent discovery that the vast majority of the mammalian genome is transcribed, apparently in a developmentally regulated manner (see below).

However, while making some allowance for regulatory elements, and on the expectation that most genetic information is transacted by proteins, these extensive non-protein-coding sequences in humans and other mammals have been generally assumed to be nonfunctional, and mostly evolving without constraint, even though the fraction of noncoding sequences that are genetically inert is uncertain. Here we reassess the evidence concerning the amount of the human genome that is functional and under selection. We define functional sequences as those that (1) are required for replication and structural integrity of the chromosome, (2) encode functional products (RNAs and derived proteins), or (3) are required for the correct four-dimensional expression (regulation or processing) of these products during ontogeny and homeostasis. These include sequences that may act as required spacers, for example, between domains in proteins or RNAs, or in promoters, whose exact sequence may not be critical but that have a role in the functionality of the entity as a whole.

First, we review the amount and likely function of the transcriptional output of the genome. Second, bearing in mind that sequence conservation imputes function but is by definition a relative measure, we show that estimates of the extent of the genome that may be evolving “neutrally” (i.e., without obvious constraint, and by implication nonfunctional) are dependent on background assumptions of the nonfunctionality of certain classes of sequences, which may be questioned. Third, following from this, we suggest that the fraction of the genome under purifying selection may have been underestimated due to underestimation of the neutral rate of evolution. Finally, we show that experimentally validated gene regulatory sequences and functional noncoding RNAs are evolving at quite variable rates, often relatively quickly compared to sequences encoding proteins, presumably reflecting different structure–function constraints and different selection pressures. Since such sequences may not be included among those exhibiting detectable evolutionary constraint, and given the uncertainties in the measurement of the latter, it is possible that a considerable fraction of the human genome may be functional.

Transcriptional output of the genome

Recent cDNA and genome tiling array transcriptome analyses have revealed that at least 70% of the mammalian genome is transcribed, and that possibly 60% of transcribed regions show evidence for transcription from both strands, in extremely complicated patterns of interlaced and overlapping transcripts, thousands of which are not polyadenylated (Katayama et al. 2005; Carninci 2007; Gerstein et al. 2007; Gingeras 2007). These observations have been reinforced by recent detailed studies of the ENCODE regions of the human genome, which showed that 93% of bases in these regions appear in a primary transcript with at least two independent observations and 74% are detected by at least two different technologies (The ENCODE Project Consortium 2007). Hundreds of these intergenic, intronic, and antisense non-protein-coding transcripts show cell-specific or developmental regulation (Carninci et al. 2005; Cheng et al. 2005; Katayama et al. 2005; Ravasi et al. 2006), a number that may be extrapolated to thousands (Peters et al. 2007), and the individual cases that have been examined in more detail show specific subcellular locations and functions (Prasanth et al. 2005; Willingham et al. 2005; Ginger et al. 2006; Ishii et al. 2006; for a recent review, see Mattick and Makunin 2006), all of which may indicate function. It is also now known that all snoRNAs and one-third to one-half of microRNAs in mammals are encoded within introns (Rodriguez et al. 2004; Baskerville and Bartel 2005; for review, see Mattick and Makunin 2005).

However, most of the tens of thousands of documented noncoding transcripts in mammals have not yet been studied, and it remains an open question whether they are functional. It has been suggested that many of these transcripts may be cell-type-specific transcriptional noise or by-products (“neutral transcription”), which may provide a reservoir for future evolution (Brosius 2005), or biochemically functional but selectively neutral transcripts with no significant advantage or disadvantage for the organism (The ENCODE Project Consortium 2007). On the other hand, recent evidence strongly implicates noncoding RNAs in the control of chromatin architecture and epigenetic memory (Andersen and Panning 2003; Bernstein and Allis 2005; Sanchez-Elsner et al. 2006; Schmitt and Paro 2006; Rinn et al. 2007), transcription (Janowski et al. 2005; Goodrich and Kugel 2006; Kim et al. 2006; Li et al. 2006; Martianov et al. 2007; Pagano et al. 2007), translation (Bartel 2004; Mattick and Makunin 2005), and possibly splicing (Mattick and Makunin 2006). Moreover, although most non-protein-coding RNAs (ncRNAs) with evidence for function are evolving quickly, they retain patches of higher conservation (as shown for ∼600 long ncRNAs investigated in human and mouse; Pang et al. 2006), and a further 3122 long ncRNAs show subtle evidence of selection (Ponjavic et al. 2007).

Genome-wide estimates of function from conservation

Initial comparison of the mouse and human genomes led to the conclusion that ∼5% of small (50–100 bp) segments are under purifying selection for biological functions common to both species (more specifically, ∼20% of all human–mouse aligned segments) (Waterston et al. 2002), a surprisingly high figure at the time, as only ∼1.2% of the human genome is protein-coding (Frith et al. 2005). It is important to note that Waterston et al. did not claim that this was the full extent of functional sequence in the genome, as it does not include lineage-specific sequences (including transposon-derived sequences) that have diverged and/or been exapted during adaptive radiation or conserved specifically since the divergence of rodents and primates. Comparative analyses of mammals that are widely separated in evolution cannot detect lineage-specific elements, and analyses of species that are evolutionarily “too close” have insufficient power to detect elements such as those that became functional in our ancestral primate lineage (Stone et al. 2005). The initial estimate of the conserved fraction of the genome also depended on various parameters, including the window size used for the analysis (Stone et al. 2005), and ranged from 3% to 8%. The upper figure corresponds to 40% of all aligned sequences, even though these alignments included only 83% of RefSeq annotated genes (Waterston et al. 2002; Chiaromonte et al. 2003; Roskin et al. 2003).

Subsequent studies seeking to identify the particular segments under selection report similar results, including the most recent finding that 5% of bases are confidently predicted as being under evolutionary constraint in mammals by two out of three algorithms employed in the ENCODE project analysis (The ENCODE Project Consortium 2007). However, since conservation is relative, all of these methods require an estimate of the underlying neutral rate of evolution, generally taken to be the substitution rate measured from some class of sequence that is expected to be evolving free of constraint, with the implicit additional assumption that there are not many functional sequences that have evolved at a net rate that is statistically indistinguishable from the estimated neutral rate (Stone et al. 2005).
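Why the result depends so heavily on the neutral reference can be illustrated with a toy mixture decomposition. Aligned windows of n sites accumulate substitutions binomially; the assumed neutral distribution is scaled to fit the high-divergence tail (taken to contain only unconstrained windows), and the shortfall is attributed to constrained windows. This is a simplified sketch, not the procedure of Waterston et al. (2002) or of the ENCODE analyses, and every parameter (50-site windows, a 20% constrained share of aligned windows, the substitution rates, the tail cutoff `k_hi`) is hypothetical.

```python
import math
import random

def binom_sf(k, n, p):
    """Survival function P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def simulate_windows(num_windows=20_000, n=50, neutral_rate=0.40,
                     constrained_rate=0.10, frac_constrained=0.20, seed=1):
    """Toy substitution counts per aligned window of n sites (all values hypothetical)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(num_windows):
        rate = constrained_rate if rng.random() < frac_constrained else neutral_rate
        counts.append(sum(rng.random() < rate for _ in range(n)))
    return counts

def fraction_constrained(counts, n, assumed_neutral_rate, k_hi=15):
    """Estimate the constrained share of windows by scaling the assumed neutral
    binomial to the high-divergence tail (k >= k_hi), which is assumed to hold
    unconstrained windows only."""
    observed_tail = sum(c >= k_hi for c in counts) / len(counts)
    expected_tail = binom_sf(k_hi, n, assumed_neutral_rate)
    neutral_share = min(1.0, observed_tail / expected_tail)
    return 1.0 - neutral_share

counts = simulate_windows()
for rate in (0.35, 0.40, 0.45):
    estimate = fraction_constrained(counts, n=50, assumed_neutral_rate=rate)
    print(f"assumed neutral rate {rate:.2f}: estimated constrained fraction {estimate:.3f}")
```

With the rate used to simulate the data (0.40), the estimate recovers the simulated 20%. Assuming a lower neutral rate, as would happen if the reference sequences were themselves partly constrained, shrinks the estimate substantially, while a higher assumed rate inflates it.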

Classes of sequence used to estimate the neutral rate of substitution have included lineage-specific nonexonic sequences (Cooper et al. 2003, 2004, 2005), synonymous sites in codons (fourfold degenerate sites or 4-D sites) (Cooper et al. 2003; Margulies et al. 2003), and alignable ancestral transposon-derived sequences (ancient repeats or ARs) (Waterston et al. 2002; Chiaromonte et al. 2003; Margulies et al. 2003; Roskin et al. 2003; Gaffney and Keightley 2006), none of which is unbiased (see below). Indeed, the true rate of neutral sequence drift may never actually have been measured, for lack of any sector of DNA that can be identified as completely free of functional constraint (Zuckerkandl 1992).

Lineage-specific nonexonic sequences present in two closely related species and absent from a third more distant species have been assumed to be neutrally evolving although they will include some fraction of functionally constrained sequence (Frazer et al. 2004). Moreover, extrapolation of the measured substitution frequencies to more distantly related species is problematic and results in varying estimates of the pan-mammalian neutral rate (∼1.5-fold difference; Cooper et al. 2005).

Synonymous sites in codons, often thought to be fully redundant, can apparently encode subtle additional information. The genetic code has been shown to be almost optimal for encoding such additional information, such as binding sequences, splicing signals, and RNA secondary structure (Bollenbach et al. 2007; Itzkovitz and Alon 2007). Synonymous sites can encode splicing regulatory information, and a high proportion of the synonymous mutations studied produce a splicing defect (Pagani et al. 2005), which represents another type of constraint and may be a frequent cause of hereditary disease (Chamary et al. 2006; Xing and Lee 2006). They can also encode protein structural information (Kimchi-Sarfaty et al. 2007; Komar 2007). These conclusions are also supported by genome-wide evolutionary studies. The rate of synonymous substitutions between human and mouse is 1.8-fold lower in alternative exons than in constitutive exons (Xing and Lee 2005). There are 200 (and up to ∼1600) regions of extreme selection on synonymous codons in 11,786 pairs of homologous human and mouse genes (Schattner and Diekhans 2006). Comparison between protein-coding and intergenic regions in human and chimp indicates that ∼39% of synonymous sites are deleterious and subject to negative selection (Hellmann et al. 2003). Analysis of deep mammalian alignments within ENCODE regions may detect many more regions under weaker purifying selection, with greater statistical power than is possible with single pairwise analyses, but this has yet to be done. However, mounting evidence for functional selection and deleterious effects of mutations suggests that the assumption of neutrality of synonymous sites can no longer be maintained, and that the neutral rate may not be reliably extractable from any sequence comparison (Chamary et al. 2006).
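The following sketch shows how fourfold-degenerate (4-D) sites are commonly identified under the standard genetic code, and how a 4-D substitution rate might be computed between two aligned coding sequences. It is a minimal illustration with hypothetical toy sequences, not the pipeline of any of the studies cited. Note that degeneracy is defined purely at the protein level, so nothing in this classification excludes the splicing, RNA structure, or translational signals described above.

```python
BASES = "TCAG"
# NCBI standard genetic code (translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def is_fourfold_prefix(prefix):
    """True if all four third-position bases give the same amino acid."""
    return len({CODON_TABLE[prefix + base] for base in BASES}) == 1

def fourfold_sites(cds):
    """0-based positions of fourfold-degenerate third codon positions.

    Assumes an in-frame coding sequence containing only A, C, G, T;
    a trailing partial codon is ignored.
    """
    cds = cds.upper()
    return [i + 2 for i in range(0, len(cds) - 2, 3) if is_fourfold_prefix(cds[i:i + 2])]

def fourfold_divergence(cds_a, cds_b):
    """Fraction of differing 4-D sites between two aligned, equal-length CDSs.

    Only codons whose first two bases match in both sequences are counted,
    so each site is fourfold degenerate in both.
    """
    a, b = cds_a.upper(), cds_b.upper()
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    sites = [p for p in fourfold_sites(a) if a[p - 2:p] == b[p - 2:p]]
    if not sites:
        return float("nan")
    return sum(a[p] != b[p] for p in sites) / len(sites)

# Hypothetical toy example: CTG (Leu) and GCT (Ala) are fourfold degenerate; AAA (Lys) is not.
print(fourfold_sites("CTGGCTAAA"), fourfold_divergence("CTGGCTAAA", "CTAGCTAAG"))
```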

Uncertainty in the estimates of selection

The original estimate of 5% of the genome under selection for functions common to mammals is largely based on estimates of the neutral rate of evolution measured from ancient repeats. However, estimates based on ARs may be biased in two ways, although the extent of such bias is unknown: (1) the annotated and aligned ARs may comprise a slowly evolving subset of the distribution of all ARs, since the most rapidly evolving ones may have diverged to the extent of being unrecognizable or unalignable, and (2) some ARs are under, or have been subject to, purifying selection. If the fraction of ARs in either category is large, then the use of ARs as a neutral model will result in a significant underestimate of the true neutral rate and hence the fraction of the genome under selection. A third possibility is that some ARs are subject to positive selection pressures and are evolving faster than the neutral rate, leading to an overestimate of the fraction of the genome under purifying selection if significant numbers have not diverged beyond recognition. In this case, however, there will be underestimation of that fraction of the genome that encodes lineage-specific functions.

The evidence supporting the possibility of bias in the estimation of the neutral rate of evolution is as follows. First, it is evident that many ARs in mammalian genomes have diverged to the limit of detection, suggesting that significant numbers are beyond recognition and cannot be identified (Smit and Riggs 1995; Smit 1999; Silva et al. 2003) (numbers are difficult to estimate, but the limit of detection is ∼30% divergence from the consensus and is particularly problematic in mouse; Waterston et al. 2002). The ancestral mammalian genome is estimated at ∼2.8 Gb and extant ancestral sequences in human at ∼2.2 Gb, but only ∼152 Mb of ARs are alignable with both mouse and dog (although 200 Mb is alignable with mouse and 372 Mb with dog) (Lindblad-Toh et al. 2005), and these ARs can only be traced back ∼120 Myr (Waterston et al. 2002). Comparisons of alignment algorithms in ENCODE regions using sequences from 28 vertebrates including 14 mammals show that less than half of identified ARs are alignable, ranging from 24% to 47% depending on the algorithm employed (Margulies et al. 2007). These analyses also concluded that the measured substitution rate in ARs varies more between alignment algorithms than it does regionally in sequences aligned by any one algorithm, and that “the ‘true’ neutral rate for any given region of the human genome is thus only estimable given some nontrivial technical uncertainty” (Margulies et al. 2007). Thus, the large amount of ancestral sequence, particularly that which is unaligned, almost certainly includes many other AR-derived sequences that are unrecognized due to their divergence (see, e.g., Mikkelsen et al. 2007), which, if so, will introduce a significant error into the estimate of the neutral rate, as only the more conserved fraction is being measured.
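The detection-limit bias can be illustrated with a toy simulation: if per-copy divergence from the consensus varies around a mean close to the ∼30% detection limit, discarding copies beyond that limit pulls the measured mean downward. The Gaussian distribution and its parameters (a hypothetical mean of 0.28 and spread of 0.06) are assumptions chosen only for illustration; only the 30% threshold comes from the text.

```python
import random
import statistics

def truncation_bias(num_repeats=100_000, true_mean=0.28, spread=0.06,
                    detection_limit=0.30, seed=7):
    """Compare mean divergence of all simulated AR copies with that of detectable copies.

    Divergence is a per-copy fraction of sites differing from the consensus,
    drawn from a hypothetical Gaussian and clipped to [0, 0.75] (0.75 is the
    expected mismatch fraction between unrelated random sequences).
    """
    rng = random.Random(seed)
    divergences = [min(max(rng.gauss(true_mean, spread), 0.0), 0.75)
                   for _ in range(num_repeats)]
    # Copies beyond the detection limit are never annotated, so they never enter the estimate.
    detected = [d for d in divergences if d <= detection_limit]
    return (statistics.mean(divergences),
            statistics.mean(detected),
            len(detected) / num_repeats)

all_mean, detected_mean, frac_detected = truncation_bias()
print(f"all copies: {all_mean:.3f}, detectable copies: {detected_mean:.3f}, "
      f"fraction detectable: {frac_detected:.2f}")
```

Because the neutral rate is measured only on the detectable subset, any such truncation would make the neutral reference appear more conserved than it really is.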

Second, the recent analysis of the opossum genome showed that 14% of all the most highly conserved noncoding elements (CNEs) and 16% of the eutherian-specific CNEs are derived from ARs (Mikkelsen et al. 2007). Thousands of fragments of ARs of all classes constitute at least 5.5% of the nonexonic mammalian conserved sequences and are often more highly conserved than those encoding proteins (Cooper et al. 2005; Siepel et al. 2005; Kamal et al. 2006; Lowe et al. 2007). Substitution rates also differ significantly between different classes of ARs, as well as between ARs of different age groups within a particular class (Waterston et al. 2002; Ganapathi et al. 2005; Gaffney and Keightley 2006; Pace and Feschotte 2007; Shankar et al. 2007), indicating that these sequences are evolving differently. Mammalian-wide interspersed repeats (MIRs), of which there are ∼300,000 copies in the genome (2% of the genome) dating back ∼130 Myr, have a lower than expected divergence from the mammalian MIR consensus, and the divergence is similar in human and mouse even though neutrally evolving ARs should be twice as divergent in mouse as their human homologs, suggesting that MIRs are subject to selection (Silva et al. 2003). These elements have a 70-nt central region that is more highly conserved in the genome, and within it a more highly conserved 15- to 25-nt core, the most likely explanation being selection for function (Smit and Riggs 1995; Silva et al. 2003). Alu elements also have a core region conserved in mammals (Jelinek et al. 1980). While transposon-derived sequences (transposable elements or TEs) comprise 40%–60% of poorly conserved regions and lack identifiable orthologs, ∼20% of conserved regions are composed of TEs that do have orthologs, suggesting selection of this subset. For example, MIR and L2 elements are twofold enriched in conserved regions, and >75% of murine MIR and L2 elements have human orthologs. Therefore, these elements must be ancestral repeats under negative selection, which suggests that the exaptation of MIR and L2-derived sequences may be common (Silva et al. 2003).

Evidence for functional exaptation of transposon-derived sequences

There are increasing numbers of transposon-derived sequences of all classes, both ancient and modern, including lineage-specific repeats, that have been shown to have undergone functional exaptation (Brosius 1999; Volff 2006) (also referred to as exaption, co-option, recruitment, or domestication; Silva et al. 2003). There is longstanding evidence that transposons and their derived sequences can significantly influence the information content and output of the genome (Baltimore 1985; Finnegan 1989; Oei et al. 2004). They have been shown to play important roles in early development (Peaston et al. 2004) and phenotypic variation (Whitelaw and Martin 2001). AR sequences can introduce new splice sites, protein domains, stop codons, and other sequences and can split genes, leading to the birth of new genes or alternative isoforms (Smit 1999; Lev-Maor et al. 2003; Yi et al. 2003; Dagan et al. 2004; Brandt et al. 2005a, b; Krull et al. 2005; Wheelan et al. 2005; Bejerano et al. 2006; Britten 2006; Cordaux and Batzer 2006; Cordaux et al. 2006; Zhang and Chasin 2006; Ni et al. 2007), including noncoding RNAs (Kuryshev et al. 2001; Hasler and Strub 2006b).

AR sequences contain gene promoters (Ferrigno et al. 2001), which may be tissue-specific (Matlik et al. 2006; Romanish et al. 2007), transcription factor binding sites (Zhou et al. 2002), enhancers (Bejerano et al. 2006), silencers, polyadenylation signals, and other regulatory elements (Temin 1982; Hardman 1986), in both sense and antisense orientations (Matlik et al. 2006). These elements can become inserted into intergenic, intronic, protein-coding, and UTR regions of the genome (Landry et al. 2001; Smalheiser and Torvik 2006) and subsequently alter host gene expression and tissue specificity, so the potential for exaptation of regulatory function is widespread across the genome (Smit 1999; Jordan et al. 2003; Shankar et al. 2004; Grover et al. 2005; Cordaux and Batzer 2006; Hasler and Strub 2006a; Polak and Domany 2006; Thornburg et al. 2006). This is not to say that the transposable elements themselves are under selection, but that sequences descended from them are (Silva et al. 2003; Lowe et al. 2007). There are RNAs derived from TEs that are developmentally modulated (Davidson and Posakony 1982), small RNAs from brain showing different strand biases (Berezikov et al. 2006a), and RNAs that undergo A-to-I editing (notably in Alus), which may have important regulatory consequences (Athanasiadis et al. 2004; Blow et al. 2004; Kim et al. 2004; Levanon et al. 2004; Hasler and Strub 2006a).

Transposon-derived sequences may also underlie the creation of regulatory networks, an idea that dates back many years (Britten and Davidson 1969; Davidson and Britten 1979) and that has modern support (Zhou et al. 2002; Peaston et al. 2004; Cordaux et al. 2006; Johnson et al. 2006). Indeed, Barbara McClintock originally discovered transposable elements by studying “controlling elements” (McClintock 1956). Changes in the patterns of histone methylation in TEs in different mammalian cell types and lineages have been known for many years (Breznik et al. 1984; Nishioka 1988; Mietz and Kuff 1990; Chalitchagorn et al. 2004; Khodosevich et al. 2004; Martens et al. 2005), and they may contribute to epigenetic gene regulation (Lippman et al. 2004; Zuckerkandl and Cavalli 2007). TEs are a significant source of innovation of microRNAs (miRNAs)—at least 47 out of 545 human miRNAs are annotated as TEs (our updated analysis of Smalheiser and Torvik 2005). This suggests another mechanism for generating novel regulatory networks; any TE-derived sequence that is processed into a miRNA may be complementary to, and be able to regulate the expression of, a large number of 3′ UTRs containing similar TE-derived sequences (Smalheiser and Torvik 2006). Thus, while transposons may be mostly parasitic and TE-derived sequences may appear to have remained inert, they have contributed to the evolution of mammalian genomes through many mechanisms that create and modify gene expression and regulatory networks.
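The network-generating mechanism proposed by Smalheiser and Torvik (2006) can be sketched with the widely used seed-match heuristic, in which an animal miRNA is predicted to target transcripts containing a site complementary to its nucleotides 2–8. The miRNA and UTR sequences below are hypothetical, and real target prediction applies further criteria; the point is only that a single TE-derived seed matches every UTR carrying a copy of the same TE-derived segment.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    """Reverse complement of an uppercase ACGT DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def seed_site(mirna):
    """DNA 7-mer complementary to miRNA nucleotides 2-8 (miRNA given 5'->3' as RNA)."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return reverse_complement(seed)

def utrs_with_seed_match(mirna, utrs):
    """Names of 3' UTRs (DNA, 5'->3') that contain the miRNA's seed site."""
    site = seed_site(mirna)
    return [name for name, seq in utrs.items() if site in seq.upper()]

# Hypothetical sequences: geneA and geneC share a copy of the same TE-derived segment.
mirna = "UACGGAUCCAUGCAAGGUCAGU"
utrs = {
    "geneA": "AAAGATCCGTTTCAGG",
    "geneB": "CCCCTTTTGGGGAAAA",
    "geneC": "TTGATCCGTACAGG",
}
print(seed_site(mirna), utrs_with_seed_match(mirna, utrs))
```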

Different rates of evolution of functional sequences

It is also clear that there are widely different rates of evolution of different types of functional sequences in mammals. Rapidly changing sequences may be interpreted as neutrally evolving and nonfunctional, as functionally important but having flexible structure–function relationships, or as functionally important and undergoing adaptive improvements by acquiring advantageous mutations (Zuckerkandl 1992). Innovation in protein-coding sequences, which are usually governed by quite strict analog structure–function constraints, appears to be rare, whereas ∼20% of eutherian conserved non-protein-coding elements (CNEs) are recent innovations that postdate the divergence of eutheria and metatheria (Mikkelsen et al. 2007).

Innovation and rapid evolution are also evident in thousands of gene regulatory sequences, which cover extended genomic regions and exhibit rapid turnover (Smith et al. 2004; Fisher et al. 2006; Frith et al. 2006; Taylor et al. 2006). Examples include the remarkable functional conservation of regulatory sequences controlling ret gene expression in zebrafish and humans despite little recognizable primary sequence conservation (Fisher et al. 2006), and the independent exaptation of ARs as regulators of orthologous genes in human and rodents (Romanish et al. 2007). Taking turnover into account, it has been estimated that the extent of functional sequences in the human genome may be twice as great as that estimated from sequence conservation alone (Smith et al. 2004). Highly conserved epigenetic modifications can be used to identify tens of thousands of important regulatory elements that cannot be identified by sequence conservation alone, half of which are lineage-specific (Roh et al. 2007). There are ∼1000 regions of the human genome over 10 kb long that do not tolerate transposable element insertions, even though their primary sequence is not highly conserved (Simons et al. 2006). Gene deserts, large regions devoid of protein-coding genes that together cover >700 Mb of the human genome and appear to harbor distant regulatory elements, contain rapidly evolving regions that apparently accept neutral substitutions at a higher rate than the bulk of the genome yet resist chromosomal rearrangements, suggesting that they are subject to evolutionary constraints, not readily apparent in primary sequence, against harboring genes (Ovcharenko et al. 2005). Other regions of the genome also show evolutionary constraint that is not evident at the primary sequence level, including shuffled cis-regulatory elements (Sanges et al. 2006), regions subject to heterogeneous selection, which are evolving rapidly in primary sequence but slowly with respect to indels (Lunter et al. 2006), the distances between ultra-conserved elements (Sun et al. 2006), and regions predicted to contain common RNA secondary structure (Washietl et al. 2005) or highly constrained RNA tertiary structures that may have weak constraints on primary sequence or cryptic patterns of non-Watson–Crick base pair conservation (Lescoute et al. 2005).

Different rates of evolution also occur both within and between different classes of functional gene products, both RNAs and proteins. While the majority of protein-coding sequences are highly constrained, some are much more flexible, or under positive selection (Bustamante et al. 2005). As Kimura (1968) originally pointed out, many substitutions in protein-coding sequences appear to be neutral or nearly neutral, but this does not mean that the segments in which they reside are nonfunctional, simply that they are relatively plastic. In addition, Zuckerkandl (1992) notes that Kimura’s selectively neutral mutations are selectively equivalent, which does not preclude the sequences in which they occur from being functional. The first few hundred miRNAs to be discovered are highly conserved (Pang et al. 2006), but hundreds of more recently discovered miRNAs are not, being lineage- or even species-specific (Berezikov et al. 2006a, b; Piriyapongsa and Jordan 2007; Zhang et al. 2007) and expanding in the mammalian lineage (Hertel et al. 2006). There are also thousands of recently discovered small RNAs (piRNAs) expressed in testis that are not conserved between mouse and other species (Aravin et al. 2006; Girard et al. 2006; Lau et al. 2006).

As mentioned above, hundreds of longer ncRNAs, including the Xist and Tsix transcripts involved in X chromosome dosage compensation, are evolving quickly (Nesterova et al. 2001; Pang et al. 2006). A recent study of 3122 mouse long ncRNAs with weak evidence for purifying selection on their primary sequences nonetheless showed clear evidence for selection when their promoters, indel distribution, and conserved splice sites were considered (Ponjavic et al. 2007). There is also evidence of recent positive selection of ncRNAs in human, such as the HAR1 transcript expressed in particular regions of the brain (Pollard et al. 2006). Although functionally validated RNAs do not presently add up to a large fraction of the genome, they do (1) illustrate the point that low conservation of the primary sequence does not necessarily equate to or demonstrate lack or loss of function (Zuckerkandl 1992; Smith et al. 2004; Xing and Lee 2005; Pang et al. 2006) and (2) point to the possibility that many functional transcripts, particularly regulatory ncRNAs, may not be highly conserved over significant evolutionary distances, presumably because of more relaxed structure–function constraints and/or positive selection for regulatory variants associated with phenotypic radiation and adaptive evolution.

Consistent with this, the recent analysis of the ENCODE regions concluded that “many functional elements are seemingly unconstrained across mammalian evolution” (The ENCODE Project Consortium 2007). This has been interpreted to indicate that there may be many sequences that are “biologically active but provide no specific benefit to the organism” (The ENCODE Project Consortium 2007). However, this apparent contradiction between function and lack of constraint can be readily resolved if the actual neutral rate of evolution is higher than current estimates. These observations are also consistent with the possibility that many of these apparently weakly constrained sequences encode lineage-specific functional elements and/or functionally similar but nonorthologous elements that have been subject to rapid drift. The difficulty of detecting which sequences in the genome may be under evolutionary constraint, and of determining their extent, particularly in regions that are not highly conserved, is exemplified by Figure 1, which shows a close-up view of a region within an intron of the ST7 gene in the ENCODE CFTR region.

Figure 1.

Conservation in the ENCODE CFTR locus. The diagram shows a 600-bp region in an intron of the ST7 gene (hg17 chr7:116372751–116373350). The top panel (“Vertebrate Multiz Alignment & Conservation”) shows phastCons conservation scores based on 17-way alignments (Siepel et al. 2005). In black below this are alignments of human with chimp, rhesus, mouse, rat, rabbit, dog, cow, armadillo, and elephant. “Repeating Elements by RepeatMasker” shows an ancient repeat annotated as a MIR, which is 27% divergent from the MIR consensus, near the limit of detection. “MSA Consensus Constrained Elements” shows eight regions predicted to be conserved by at least one algorithm (“Loose” set), two regions predicted to be conserved by at least two algorithms in at least two alignments (“Moderate” set), and no regions predicted to be conserved by all algorithms in all alignments (“Strict” set). “TBA phastCons Conservation,” “TBA GERP Conservation,” and “TBA SCONE Conservation” show conservation scores over the TBA alignment from phastCons, GERP, and SCONE algorithms, respectively. “TBA Conserved Elements,” “MLAGAN Conserved Elements,” and “MAVID Conserved Elements” show elements predicted conserved based on the scores from the phastCons, BinCons, GERP, and SCONE algorithms across alignments from TBA, MLAGAN, and MAVID, respectively (Margulies et al. 2007) (image from http://genome.ucsc.edu/). 
The figure illustrates several difficulties in identifying selective constraint in regions that are not highly conserved: (1) conserved blocks are predicted within ARs assumed to evolve neutrally; (2) conservation scores vary depending on the species aligned (the phastCons scores in the top panel differ from the TBA phastCons scores); (3) patterns of identified conservation vary between algorithms over the same alignment (compare the TBA scores from phastCons, GERP, and SCONE); and (4) predictions of conserved elements based on these scores vary between different algorithms on the same alignment, as well as between the same algorithm over different alignments (compare the phastCons, BinCons, and GERP elements over the TBA, MLAGAN, and MAVID alignments).
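The “Loose,” “Moderate,” and “Strict” sets in the caption are defined by how many algorithms and alignments agree on a prediction. The sketch below shows one possible reading of those definitions. It treats predictions as individual base positions rather than elements. The algorithm and alignment labels are used only as dictionary keys. The position sets are invented toy data, not the ENCODE calls, and the code is not the pipeline used by Margulies et al. (2007).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: (algorithm, alignment) -> set of 0-based positions in the
# window that this algorithm calls constrained on this alignment.
Calls = dict[tuple[str, str], set[int]]


def consensus_sets(calls: Calls) -> dict[str, set[int]]:
    """Return 'loose', 'moderate' and 'strict' consensus position sets."""
    all_calls = list(calls.values())
    # Loose: called by at least one algorithm on at least one alignment.
    loose = set(chain.from_iterable(all_calls))
    # Strict: called by every algorithm on every alignment.
    strict = set.intersection(*all_calls) if all_calls else set()

    # Count, per alignment, how many algorithms call each position.
    per_alignment: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for (_algorithm, alignment), positions in calls.items():
        for pos in positions:
            per_alignment[alignment][pos] += 1

    # Moderate: at least two algorithms agree on at least two alignments.
    supporting_alignments: dict[int, int] = defaultdict(int)
    for counts in per_alignment.values():
        for pos, n_algorithms in counts.items():
            if n_algorithms >= 2:
                supporting_alignments[pos] += 1
    moderate = {pos for pos, n in supporting_alignments.items() if n >= 2}

    return {"loose": loose, "moderate": moderate, "strict": strict}


# Toy calls over three alignments and two algorithms (invented positions).
calls: Calls = {
    ("phastCons", "TBA"): {10, 11, 12, 300},
    ("GERP", "TBA"): {11, 12},
    ("phastCons", "MLAGAN"): {11, 12, 450},
    ("GERP", "MLAGAN"): {12},
    ("phastCons", "MAVID"): {11, 12},
    ("GERP", "MAVID"): {11, 12},
}
for name, positions in consensus_sets(calls).items():
    print(name, sorted(positions))
```

With these toy calls, the loose set contains five positions, the moderate set two, and the strict set one. This shows how quickly agreement shrinks as more algorithms and alignments must concur, which is the pattern seen in the figure.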

A common objection to the possibility that mammalian genomes contain large amounts of functional sequence under weak selection is the prediction that, because of their small effective population sizes, only strongly advantageous or disadvantageous alleles are subject to selection in mammals, so that alleles with a small functional impact evolve neutrally. This objection is apparently contradicted by the “unexpected strength of natural selection” at synonymous sites discussed by Chamary et al. (2006). In addition, Zuckerkandl (1992) points out that functionality in the more rapidly evolving noncoding regions of the genome cannot be dismissed on the basis of observations that are compatible with both neutralist and alternative interpretations.

How much of the genome might be functional?

The assumption that recognizable ARs are nonfunctional and representative of neutrally evolving sequence leads to the conservative estimate that 3%–8% of genomic regions are under purifying selection in mammals. However, all estimates of the extent of neutrally evolving sequence in the human genome, and reciprocally of the extent under selection and imputed to be functional, depend entirely, both qualitatively and quantitatively, on the assumption that extant ARs have evolved neutrally. That assumption may or may not be correct, but it is at least open to doubt. Evidence continues to mount that AR-derived sequences can modify genetic output, and that both individual ARs and classes of ARs are evolving non-neutrally. Faster-evolving ARs may also be significantly under-represented among those used as references, because they are no longer recognizable or alignable. As a consequence, the extent of purifying selection in mammals, and hence the proportion of functional sequence, may be significantly underestimated. Moreover, there are significant discrepancies and difficulties in estimating the presumed neutral rate (Margulies et al. 2007). All such estimates depend on underlying assumptions and parameters and may be interpreted differently. Unfortunately, the available data largely do not allow us to distinguish between, or to assess the extent of, two kinds of sequence: sequences that may be inert and evolving without constraint, and sequences that are functional and evolving at different rates under different structure–function constraints, selection pressures, and evolutionary histories, especially those involved in gene regulation. It therefore remains an open question whether the majority of the genome is evolving neutrally, and whether or not it is functional. A recent study has shown that a substantial fraction of purifying selection in human noncoding sequences occurs outside previously identified conserved noncoding sequences and is diffusely distributed across the genome. This finding suggests that there are many human noncoding variants that may affect gene expression and phenotypic traits, most of which will have escaped detection with current approaches to genome analysis (Asthana et al. 2007).
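The following toy calculation makes the dependence on the neutral reference concrete. It uses a deliberately simplified two-class model, which is an assumption for illustration only and not the methods used in the studies cited here. In this model, a fraction f of sites is constrained and diverges at a relative rate ρ of the neutral rate d_n. The genome-average divergence is then d_obs = (1 - f)·d_n + f·ρ·d_n, which rearranges to f = (d_n - d_obs) / (d_n·(1 - ρ)). The function names and all numbers below are hypothetical.

```python
def constrained_fraction(d_observed: float, d_neutral: float, relative_rate: float) -> float:
    """Fraction of constrained sites in a toy two-class model (illustrative only).

    Assumes neutral sites diverge at d_neutral and constrained sites at
    relative_rate * d_neutral (0 <= relative_rate < 1), so that
    d_observed = (1 - f) * d_neutral + f * relative_rate * d_neutral.
    """
    if not 0.0 <= relative_rate < 1.0:
        raise ValueError("relative_rate must be in [0, 1)")
    return (d_neutral - d_observed) / (d_neutral * (1.0 - relative_rate))


def observed_divergence(f: float, d_neutral: float, relative_rate: float) -> float:
    """Genome-average divergence generated by the same toy model."""
    return (1.0 - f) * d_neutral + f * relative_rate * d_neutral


# Hypothetical scenario: 10% of sites constrained, evolving at 20% of the neutral rate.
true_neutral = 1.0
d_obs = observed_divergence(0.10, true_neutral, 0.2)

# Using the true neutral rate recovers f = 0.10.
print(round(constrained_fraction(d_obs, true_neutral, 0.2), 4))

# If the reference repeats are themselves weakly constrained, their divergence
# (taken as the "neutral" rate) is too low; here it is assumed to be 3% low.
print(round(constrained_fraction(d_obs, 0.97 * true_neutral, 0.2), 4))
```

With these invented numbers, a 3% underestimate of the neutral rate lowers the inferred constrained fraction from 0.10 to about 0.064. Because f is inferred from the small difference d_n - d_obs, small errors in the reference rate are amplified in the estimate.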

It seems clear that 5% is a minimum estimate of the fraction of the human genome that is functional, and that the true extent is likely to be significantly greater. Suppose the upper figure of 11.8% of sequence under common purifying selection in mammals from ENCODE (Margulies et al. 2007) holds across the genome as a whole, and turnover and positive selection approximately double this figure (Smith et al. 2004). In that case, the functional portion of the genome may exceed 20%. It is also now clear that the majority of the mammalian genome is expressed and that many mammalian genes are accompanied by extensive regulatory regions. Thus, although admittedly on the basis of as yet limited evidence, it is quite plausible that many, if not most, of the expressed transcripts are functional and that a major component of genomic information is rapidly evolving regulatory DNA and RNA. Consequently, much if not most of the human genome may be functional. This possibility cannot be ruled out on the available evidence, either from conservation analysis or from genetic studies (Mattick and Makunin 2006), but it does challenge current conceptions of the extent of functionality of the human genome and of the nature of the genetic programming of humans and other complex organisms.
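The arithmetic behind the figure of more than 20% rests on two conditional assumptions stated above. The first is that the ENCODE upper bound of 11.8% holds genome-wide. The second is that turnover and positive selection roughly double it.

```latex
\[
  f_{\text{functional}} \approx 2 \times f_{\text{constrained}}
  = 2 \times 11.8\% = 23.6\% > 20\%
\]
```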

Acknowledgments

We thank Cas Simons, Igor Makunin, Evgeny Glazov, and Chris Ponting for their advice and comments on the manuscript. We also thank the reviewers and the editor for constructive criticisms and helpful suggestions. We acknowledge the financial support of the Australian Research Council, the Queensland State Government, and the University of Queensland.

Footnotes

1 Corresponding author.

E-mail j.mattick@imb.uq.edu.au; fax 61-7-3346-2111.
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6406307
- Received February 17, 2007.
- Accepted July 12, 2007.
Copyright © 2007, Cold Spring Harbor Laboratory Press

References

Andersen, A.A., Panning, B. (2003) Epigenetic gene regulation by noncoding RNAs. Curr. Opin. Cell Biol. 15:281–289.
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., et al. (2006) A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442:203–207.
Asthana, S., Noble, W.S., Kryukov, G., Grant, C.E., Sunyaev, S., Stamatoyannopoulos, J.A. (2007) Widely distributed noncoding purifying selection in the human genome. Proc. Natl. Acad. Sci. 104:12410–12415.
Athanasiadis, A., Rich, A., Maas, S. (2004) Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2:e391, doi:10.1371/journal.pbio.0020391.
Baltimore, D. (1985) Retroviruses and retrotransposons: The role of reverse transcription in shaping the eukaryotic genome. Cell 40:481–482.
Bartel, D. (2004) MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116:281–297.
Baskerville, S., Bartel, D.P. (2005) Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11:241–247.
Bejerano, G., Lowe, C.B., Ahituv, N., King, B., Siepel, A., Salama, S.R., Rubin, E.M., Kent, W.J., Haussler, D. (2006) A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441:87–90.
Berezikov, E., Thuemmler, F., van Laake, L., Kondova, I., Bontrop, R., Cuppen, E., Plasterk, R.H.A. (2006a) Diversity of microRNAs in human and chimpanzee brain. Nat. Genet. 38:1375–1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., et al. (2006b) Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res. 16:1289–1298.
Bernstein, E., Allis, C.D. (2005) RNA meets chromatin. Genes & Dev. 19:1635–1655.
Blow, M., Futreal, P.A., Wooster, R., Stratton, M.R. (2004) A survey of RNA editing in human brain. Genome Res. 14:2379–2387.
Bollenbach, T., Vetsigian, K., Kishony, R. (2007) Evolution and multilevel optimization of the genetic code. Genome Res. 17:401–404.
Brandt, J., Schrauth, S., Veith, A.M., Froschauer, A., Haneke, T., Schultheis, C., Gessler, M., Leimeister, C., Volff, J.N. (2005a) Transposable elements as a source of genetic innovation: Expression and evolution of a family of retrotransposon-derived neogenes in mammals. Gene 345:101–111.
Brandt, J., Veith, A.M., Volff, J.N. (2005b) A family of neofunctionalized Ty3/gypsy retrotransposon genes in mammalian genomes. Cytogenet. Genome Res. 110:307–317.
Breznik, T., Traina-Dorge, V., Gama-Sosa, M., Gehrke, C.W., Ehrlich, M., Medina, D., Butel, J.S., Cohen, J.C. (1984) Mouse mammary tumor virus DNA methylation: Tissue-specific variation. Virology 136:69–77.
Britten, R. (2006) Transposable elements have contributed to thousands of human proteins. Proc. Natl. Acad. Sci. 103:1798–1803.
Britten, R.J., Davidson, E.H. (1969) Gene regulation for higher cells: A theory. Science 165:349–357.
Brosius, J. (1999) RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238:115–134.
Brosius, J. (2005) Waste not, want not—Transcript excess in multicellular eukaryotes. Trends Genet. 21:287–288.
Bustamante, C.D., Fledel-Alon, A., Williamson, S., Nielsen, R., Hubisz, M.T., Glanowski, S., Tanenbaum, D.M., White, T.J., Sninsky, J.J., Hernandez, R.D., et al. (2005) Natural selection on protein-coding genes in the human genome. Nature 437:1153–1157.
Carninci, P. (2007) Constructing the landscape of the mammalian transcriptome. J. Exp. Biol. 210:1497–1506.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C., et al. (2005) The transcriptional landscape of the mammalian genome. Science 309:1559–1563.
Chalitchagorn, K., Shuangshoti, S., Hourpai, N., Kongruttanachok, N., Tangkijvanich, P., Thong-ngam, D., Voravud, N., Sriuranpong, V., Mutirangura, A. (2004) Distinctive pattern of LINE-1 methylation level in normal tissues and the association with carcinogenesis. Oncogene 23:8841–8846.
Chamary, J.V., Parmley, J.L., Hurst, L.D. (2006) Hearing silence: Non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 7:98–108.
Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308:1149–1154.
Chiaromonte, F., Weber, R.J., Roskin, K.M., Diekhans, M., Kent, W.J., Haussler, D. (2003) The share of human genomic DNA under selection estimated from human–mouse genomic alignments. Cold Spring Harb. Symp. Quant. Biol. 68:245–254.
Cooper, G.M., Brudno, M., Green, E.D., Batzoglou, S., Sidow, A., NISC Comparative Sequencing Program (2003) Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res. 13:813–820.
Cooper, G.M., Brudno, M., Stone, E.A., Dubchak, I., Batzoglou, S., Sidow, A. (2004) Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 14:539–548.
Cooper, G.M., Stone, E.A., Asimenos, G., Green, E.D., Batzoglou, S., Sidow, A. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15:901–913.
Cordaux, R., Batzer, M.A. (2006) Teaching an old dog new tricks: SINEs of canine genomic diversity. Proc. Natl. Acad. Sci. 103:1157–1158.
Cordaux, R., Udit, S., Batzer, M.A., Feschotte, C. (2006) Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proc. Natl. Acad. Sci. 103:8101–8106.
Dagan, T., Sorek, R., Sharon, E., Ast, G., Graur, D. (2004) AluGene: A database of Alu elements incorporated within protein-coding genes. Nucleic Acids Res. 32:D489–D492, doi:10.1093/nar/gkh132.
Davidson, E.H., Britten, R.J. (1979) Regulation of gene expression: Possible role of repetitive sequences. Science 204:1052–1059.
Davidson, E.H., Posakony, J.W. (1982) Repetitive sequence transcripts in development. Nature 297:633–635.
Eisen, J.A., Coyne, R.S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J.R., Badger, J.H., Ren, Q., Amedeo, P., Jones, K.M., et al. (2006) Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4:e286, doi:10.1371/journal.pbio.0040286.
The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799–816.
Ferrigno, O., Virolle, T., Djabari, Z., Ortonne, J.P., White, R.J., Aberdam, D. (2001) Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat. Genet. 28:77–81.
Finnegan, D.J. (1989) Eukaryotic transposable elements and genome evolution. Trends Genet. 5:103–107.
Fisher, S., Grice, E.A., Vinton, R.M., Bessling, S.L., McCallion, A.S. (2006) Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science 312:276–279.
Frazer, K.A., Tao, H., Osoegawa, K., de Jong, P.J., Chen, X., Doherty, M.F., Cox, D.R. (2004) Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14:367–372.
Frith, M.C., Pheasant, M., Mattick, J.S. (2005) The amazing complexity of the human transcriptome. Eur. J. Hum. Genet. 13:894–897.
Frith, M.C., Ponjavic, J., Fredman, D., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., Sandelin, A. (2006) Evolutionary turnover of mammalian transcription start sites. Genome Res. 16:713–722.
Gaffney, D.J., Keightley, P.D. (2006) Genomic selective constraints in murid noncoding DNA. PLoS Genet. 2:e204, doi:10.1371/journal.pgen.0020204.
Ganapathi, M., Srivastava, P., Das Sutar, S.K., Kumar, K., Dasgupta, D., Pal Singh, G., Brahmachari, V., Brahmachari, S.K. (2005) Comparative analysis of chromatin landscape in regulatory regions of human housekeeping and tissue specific genes. BMC Bioinformatics 6:126, doi:10.1186/1471-2105-6-126.
Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O., Emanuelsson, O., Zhang, Z.D., Weissman, S., Snyder, M. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17:669–681.
Ginger, M.R., Shore, A.N., Contreras, A., Rijnkels, M., Miller, J., Gonzalez-Rimbau, M.F., Rosen, J.M. (2006) A noncoding RNA is a potential marker of cell fate during mammary gland development. Proc. Natl. Acad. Sci. 103:5781–5786.
Gingeras, T.R. (2007) Origin of phenotypes: Genes and transcripts. Genome Res. 17:682–690.
Girard, A., Sachidanandam, R., Hannon, G.J., Carmell, M.A. (2006) A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442:199–202.
Goodrich, J.A., Kugel, J.F. (2006) Non-coding-RNA regulators of RNA polymerase II transcription. Nat. Rev. Mol. Cell Biol. 7:612–616.
Goodstadt, L., Ponting, C.P. (2006) Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comp. Biol. 2:e133, doi:10.1371/journal.pcbi.0020133.
Grover, D., Kannan, K., Brahmachari, S.K., Mukerji, M. (2005) ALU-ring elements in the primate genomes. Genetica 124:273–289.
Hardman, N. (1986) Structure and function of repetitive DNA in eukaryotes. Biochem. J. 234:1–11.
Hasler, J., Strub, K. (2006a) Alu elements as regulators of gene expression. Nucleic Acids Res. 34:5491–5497, doi:10.1093/nar/gkl706.
Hasler, J., Strub, K. (2006b) Alu RNP and Alu RNA regulate translation initiation in vitro. Nucleic Acids Res. 34:2374–2385, doi:10.1093/nar/gkl246.
Hellmann, I., Zollner, S., Enard, W., Ebersberger, I., Nickel, B., Paabo, S. (2003) Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 13:831–837.
Hertel, J., Lindemeyer, M., Missal, K., Fried, C., Tanzer, A., Flamm, C., Hofacker, I.L., Stadler, P.F. (2006) The expansion of the metazoan microRNA repertoire. BMC Genomics 7:25, doi:10.1186/1471-2164-7-25.
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945.
Ishii, N., Ozaki, K., Sato, H., Mizuno, H., Saito, S., Takahashi, A., Miyamoto, Y., Ikegawa, S., Kamatani, N., Hori, M., et al. (2006) Identification of a novel non-coding RNA, MIAT, that confers risk of myocardial infarction. J. Hum. Genet. 51:1087–1099.
Itzkovitz, S., Alon, U. (2007) The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 17:405–412.
Janowski, B.A., Huffman, K.E., Schwartz, J.C., Ram, R., Hardy, D., Shames, D.S., Minna, J.D., Corey, D.R. (2005) Inhibiting gene expression at transcription start sites in chromosomal DNA with antigene RNAs. Nat. Chem. Biol. 1:216–222.
Jelinek, W.R., Toomey, T.P., Leinwand, L., Duncan, C.H., Biro, P.A., Choudary, P.V., Weissman, S.M., Rubin, C.M., Houck, C.M., Deininger, P.L., et al. (1980) Ubiquitous, interspersed repeated sequences in mammalian genomes. Proc. Natl. Acad. Sci. 77:1398–1402.
Johnson, R., Gamblin, R.J., Ooi, L., Bruce, A.W., Donaldson, I.J., Westhead, D.R., Wood, I.C., Jackson, R.M., Buckley, N.J. (2006) Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication. Nucleic Acids Res. 34:3862–3877, doi:10.1093/nar/gkl525.
Jordan, I.K., Rogozin, I.B., Glazko, G.V., Koonin, E.V. (2003) Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19:68–72.
Kamal, M., Xie, X., Lander, E.S. (2006) A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl. Acad. Sci. 103:2740–2745.
Katayama, S., Tomaru, Y., Kasukawa, T., Waki, K., Nakanishi, M., Nakamura, M., Nishida, H., Yap, C.C., Suzuki, M., Kawai, J., et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309:1564–1566.
Khodosevich, K., Lebedev, Y., Sverdlov, E.D. (2004) Large-scale determination of the methylation status of retrotransposons in different tissues using a methylation tags approach. Nucleic Acids Res. 32:e31, doi:10.1093/nar/gnh035.
Kim, D.D., Kim, T.T., Walsh, T., Kobayashi, Y., Matise, T.C., Buyske, S., Gabriel, A. (2004) Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res. 14:1719–1725.
Kim, D.H., Villeneuve, L.M., Morris, K.V., Rossi, J.J. (2006) Argonaute-1 directs siRNA-mediated transcriptional gene silencing in human cells. Nat. Struct. Mol. Biol. 13:793–797.
Kimchi-Sarfaty, C., Oh, J.M., Kim, I., Sauna, Z.E., Calcagno, A.M., Ambudkar, S.V., Gottesman, M.M. (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528.
Kimura, M. (1968) Evolutionary rate at the molecular level. Nature 217:624–626.
Komar, A.A. (2007) SNPs, silent but not invisible. Science 315:466–467.
Krull, M., Brosius, J., Schmitz, J. (2005) Alu-SINE exonization: En route to protein-coding function. Mol. Biol. Evol. 22:1702–1711.
Kuryshev, V.Y., Skryabin, B.V., Kremerskothen, J., Jurka, J., Brosius, J. (2001) Birth of a gene: Locus of neuronal BC200 snmRNA in three prosimians and human BC200 pseudogenes as archives of change in the Anthropoidea lineage. J. Mol. Biol. 309:1049–1066.
Landry, J.R., Medstrand, P., Mager, D.L. (2001) Repetitive elements in the 5′ untranslated region of a human zinc-finger gene modulate transcription and translation efficiency. Genomics 76:110–116.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., Kingston, R.E. (2006) Characterization of the piRNA complex from rat testes. Science 313:363–367.
Lescoute, A., Leontis, N.B., Massire, C., Westhof, E. (2005) Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Res. 33:2395–2409, doi:10.1093/nar/gki535.
Levanon, E.Y., Eisenberg, E., Yelin, R., Nemzer, S., Hallegger, M., Shemesh, R., Fligelman, Z.Y., Shoshan, A., Pollock, S.R., Sztybel, D., et al. (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat. Biotechnol. 22:1001–1005.
Lev-Maor, G., Sorek, R., Shomron, N., Ast, G. (2003) The birth of an alternatively spliced exon: 3′ Splice-site selection in Alu exons. Science 300:1288–1291.
Li, L.C., Okino, S.T., Zhao, H., Pookot, D., Place, R.F., Urakami, S., Enokida, H., Dahiya, R. (2006) Small dsRNAs induce transcriptional activation in human cells. Proc. Natl. Acad. Sci. 103:17337–17342.
Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., Karlsson, E.K., Jaffe, D.B., Kamal, M., Clamp, M., Chang, J.L., Kulbokas, E.J., Zody, M.C., et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438:803–819.
Lippman, Z., Gendrel, A.V., Black, M., Vaughn, M.W., Dedhia, N., McCombie, W.R., Lavine, K., Mittal, V., May, B., Kasschau, K.D., et al. (2004) Role of transposable elements in heterochromatin and epigenetic control. Nature 430:471–476.
Lowe, C.B., Bejerano, G., Haussler, D. (2007) Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl. Acad. Sci. 104:8005–8010.
Lunter, G., Ponting, C.P., Hein, J. (2006) Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comp. Biol. 2:e5, doi:10.1371/journal.pcbi.0020005.
Margulies, E.H., Blanchette, M., Haussler, D., Green, E.D., NISC Comparative Sequencing Program (2003) Identification and characterization of multi-species conserved sequences. Genome Res. 13:2507–2518.
Margulies, E.H., Cooper, G.M., Asimenos, G., Thomas, D.J., Dewey, C.N., Siepel, A., Birney, E., Keefe, D., Schwartz, A.S., Hou, M., et al. (2007) Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17:760–774.
Martens, J.H., O’Sullivan, R.J., Braunschweig, U., Opravil, S., Radolf, M., Steinlein, P., Jenuwein, T. (2005) The profile of repeat-associated histone lysine methylation states in the mouse epigenome. EMBO J. 24:800–812.
Martianov, I., Ramadass, A., Serra Barros, A., Chow, N., Akoulitchev, A. (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445:666–670.
Matlik, K., Redik, K., Speek, M. (2006) L1 antisense promoter drives tissue-specific transcription of human genes. J. Biomed. Biotechnol. 2006:71753.
Mattick, J.S. (2007) A new paradigm for developmental biology. J. Exp. Biol. 210:1526–1547.
Mattick, J.S., Makunin, I.V. (2005) Small regulatory RNAs in mammals. Hum. Mol. Genet. 14:R121–R132.
Mattick, J.S., Makunin, I.V. (2006) Non-coding RNA. Hum. Mol. Genet. 15:R17–R29.
McClintock, B. (1956) Controlling elements and the gene. Cold Spring Harb. Symp. Quant. Biol. 21:197–216.
Mietz, J.A., Kuff, E.L. (1990) Tissue and strain-specific patterns of endogenous proviral hypomethylation analyzed by two-dimensional gel electrophoresis. Proc. Natl. Acad. Sci. 87:2269–2273.
Mikkelsen, T.S., Wakefield, M.J., Aken, B., Amemiya, C.T., Chang, J.L., Duke, S., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., et al. (2007) Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447:167–177.
Nesterova, T.B., Slobodyanyuk, S.Y., Elisaphenko, E.A., Shevchenko, A.I., Johnston, C., Pavlova, M.E., Rogozin, I.B., Kolesnikov, N.N., Brockdorff, N., Zakian, S.M. (2001) Characterization of the genomic Xist locus in rodents reveals conservation of overall gene structure and tandem repeats but rapid evolution of unique sequence. Genome Res. 11:833–849.
Ni, J.Z., Grate, L., Donohue, J.P., Preston, C., Nobida, N., O’Brien, G., Shiue, L., Clark, T.A., Blume, J.E., Ares, M. (2007) Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes & Dev. 21:708–718.
Nishioka, Y. (1988) Tissue specific methylation of human Y chromosomal DNA sequences. Tissue Cell 20:875–880.
Oei, S.L., Babich, V.S., Kazakov, V.I., Usmanova, N.M., Kropotov, A.V., Tomilin, N.V. (2004) Clusters of regulatory signals for RNA polymerase II transcription associated with Alu family repeats and CpG islands in human promoters. Genomics 83:873–882.
Ovcharenko, I., Loots, G.G., Nobrega, M.A., Hardison, R.C., Miller, W., Stubbs, L. (2005) Evolution and functional classification of vertebrate gene deserts. Genome Res. 15:137–145.
Pace, J.K., Feschotte, C. (2007) The evolutionary history of human DNA transposons: Evidence for intense activity in the primate lineage. Genome Res. 17:422–432.
Pagani, F., Raponi, M., Baralle, F.E. (2005) Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc. Natl. Acad. Sci. 102:6368–6372.
Pagano, A., Castelnuovo, M., Tortelli, F., Ferrari, R., Dieci, G., Cancedda, R. (2007) New small nuclear RNA gene-like transcriptional units as sources of regulatory transcripts. PLoS Genet. 3:e1, doi:10.1371/journal.pgen.0030001.
Pang, K.C., Frith, M.C., Mattick, J.S. (2006) Rapid evolution of noncoding RNAs: Lack of conservation does not mean lack of function. Trends Genet. 22:1–5.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D., Knowles, B.B. (2004) Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev. Cell 7:597–606.
Peters, B.A., St. Croix, B., Sjöblom, T., Cummins, J.M., Silliman, N., Ptak, J., Saha, S., Kinzler, K.W., Hatzis, C., Velculescu, V.E. (2007) Large-scale identification of novel transcripts in the human genome. Genome Res. 17:287–292.
Piriyapongsa, J., Jordan, I.K. (2007) A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS ONE 2:e203, doi:10.1371/journal.pone.0000203.
Polak, P., Domany, E. (2006) Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genomics 7:133, doi:10.1186/1471-2164-7-133.
Pollard, K.S., Salama, S.R., Lambert, N., Lambot, M.A., Coppens, S., Pedersen, J.S., Katzman, S., King, B., Onodera, C., Siepel, A., et al. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443:167–172.
Ponjavic, J., Ponting, C.P., Lunter, G. (2007) Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17:556–565.
Prasanth, K.V., Prasanth, S.G., Xuan, Z., Hearn, S., Freier, S.M., Bennett, C.F., Zhang, M.Q., Spector, D.L. (2005) Regulating gene expression through RNA nuclear retention. Cell 123:249–263.
Ravasi, T., Suzuki, H., Pang, K.C., Katayama, S., Furuno, M., Okunishi, R., Fukuda, S., Ru, K., Frith, M.C., Gongora, M.M., et al. (2006) Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res. 16:11–19.
Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129:1311–1323.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., Bradley, A. (2004) Identification of mammalian microRNA host genes and transcription units. Genome Res. 14:1902–1910.
Roh, T.Y., Wei, G., Farrell, C.M., Zhao, K. (2007) Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns. Genome Res. 17:74–81.
Romanish, M.T., Lock, W.M., van de Lagemaat, L.N., Dunn, C.A., Mager, D.L. (2007) Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet. 3:e10, doi:10.1371/journal.pgen.0030010.
Roskin, K.M., Diekhans, M., Haussler, D. (2003) Scoring two-species local alignments to try to statistically separate neutrally evolving from selected DNA segments. In Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology, pp. 257–266. ACM Press, New York.
Sanchez-Elsner, T., Gou, D., Kremmer, E., Sauer, F. (2006) Noncoding RNAs of trithorax response elements recruit Drosophila Ash1 to Ultrabithorax. Science 311:1118–1123.
Sanges, R., Kalmar, E., Claudiani, P., D’Amato, M., Muller, F., Stupka, E. (2006) Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage. Genome Biol. 7:R56, doi:10.1186/gb-2006-7-7-r56.
Schattner, P., Diekhans, M. (2006) Regions of extreme synonymous codon selection in mammalian genes. Nucleic Acids Res. 34:1700–1710, doi:10.1093/nar/gkl095.
Schmitt, S., Paro, R. (2006) RNA at the steering wheel. Genome Biol. 7:218, doi:10.1186/gb-2006-7-5-218.
Sea Urchin Genome Sequencing Consortium (2006) The genome of the sea urchin Strongylocentrotus purpuratus. Science 314:941–952.
Shankar, R., Grover, D., Brahmachari, S.K., Mukerji, M. (2004) Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol. Biol. 4:37, doi:10.1186/1471-2148-4-37.
Shankar, R., Chaurasia, A., Ghosh, B., Chekmenev, D., Cheremushkin, E., Kel, A., Mukerji, M. (2007) Non-random genomic divergence in repetitive sequences of human and chimpanzee in genes of different functional categories. Mol. Genet. Genomics 277:441–455.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15:1034–1050.
Silva, J.C., Shabalina, S.A., Harris, D.G., Spouge, J.L., Kondrashov, A.S. (2003) Conserved fragments of transposable elements in intergenic regions: Evidence for widespread recruitment of MIR- and L2-derived sequences within the mouse and human genomes. Genet. Res. 82:1–18.
Simons, C., Pheasant, M., Makunin, I.V., Mattick, J.S. (2006) Transposon-free regions in mammalian genomes. Genome Res. 16:164–172.
Smalheiser, N.R., Torvik, V.I. (2005) Mammalian microRNAs derived from genomic repeats. Trends Genet. 21:322–326.
Smalheiser, N.R., Torvik, V.I. (2006) Alu elements within human mRNAs are probable microRNA targets. Trends Genet. 22:532–536.
Smit, A.F. (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657–663.
Smit, A.F., Riggs, A.D. (1995) MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 23:98–102.
Smith, N.G., Brandstrom, M., Ellegren, H. (2004) Evidence for turnover of functional noncoding DNA in mammalian genome evolution. Genomics 84:806–813.
Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., et al. (2003) The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics. PLoS Biol. 1:e45, doi:10.1371/journal.pbio.0000045.
Stone, E.A., Cooper, G.M., Sidow, A. (2005) Trade-offs in detecting evolutionarily constrained sequence by comparative genomics. Annu. Rev. Genomics Hum. Genet. 6:143–164.
Sun, H., Skogerbo, G., Chen, R. (2006) Conserved distances between vertebrate highly conserved elements. Hum. Mol. Genet. 15:2911–2922.
Taft, R.J., Pheasant, M., Mattick, J.S. (2007) The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 29:288–299.
Taylor, M.S., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., Semple, C.A. (2006) Heterotachy in mammalian promoter evolution. PLoS Genet. 2:e30, doi:10.1371/journal.pgen.0020030.
Temin, H.M. (1982) Function of the retrovirus long terminal repeat. Cell 28:3–5.
Thornburg, B.G., Gotea, V., Makalowski, W. (2006) Transposable elements as a significant source of transcription regulating signals. Gene 365:104–110.
Volff, J.N. (2006) Turning junk into gold: Domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays 28:913–922.
Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F. (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 23:1383–1390.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Wheelan, S.J., Aizawa, Y., Han, J.S., Boeke, J.D. (2005) Gene-breaking: A new paradigm for human retrotransposon-mediated gene evolution. Genome Res. 15:1073–1078.
Whitelaw, E., Martin, D.I. (2001) Retrotransposons as epigenetic mediators of phenotypic variation in mammals. Nat. Genet. 27:361–365.
Willingham, A.T., Orth, A.P., Batalov, S., Peters, E.C., Wen, B.G., Aza-Blanc, P., Hogenesch, J.B., Schultz, P.G. (2005) A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309:1570–1573.
Xing, Y., Lee, C. (2005) Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc. Natl. Acad. Sci. 102:13526–13531.
Xing, Y., Lee, C. (2006) Alternative splicing and RNA selection pressure—Evolutionary consequences for eukaryotic genomes. Nat. Rev. Genet. 7:499–509.
Yi, P., Zhang, W., Zhai, Z., Miao, L., Wang, Y., Wu, M. (2003) Bcl-rambo beta, a special splicing variant with an insertion of an Alu-like cassette, promotes etoposide- and taxol-induced cell death. FEBS Lett. 534:61–68.
Zhang, X.H., Chasin, L.A. (2006) Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons. Proc. Natl. Acad. Sci. 103:13427–13432.
Zhang, R., Peng, Y., Wang, W., Su, B. (2007) Rapid evolution of an X-linked microRNA cluster in primates. Genome Res. 17:612–617.
Zhou, Y.H., Zheng, J.B., Gu, X., Saunders, G.F., Yung, W.K. (2002) Novel PAX6 binding sites in the human genome and the role of repetitive elements in the evolution of gene regulation. Genome Res. 12:1716–1722.
Zuckerkandl, E. (1992) Revisiting junk DNA. J. Mol. Evol. 34:259–271.
Zuckerkandl, E., Cavalli, G. (2007) Combinatorial epigenetics, “junk DNA,” and the evolution of complex organisms. Gene 390:232–242.

[1] ↵

Andersen, A.A.,

Panning, B.

(2003) Epigenetic gene regulation by noncoding RNAs. Curr. Opin. Cell Biol. 15:281–289.

CrossRef Medline Google Scholar

[2] Andersen, A.A.,

[3] Panning, B.

[4] ↵

Aravin, A.,

Gaidatzis, D.,

Pfeffer, S.,

Lagos-Quintana, M.,

Landgraf, P.,

Iovino, N.,

Morris, P.,

Brownstein, M.J.,

Kuramochi-Miyagawa, S.,

Nakano, T.,

et al.

(2006) A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442:203–207.

Medline Google Scholar

[5] Aravin, A.,

[6] Gaidatzis, D.,

[7] Pfeffer, S.,

[8] Lagos-Quintana, M.,

[9] Landgraf, P.,

[10] Iovino, N.,

[11] Morris, P.,

[12] Brownstein, M.J.,

[13] Kuramochi-Miyagawa, S.,

[14] Nakano, T.,

[15] et al.

[16] Asthana, S.,

Noble, W.S.,

Kryukov, G.,

Grant, C.E.,

Sunyaev, S.,

Stamatoyannopoulos, J.A.

(2007) Widely distributed noncoding purifying selection in the human genome. Proc. Natl. Acad. Sci. 104:12410–12415.

Abstract/FREE Full Text

[17] Asthana, S.,

[18] Noble, W.S.,

[19] Kryukov, G.,

[20] Grant, C.E.,

[21] Sunyaev, S.,

[22] Stamatoyannopoulos, J.A.

[23] ↵

Athanasiadis, A.,

Rich, A.,

Maas, S.

(2004) Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2:e391, doi:10.1371/journal.pbio.0020391.

CrossRef Medline Google Scholar

[24] Athanasiadis, A.,

[25] Rich, A.,

[26] Maas, S.

[27] ↵

Baltimore, D.

(1985) Retroviruses and retrotransposons: The role of reverse transcription in shaping the eukaryotic genome. Cell 40:481–482.

CrossRef Medline Google Scholar

[28] Baltimore, D.

[29] ↵

Bartel, D.

(2004) MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116:281–297.

CrossRef Medline Google Scholar

[30] Bartel, D.

[31] ↵

Baskerville, S.,

Bartel, D.P.

(2005) Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11:241–247.

Abstract/FREE Full Text

[32] Baskerville, S.,

[33] Bartel, D.P.

[34] ↵

Bejerano, G.,

Lowe, C.B.,

Ahituv, N.,

King, B.,

Siepel, A.,

Salama, S.R.,

Rubin, E.M.,

Kent, W.J.,

Haussler, D.

(2006) A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature 441:87–90.

CrossRef Medline Google Scholar

[35] Bejerano, G.,

[36] Lowe, C.B.,

[37] Ahituv, N.,

[38] King, B.,

[39] Siepel, A.,

[40] Salama, S.R.,

[41] Rubin, E.M.,

[42] Kent, W.J.,

[43] Haussler, D.

[44] ↵

Berezikov, E.,

Thuemmler, F.,

van Laake, L.,

Kondova, I.,

Bontrop, R.,

Cuppen, E.,

Plasterk, R.H.A.

(2006a) Diversity of microRNAs in human and chimpanzee brain. Nat. Genet. 38:1375–1377.

CrossRef Medline Google Scholar

[45] Berezikov, E.,

[46] Thuemmler, F.,

[47] van Laake, L.,

[48] Kondova, I.,

[49] Bontrop, R.,

[50] Cuppen, E.,

[51] Plasterk, R.H.A.

[52] ↵

Berezikov, E.,

van Tetering, G.,

Verheul, M.,

de van Belt, J.,

van Laake, L.,

Vos, J.,

Verloop, R.,

de van Wetering, M.,

Guryev, V.,

Takada, S.,

et al.

(2006b) Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res. 16:1289–1298.

Abstract/FREE Full Text

[53] Berezikov, E.,

[54] van Tetering, G.,

[55] Verheul, M.,

[56] de van Belt, J.,

[57] van Laake, L.,

[58] Vos, J.,

[59] Verloop, R.,

[60] de van Wetering, M.,

[61] Guryev, V.,

[62] Takada, S.,

[63] et al.

[64] ↵

Bernstein, E.,

Allis, C.D.

(2005) RNA meets chromatin. Genes & Dev. 19:1635–1655.

Abstract/FREE Full Text

[65] Bernstein, E.,

[66] Allis, C.D.

[67] ↵

Blow, M.,

Futreal, P.A.,

Wooster, R.,

Stratton, M.R.

(2004) A survey of RNA editing in human brain. Genome Res. 14:2379–2387.

Abstract/FREE Full Text

[68] Blow, M.,

[69] Futreal, P.A.,

[70] Wooster, R.,

[71] Stratton, M.R.

[72] ↵

Bollenbach, T.,

Vetsigian, K.,

Kishony, R.

(2007) Evolution and multilevel optimization of the genetic code. Genome Res. 17:401–404.

Abstract/FREE Full Text

[73] Bollenbach, T.,

[74] Vetsigian, K.,

[75] Kishony, R.

[76] ↵

Brandt, J.,

Schrauth, S.,

Veith, A.M.,

Froschauer, A.,

Haneke, T.,

Schultheis, C.,

Gessler, M.,

Leimeister, C.,

Volff, J.N.

(2005a) Transposable elements as a source of genetic innovation: Expression and evolution of a family of retrotransposon-derived neogenes in mammals. Gene 345:101–111.

CrossRef Medline Google Scholar

[77] Brandt, J.,

[78] Schrauth, S.,

[79] Veith, A.M.,

[80] Froschauer, A.,

[81] Haneke, T.,

[82] Schultheis, C.,

[83] Gessler, M.,

[84] Leimeister, C.,

[85] Volff, J.N.

[86] ↵

Brandt, J.,

Veith, A.M.,

Volff, J.N.

(2005b) A family of neofunctionalized Ty3/gypsy retrotransposon genes in mammalian genomes. Cytogenet. Genome Res. 110:307–317.

CrossRef Medline Google Scholar

[87] Brandt, J.,

[88] Veith, A.M.,

[89] Volff, J.N.

[90] ↵

Breznik, T.,

Traina-Dorge, V.,

Gama-Sosa, M.,

Gehrke, C.W.,

Ehrlich, M.,

Medina, D.,

Butel, J.S.,

Cohen, J.C.

(1984) Mouse mammary tumor virus DNA methylation: Tissue-specific variation. Virology 136:69–77.

CrossRef Medline Google Scholar

[91] Breznik, T.,

[92] Traina-Dorge, V.,

[93] Gama-Sosa, M.,

[94] Gehrke, C.W.,

[95] Ehrlich, M.,

[96] Medina, D.,

[97] Butel, J.S.,

[98] Cohen, J.C.

[99] ↵

Britten, R.

(2006) Transposable elements have contributed to thousands of human proteins. Proc. Natl. Acad. Sci. 103:1798–1803.

Abstract/FREE Full Text

[100] Britten, R.

[101] ↵

Britten, R.J.,

Davidson, E.H.

(1969) Gene regulation for higher cells: A theory. Science 165:349–357.

FREE Full Text

[102] Britten, R.J.,

[103] Davidson, E.H.

[104] ↵

Brosius, J.

(1999) RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238:115–134.

CrossRef Medline Google Scholar

[105] Brosius, J.

[106] ↵

Brosius, J.

(2005) Waste not, want not—Transcript excess in multicellular eukaryotes. Trends Genet. 21:287–288.

CrossRef Medline Google Scholar

[107] Brosius, J.

[108] ↵

Bustamante, C.D.,

Fledel-Alon, A.,

Williamson, S.,

Nielsen, R.,

Hubisz, M.T.,

Glanowski, S.,

Tanenbaum, D.M.,

White, T.J.,

Sninsky, J.J.,

Hernandez, R.D.,

et al.

(2005) Natural selection on protein-coding genes in the human genome. Nature 437:1153–1157.

CrossRef Medline Google Scholar

[109] Bustamante, C.D.,

[110] Fledel-Alon, A.,

[111] Williamson, S.,

[112] Nielsen, R.,

[113] Hubisz, M.T.,

[114] Glanowski, S.,

[115] Tanenbaum, D.M.,

[116] White, T.J.,

[117] Sninsky, J.J.,

[118] Hernandez, R.D.,

[119] et al.

[120] ↵

Carninci, P.

(2007) Constructing the landscape of the mammalian transcriptome. J. Exp. Biol. 210:1497–1506.

Abstract/FREE Full Text

[121] Carninci, P.

[122] ↵

Carninci, P.,

Kasukawa, T.,

Katayama, S.,

Gough, J.,

Frith, M.C.,

Maeda, N.,

Oyama, R.,

Ravasi, T.,

Lenhard, B.,

Wells, C.,

et al.

(2005) The transcriptional landscape of the mammalian genome. Science 309:1559–1563.

Abstract/FREE Full Text

[123] Carninci, P.,

[124] Kasukawa, T.,

[125] Katayama, S.,

[126] Gough, J.,

[127] Frith, M.C.,

[128] Maeda, N.,

[129] Oyama, R.,

[130] Ravasi, T.,

[131] Lenhard, B.,

[132] Wells, C.,

[133] et al.

[134] ↵

Chalitchagorn, K.,

Shuangshoti, S.,

Hourpai, N.,

Kongruttanachok, N.,

Tangkijvanich, P.,

Thong-ngam, D.,

Voravud, N.,

Sriuranpong, V.,

Mutirangura, A.

(2004) Distinctive pattern of LINE-1 methylation level in normal tissues and the association with carcinogenesis. Oncogene 23:8841–8846.

CrossRef Medline Google Scholar

[135] Chalitchagorn, K.,

[136] Shuangshoti, S.,

[137] Hourpai, N.,

[138] Kongruttanachok, N.,

[139] Tangkijvanich, P.,

[140] Thong-ngam, D.,

[141] Voravud, N.,

[142] Sriuranpong, V.,

[143] Mutirangura, A.

[144] ↵

Chamary, J.V.,

Parmley, J.L.,

Hurst, L.D.

(2006) Hearing silence: Non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 7:98–108.

CrossRef Medline Google Scholar

[145] Chamary, J.V.,

[146] Parmley, J.L.,

[147] Hurst, L.D.

[148] ↵

Cheng, J.,

Kapranov, P.,

Drenkow, J.,

Dike, S.,

Brubaker, S.,

Patel, S.,

Long, J.,

Stern, D.,

Tammana, H.,

Helt, G.,

et al.

(2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308:1149–1154.

Abstract/FREE Full Text

[149] Cheng, J.,

[150] Kapranov, P.,

[151] Drenkow, J.,

[152] Dike, S.,

[153] Brubaker, S.,

[154] Patel, S.,

[155] Long, J.,

[156] Stern, D.,

[157] Tammana, H.,

[158] Helt, G.,

[159] et al.

[160] ↵

Chiaromonte, F.,

Weber, R.J.,

Roskin, K.M.,

Diekhans, M.,

Kent, W.J.,

Haussler, D.

(2003) The share of human genomic DNA under selection estimated from human–mouse genomic alignments. Cold Spring Harb. Symp. Quant. Biol. 68:245–254.

CrossRef Medline Google Scholar

[161] Chiaromonte, F.,

[162] Weber, R.J.,

[163] Roskin, K.M.,

[164] Diekhans, M.,

[165] Kent, W.J.,

[166] Haussler, D.

[167] ↵

Cooper, G.M.,

Brudno, M.,

Green, E.D.,

Batzoglou, S.,

Sidow, A.,

NISC Comparative Sequencing Program.

(2003) Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res. 13:813–820.

Abstract/FREE Full Text

[168] Cooper, G.M.,

[169] Brudno, M.,

[170] Green, E.D.,

[171] Batzoglou, S.,

[172] Sidow, A.,

[173] NISC Comparative Sequencing Program.

[174] ↵

Cooper, G.M.,

Brudno, M.,

Stone, E.A.,

Dubchak, I.,

Batzoglou, S.,

Sidow, A.

(2004) Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 14:539–548.

Abstract/FREE Full Text

[175] Cooper, G.M.,

[176] Brudno, M.,

[177] Stone, E.A.,

[178] Dubchak, I.,

[179] Batzoglou, S.,

[180] Sidow, A.

[181] ↵

Cooper, G.M.,

Stone, E.A.,

Asimenos, G.,

Green, E.D.,

Batzoglou, S.,

Sidow, A.

(2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15:901–913.

Abstract/FREE Full Text

[182] Cooper, G.M.,

[183] Stone, E.A.,

[184] Asimenos, G.,

[185] Green, E.D.,

[186] Batzoglou, S.,

[187] Sidow, A.

[188] ↵

Cordaux, R.,

Batzer, M.A.

(2006) Teaching an old dog new tricks: SINEs of canine genomic diversity. Proc. Natl. Acad. Sci. 103:1157–1158.

FREE Full Text

[189] Cordaux, R.,

[190] Batzer, M.A.

[191] Cordaux, R.,

Udit, S.,

Batzer, M.A.,

Feschotte, C.

(2006) Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proc. Natl. Acad. Sci. 103:8101–8106.

Abstract/FREE Full Text

[192] Cordaux, R.,

[193] Udit, S.,

[194] Batzer, M.A.,

[195] Feschotte, C.

[196] ↵

Dagan, T.,

Sorek, R.,

Sharon, E.,

Ast, G.,

Graur, D.

(2004) AluGene: A database of Alu elements incorporated within protein-coding genes. Nucleic Acids Res. 32:D489–D492, doi:10.1093/nar/gkh132.

Abstract/FREE Full Text

[197] Dagan, T.,

[198] Sorek, R.,

[199] Sharon, E.,

[200] Ast, G.,

[201] Graur, D.

[202] ↵

Davidson, E.H.,

Britten, R.J.

(1979) Regulation of gene expression: Possible role of repetitive sequences. Science 204:1052–1059.

Abstract/FREE Full Text

[203] Davidson, E.H.,

[204] Britten, R.J.

[205] ↵

Davidson, E.H.,

Posakony, J.W.

(1982) Repetitive sequence transcripts in development. Nature 297:633–635.

CrossRef Medline Google Scholar

[206] Davidson, E.H.,

[207] Posakony, J.W.

[208] ↵

Eisen, J.A.,

Coyne, R.S.,

Wu, M.,

Wu, D.,

Thiagarajan, M.,

Wortman, J.R.,

Badger, J.H.,

Ren, Q.,

Amedeo, P.,

Jones, K.M.,

et al.

(2006) Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4:e286, doi:10.1371/journal.pbio.0040286.

CrossRef Medline Google Scholar

[209] Eisen, J.A.,

[210] Coyne, R.S.,

[211] Wu, M.,

[212] Wu, D.,

[213] Thiagarajan, M.,

[214] Wortman, J.R.,

[215] Badger, J.H.,

[216] Ren, Q.,

[217] Amedeo, P.,

[218] Jones, K.M.,

[219] et al.

[220] ↵

The ENCODE Project Consortium

(2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799–816.

CrossRef Medline Google Scholar

[221] The ENCODE Project Consortium

[222] ↵

Ferrigno, O.,

Virolle, T.,

Djabari, Z.,

Ortonne, J.P.,

White, R.J.,

Aberdam, D.

(2001) Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat. Genet. 28:77–81.

CrossRef Medline Google Scholar

[223] Ferrigno, O.,

[224] Virolle, T.,

[225] Djabari, Z.,

[226] Ortonne, J.P.,

[227] White, R.J.,

[228] Aberdam, D.

[229] ↵

Finnegan, D.J.

(1989) Eukaryotic transposable elements and genome evolution. Trends Genet. 5:103–107.

CrossRef Medline Google Scholar

[230] Finnegan, D.J.

[231] ↵

Fisher, S.,

Grice, E.A.,

Vinton, R.M.,

Bessling, S.L.,

McCallion, A.S.

(2006) Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science 312:276–279.

Abstract/FREE Full Text

[232] Fisher, S.,

[233] Grice, E.A.,

[234] Vinton, R.M.,

[235] Bessling, S.L.,

[236] McCallion, A.S.

[237] ↵

Frazer, K.A.,

Tao, H.,

Osoegawa, K.,

de Jong, P.J.,

Chen, X.,

Doherty, M.F.,

Cox, D.R.

(2004) Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14:367–372.

Abstract/FREE Full Text

[238] Frazer, K.A.,

[239] Tao, H.,

[240] Osoegawa, K.,

[241] de Jong, P.J.,

[242] Chen, X.,

[243] Doherty, M.F.,

[244] Cox, D.R.

[245] ↵

Frith, M.C.,

Pheasant, M.,

Mattick, J.S.

(2005) The amazing complexity of the human transcriptome. Eur. J. Hum. Genet. 13:894–897.

CrossRef Medline Google Scholar

[246] Frith, M.C.,

[247] Pheasant, M.,

[248] Mattick, J.S.

[249] ↵

Frith, M.C.,

Ponjavic, J.,

Fredman, D.,

Kai, C.,

Kawai, J.,

Carninci, P.,

Hayshizaki, Y.,

Sandelin, A.

(2006) Evolutionary turnover of mammalian transcription start sites. Genome Res. 16:713–722.

Abstract/FREE Full Text

[250] Frith, M.C.,

[251] Ponjavic, J.,

[252] Fredman, D.,

[253] Kai, C.,

[254] Kawai, J.,

[255] Carninci, P.,

[256] Hayshizaki, Y.,

[257] Sandelin, A.

[258] ↵

Gaffney, D.J.,

Keightley, P.D.

(2006) Genomic selective constraints in murid noncoding DNA. PLoS Genet. 2:e204, doi:10.1371/journal.pgen.0020204.

CrossRef Medline Google Scholar

[259] Gaffney, D.J.,

[260] Keightley, P.D.

[261] ↵

Ganapathi, M.,

Srivastava, P.,

Das Sutar, S.K.,

Kumar, K.,

Dasgupta, D.,

Pal Singh, G.,

Brahmachari, V.,

Brahmachari, S.K.

(2005) Comparative analysis of chromatin landscape in regulatory regions of human housekeeping and tissue specific genes. BMC Bioinformatics 6:126, doi:10.1186/1471-2105-6-126.

CrossRef Medline Google Scholar

[262] Ganapathi, M.,

[263] Srivastava, P.,

[264] Das Sutar, S.K.,

[265] Kumar, K.,

[266] Dasgupta, D.,

[267] Pal Singh, G.,

[268] Brahmachari, V.,

[269] Brahmachari, S.K.

[270] ↵

Gerstein, M.B.,

Bruce, C.,

Rozowsky, J.S.,

Zheng, D.,

Du, J.,

Korbel, J.O.,

Emanuelsson, O.,

Zhang, Z.D.,

Weissman, S.,

Snyder, M.

(2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17:669–681.

Abstract/FREE Full Text

[271] Gerstein, M.B.,

[272] Bruce, C.,

[273] Rozowsky, J.S.,

[274] Zheng, D.,

[275] Du, J.,

[276] Korbel, J.O.,

[277] Emanuelsson, O.,

[278] Zhang, Z.D.,

[279] Weissman, S.,

[280] Snyder, M.

[281] ↵

Ginger, M.R.,

Shore, A.N.,

Contreras, A.,

Rijnkels, M.,

Miller, J.,

Gonzalez-Rimbau, M.F.,

Rosen, J.M.

(2006) A noncoding RNA is a potential marker of cell fate during mammary gland development. Proc. Natl. Acad. Sci. 103:5781–5786.

Abstract/FREE Full Text

[282] Ginger, M.R.,

[283] Shore, A.N.,

[284] Contreras, A.,

[285] Rijnkels, M.,

[286] Miller, J.,

[287] Gonzalez-Rimbau, M.F.,

[288] Rosen, J.M.

[289] ↵

Gingeras, T.R.

(2007) Origin of phenotypes: Genes and transcripts. Genome Res. 17:682–690.

Abstract/FREE Full Text

[290] Gingeras, T.R.

[291] ↵

Girard, A.,

Sachidanandam, R.,

Hannon, G.J.,

Carmell, M.A.

(2006) A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442:199–202.

CrossRef Medline Google Scholar

[292] Girard, A.,

[293] Sachidanandam, R.,

[294] Hannon, G.J.,

[295] Carmell, M.A.

[296] ↵

Goodrich, J.A.,

Kugel, J.F.

(2006) Non-coding-RNA regulators of RNA polymerase II transcription. Nat. Rev. Mol. Cell Biol. 7:612–616.

CrossRef Medline Google Scholar

[297] Goodrich, J.A.,

[298] Kugel, J.F.

[299] ↵

Goodstadt, L.,

Ponting, C.P.

(2006) Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comp. Biol. 2:e133, doi:10.1371/journal.pcbi.0020133.

CrossRef Medline Google Scholar

[300] Goodstadt, L.,

[301] Ponting, C.P.

[302] ↵

Grover, D.,

Kannan, K.,

Brahmachari, S.K.,

Mukerji, M.

(2005) ALU-ring elements in the primate genomes. Genetica 124:273–289.

CrossRef Medline Google Scholar

[303] Grover, D.,

[304] Kannan, K.,

[305] Brahmachari, S.K.,

[306] Mukerji, M.

[307] ↵

Hardman, N.

(1986) Structure and function of repetitive DNA in eukaryotes. Biochem. J. 234:1–11.

FREE Full Text

[308] Hardman, N.

[309] ↵

Hasler, J.,

Strub, K.

(2006a) Alu elements as regulators of gene expression. Nucleic Acids Res. 34:5491–5497, doi:10.1093/nar/gkl706.

Abstract/FREE Full Text

[310] Hasler, J.,

[311] Strub, K.

[312] ↵

Hasler, J.,

Strub, K.

(2006b) Alu RNP and Alu RNA regulate translation initiation in vitro. Nucleic Acids Res. 34:2374–2385, doi:10.1093/nar/gkl246.

Abstract/FREE Full Text

[313] Hasler, J.,

[314] Strub, K.

[315] ↵

Hellmann, I.,

Zollner, S.,

Enard, W.,

Ebersberger, I.,

Nickel, B.,

Paabo, S.

(2003) Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 13:831–837.

Abstract/FREE Full Text

[316] Hellmann, I.,

[317] Zollner, S.,

[318] Enard, W.,

[319] Ebersberger, I.,

[320] Nickel, B.,

[321] Paabo, S.

[322] ↵

Hertel, J.,

Lindemeyer, M.,

Missal, K.,

Fried, C.,

Tanzer, A.,

Flamm, C.,

Hofacker, I.L.,

Stadler, P.F.

(2006) The expansion of the metazoan microRNA repertoire. BMC Genomics 7:25, doi:10.1186/1471-2164-7-25.

CrossRef Medline Google Scholar

[323] Hertel, J.,

[324] Lindemeyer, M.,

[325] Missal, K.,

[326] Fried, C.,

[327] Tanzer, A.,

[328] Flamm, C.,

[329] Hofacker, I.L.,

[330] Stadler, P.F.

[331] ↵

International Human Genome Sequencing Consortium

(2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945.

CrossRef Medline Google Scholar

[332] International Human Genome Sequencing Consortium

[333] ↵

Ishii, N.,

Ozaki, K.,

Sato, H.,

Mizuno, H.,

Saito, S.,

Takahashi, A.,

Miyamoto, Y.,

Ikegawa, S.,

Kamatani, N.,

Hori, M.,

et al.

(2006) Identification of a novel non-coding RNA, MIAT, that confers risk of myocardial infarction. J. Hum. Genet. 51:1087–1099.

CrossRef Medline Google Scholar

[334] Ishii, N.,

[335] Ozaki, K.,

[336] Sato, H.,

[337] Mizuno, H.,

[338] Saito, S.,

[339] Takahashi, A.,

[340] Miyamoto, Y.,

[341] Ikegawa, S.,

[342] Kamatani, N.,

[343] Hori, M.,

[344] et al.

[345] ↵

Itzkovitz, S.,

Alon, U.

(2007) The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 17:405–412.

Abstract/FREE Full Text

[346] Itzkovitz, S.,

[347] Alon, U.

[348] ↵

Janowski, B.A.,

Huffman, K.E.,

Schwartz, J.C.,

Ram, R.,

Hardy, D.,

Shames, D.S.,

Minna, J.D.,

Corey, D.R.

(2005) Inhibiting gene expression at transcription start sites in chromosomal DNA with antigene RNAs. Nat. Chem. Biol. 1:216–222.

CrossRef Medline Google Scholar

[349] Janowski, B.A.,

[350] Huffman, K.E.,

[351] Schwartz, J.C.,

[352] Ram, R.,

[353] Hardy, D.,

[354] Shames, D.S.,

[355] Minna, J.D.,

[356] Corey, D.R.

[357] ↵

Jelinek, W.R.,

Toomey, T.P.,

Leinwand, L.,

Duncan, C.H.,

Biro, P.A.,

Choudary, P.V.,

Weissman, S.M.,

Rubin, C.M.,

Houck, C.M.,

Deininger, P.L.,

et al.

(1980) Ubiquitous, interspersed repeated sequences in mammalian genomes. Proc. Natl. Acad. Sci. 77:1398–1402.

Abstract/FREE Full Text

[358] Jelinek, W.R.,

[359] Toomey, T.P.,

[360] Leinwand, L.,

[361] Duncan, C.H.,

[362] Biro, P.A.,

[363] Choudary, P.V.,

[364] Weissman, S.M.,

[365] Rubin, C.M.,

[366] Houck, C.M.,

[367] Deininger, P.L.,

[368] et al.

[369] ↵

Johnson, R.,

Gamblin, R.J.,

Ooi, L.,

Bruce, A.W.,

Donaldson, I.J.,

Westhead, D.R.,

Wood, I.C.,

Jackson, R.M.,

Buckley, N.J.

(2006) Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication. Nucleic Acids Res. 34:3862–3877, doi:10.1093/nar/gkl525.

Abstract/FREE Full Text

[370] Johnson, R.,

[371] Gamblin, R.J.,

[372] Ooi, L.,

[373] Bruce, A.W.,

[374] Donaldson, I.J.,

[375] Westhead, D.R.,

[376] Wood, I.C.,

[377] Jackson, R.M.,

[378] Buckley, N.J.

[379] ↵

Jordan, I.K.,

Rogozin, I.B.,

Glazko, G.V.,

Koonin, E.V.

(2003) Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19:68–72.

CrossRef Medline Google Scholar

[380] Jordan, I.K.,

[381] Rogozin, I.B.,

[382] Glazko, G.V.,

[383] Koonin, E.V.

[384] ↵

Kamal, M.,

Xie, X.,

Lander, E.S.

(2006) A large family of ancient repeat elements in the human genome is under strong selection. Proc. Natl. Acad. Sci. 103:2740–2745.

Abstract/FREE Full Text

[385] Kamal, M.,

[386] Xie, X.,

[387] Lander, E.S.

[388] ↵

Katayama, S.,

Tomaru, Y.,

Kasukawa, T.,

Waki, K.,

Nakanishi, M.,

Nakamura, M.,

Nishida, H.,

Yap, C.C.,

Suzuki, M.,

Kawai, J.,

et al.

(2005) Antisense transcription in the mammalian transcriptome. Science 309:1564–1566.

Abstract/FREE Full Text

[389] Katayama, S.,

[390] Tomaru, Y.,

[391] Kasukawa, T.,

[392] Waki, K.,

[393] Nakanishi, M.,

[394] Nakamura, M.,

[395] Nishida, H.,

[396] Yap, C.C.,

[397] Suzuki, M.,

[398] Kawai, J.,

[399] et al.

[400] ↵

Khodosevich, K.,

Lebedev, Y.,

Sverdlov, E.D.

(2004) Large-scale determination of the methylation status of retrotransposons in different tissues using a methylation tags approach. Nucleic Acids Res. 32:e31, doi:10.1093/nar/gnh035.

Abstract/FREE Full Text

[401] Khodosevich, K.,

[402] Lebedev, Y.,

[403] Sverdlov, E.D.

[404] ↵

Kim, D.D.,

Kim, T.T.,

Walsh, T.,

Kobayashi, Y.,

Matise, T.C.,

Buyske, S.,

Gabriel, A.

(2004) Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res. 14:1719–1725.

Abstract/FREE Full Text

[405] Kim, D.D.,

[406] Kim, T.T.,

[407] Walsh, T.,

[408] Kobayashi, Y.,

[409] Matise, T.C.,

[410] Buyske, S.,

[411] Gabriel, A.

[412] ↵

Kim, D.H.,

Villeneuve, L.M.,

Morris, K.V.,

Rossi, J.J.

(2006) Argonaute-1 directs siRNA-mediated transcriptional gene silencing in human cells. Nat. Struct. Mol. Biol. 13:793–797.

CrossRef Medline Google Scholar

[413] Kim, D.H.,

[414] Villeneuve, L.M.,

[415] Morris, K.V.,

[416] Rossi, J.J.

[417] ↵

Kimchi-Sarfaty, C.,

Oh, J.M.,

Kim, I.,

Sauna, Z.E.,

Calcagno, A.M.,

Ambudkar, S.V.,

Gottesman, M.M.

(2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528.

Abstract/FREE Full Text

[418] Kimchi-Sarfaty, C.,

[419] Oh, J.M.,

[420] Kim, I.,

[421] Sauna, Z.E.,

[422] Calcagno, A.M.,

[423] Ambudkar, S.V.,

[424] Gottesman, M.M.

[425] ↵

Kimura, M.

(1968) Evolutionary rate at the molecular level. Nature 217:624–626.

CrossRef Medline Google Scholar

[426] Kimura, M.

[427] ↵

Komar, A.A.

(2007) SNPs, silent but not invisible. Science 315:466–467.

Krull, M., Brosius, J., Schmitz, J. (2005) Alu-SINE exonization: En route to protein-coding function. Mol. Biol. Evol. 22:1702–1711.

Kuryshev, V.Y., Skryabin, B.V., Kremerskothen, J., Jurka, J., Brosius, J. (2001) Birth of a gene: Locus of neuronal BC200 snmRNA in three prosimians and human BC200 pseudogenes as archives of change in the Anthropoidea lineage. J. Mol. Biol. 309:1049–1066.

Landry, J.R., Medstrand, P., Mager, D.L. (2001) Repetitive elements in the 5′ untranslated region of a human zinc-finger gene modulate transcription and translation efficiency. Genomics 76:110–116.

Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., Kingston, R.E. (2006) Characterization of the piRNA complex from rat testes. Science 313:363–367.

Lescoute, A., Leontis, N.B., Massire, C., Westhof, E. (2005) Recurrent structural RNA motifs, isostericity matrices and sequence alignments. Nucleic Acids Res. 33:2395–2409, doi:10.1093/nar/gki535.

Levanon, E.Y., Eisenberg, E., Yelin, R., Nemzer, S., Hallegger, M., Shemesh, R., Fligelman, Z.Y., Shoshan, A., Pollock, S.R., Sztybel, D., et al. (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat. Biotechnol. 22:1001–1005.

Lev-Maor, G., Sorek, R., Shomron, N., Ast, G. (2003) The birth of an alternatively spliced exon: 3′ Splice-site selection in Alu exons. Science 300:1288–1291.

Li, L.C., Okino, S.T., Zhao, H., Pookot, D., Place, R.F., Urakami, S., Enokida, H., Dahiya, R. (2006) Small dsRNAs induce transcriptional activation in human cells. Proc. Natl. Acad. Sci. 103:17337–17342.

Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., Karlsson, E.K., Jaffe, D.B., Kamal, M., Clamp, M., Chang, J.L., Kulbokas, E.J., Zody, M.C., et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438:803–819.

Lippman, Z., Gendrel, A.V., Black, M., Vaughn, M.W., Dedhia, N., McCombie, W.R., Lavine, K., Mittal, V., May, B., Kasschau, K.D., et al. (2004) Role of transposable elements in heterochromatin and epigenetic control. Nature 430:471–476.
