RT Journal
A1 Denoeud, France
A1 Kapranov, Philipp
A1 Ucla, Catherine
A1 Frankish, Adam
A1 Castelo, Robert
A1 Drenkow, Jorg
A1 Lagarde, Julien
A1 Alioto, Tyler
A1 Manzano, Caroline
A1 Chrast, Jacqueline
A1 Dike, Sujit
A1 Wyss, Carine
A1 Henrichsen, Charlotte N.
A1 Holroyd, Nancy
A1 Dickson, Mark C.
A1 Taylor, Ruth
A1 Hance, Zahra
A1 Foissac, Sylvain
A1 Myers, Richard M.
A1 Rogers, Jane
A1 Hubbard, Tim
A1 Harrow, Jennifer
A1 Guigó, Roderic
A1 Gingeras, Thomas R.
A1 Antonarakis, Stylianos E.
A1 Reymond, Alexandre
T1 Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions
JF Genome Research
JO Genome Research
YR 2007
FD June 01
VO 17
IS 6
SP 746
OP 759
DO 10.1101/gr.5660607
UL http://genome.cshlp.org/content/17/6/746.abstract
AB This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5′ rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5′ distal to the annotated 5′ terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms.
These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be “noncoding,” ultimately relating to the identification of disease-related sequence alterations.