RT Journal A1 Porcel, Betina M. A1 Delfour, Olivier A1 Castelli, Vanina A1 De Berardinis, Veronique A1 Friedlander, Lucie A1 Cruaud, Corinne A1 Ureta-Vidal, Abel A1 Scarpelli, Claude A1 Wincker, Patrick A1 Schächter, Vincent A1 Saurin, William A1 Gyapay, Gabor A1 Salanoubat, Marcel A1 Weissenbach, Jean T1 Numerous Novel Annotations of the Human Genome Sequence Supported by a 5′-End–Enriched cDNA Collection JF Genome Research JO Genome Research YR 2004 FD March 01 VO 14 IS 3 SP 463 OP 471 DO 10.1101/gr.1481104 UL http://genome.cshlp.org/content/14/3/463.abstract AB A collection of 90,000 human cDNA clones generated to increase the fraction of “full-length” cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, 75% had CpG islands at their 5′ end. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that ∼380 gene models described in LocusLink could be extended at their 5′ end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource that substantially improves the human genome annotation.