RESOURCE

Numerous Novel Annotations of the Human Genome Sequence Supported by a 5′-End–Enriched cDNA Collection

    • 1 Genoscope-Centre National de Séquençage and CNRS UMR-8030, 91000 Evry, France
Published February 12, 2004. Vol 14 Issue 3, pp. 463-471. https://doi.org/10.1101/gr.1481104
Download PDF Please log-in to or register for your personal account in order to access PDF Cite Article Permissions Share
cover of Genome Research Vol 36 Issue 4
Current Issue:

Abstract

A collection of 90,000 human cDNA clones generated to increase the fraction of “full-length” cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionary conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5′ end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that ∼380 gene models described in LocusLink could be extended at their 5′ end by at least one new exon. Finally, this cDNA resource provided an experimental support for annotations based exclusively on predictions, thus representing a resource substantially improving the human genome annotation.

Loading
Loading
Back to top