Massive Sequence Comparisons as a Help in Annotating Genomic Sequences

Alexandra Louis (1,2,4), Emmanuelle Ollivier (1), Jean-Christophe Aude (3), and Jean-Loup Risler (1)

  (1) Laboratoire Génome et Informatique, Université de Versailles, 78035 Versailles Cedex, France
  (2) Laboratoire de Biologie Cellulaire, Institut National de Recherche Agronomique, 78026 Versailles Cedex, France
  (3) Centre d'Etudes Atomiques, Saclay, 91191 Gif-sur-Yvette Cedex, France

Abstract

An all-by-all comparison of all the publicly available protein sequences from plants has been performed, followed by a clustering step. Within each of the 1064 resulting clusters, which contain sequences that are orthologous as well as paralogous, the sequences have been submitted to a pyramidal classification and their domains delineated by an automated procedure à la PRODOM. This process makes it easy to check for apparent inconsistencies in a cluster, for example, whether one sequence is shorter or longer than the others or whether a domain is missing. In such cases, aligning the DNA sequence of the gene with the sequence of a closely homologous protein often reveals (in 10% of the clusters) probable sequencing errors (leading to frameshifts) or probable wrong intron/exon predictions. The composition of the clusters, their pyramidal classifications, and domain decomposition, as well as our comments when appropriate, are available from http://chlora.infobiogen.fr:1234/PHYTOPROT.
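The two steps summarized above, grouping sequences from pairwise comparison hits and then checking each cluster for sequences of unusual length, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clustering rule, so single-linkage grouping over a score threshold is assumed here, and the function names, the `min_score` cutoff, and the 30% length `tolerance` are all hypothetical.

```python
from statistics import median


def cluster_hits(ids, hits, min_score):
    """Group sequences by single linkage (an assumption, not the paper's rule).

    ids: iterable of sequence identifiers.
    hits: iterable of (id_a, id_b, score) tuples from an all-by-all comparison.
    min_score: hypothetical cutoff; weaker hits are ignored.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


def flag_length_outliers(cluster, lengths, tolerance=0.3):
    """Return members whose length differs from the cluster median
    by more than `tolerance` (a hypothetical fraction of the median)."""
    med = median(lengths[s] for s in cluster)
    return [s for s in cluster if abs(lengths[s] - med) > tolerance * med]
```

Called on a handful of hits, `cluster_hits` returns the clusters as sorted lists of identifiers, and `flag_length_outliers` lists the members that would deserve a closer look, for instance by aligning the gene's DNA sequence against a close homolog as described above.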

Footnotes

  • 4 Corresponding author.

  • E-MAIL louis{at}genetique.uvsq.fr; FAX 33 01 39254569.

  • Article published on-line before print: Genome Res., 10.1101/gr.177601.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.177601.

    • Received January 5, 2001.
    • Accepted March 22, 2001.
