Detecting and Analyzing DNA Sequencing Errors: Toward a Higher Quality of the Bacillus subtilis Genome Sequence

  1. Claudine Médigue1,2,4,
  2. Matthias Rose3,
  3. Alain Viari2, and
  4. Antoine Danchin1
  1Institut Pasteur REG, F-75724 Paris Cedex 15, France; 2Atelier de BioInformatique, Université Paris VI, 75005 Paris, France; 3Goethe-Universität Frankfurt, Institut für Mikrobiologie, D-60439 Frankfurt/Main, Germany

Abstract

During the determination of a DNA sequence, artifactual frameshifts and/or in-frame stop codons introduced into putative genes can lead to misprediction of gene products. Methods based on protein similarity matching can detect such errors only when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of coding sequences. It combines the results of two analyses: the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence, and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze the nature of the errors in detail. Interestingly, in several cases the in-frame termination codons or frameshifts were not sequencing errors but were confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifting. The method can be used to check the quality of the sequences produced by any prokaryotic genome sequencing project.
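The idea of combining a coding-region prediction with a termination-site search can be illustrated with a minimal sketch. This is not the authors' implementation: the coding-potential measure below is a simple sum of per-codon log-odds taken from a hypothetical table (`toy_table`), and the window size, step, and threshold are arbitrary illustrative parameters. A candidate frameshift is reported when the best-scoring coding frame changes between two adjacent windows and the previously coding frame runs into a stop codon inside those windows.

```python
from typing import Dict, List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard bacterial stop codons


def first_codon(pos: int, frame: int) -> int:
    """Smallest index >= pos whose absolute position is congruent to `frame` (mod 3)."""
    return pos + ((frame - pos) % 3)


def frame_score(seq: str, start: int, end: int, frame: int,
                table: Dict[str, float]) -> float:
    """Sum hypothetical per-codon log-odds for codons of `frame` lying fully in [start, end).

    Codons absent from `table` contribute 0.
    """
    return sum(table.get(seq[i:i + 3], 0.0)
               for i in range(first_codon(start, frame), end - 2, 3))


def coding_frames(seq: str, table: Dict[str, float], window: int, step: int,
                  threshold: float) -> List[Optional[int]]:
    """Best-scoring frame for each sliding window, or None if no frame reaches `threshold`."""
    calls: List[Optional[int]] = []
    for start in range(0, len(seq) - window + 1, step):
        scores = [frame_score(seq, start, start + window, f, table) for f in range(3)]
        best = max(range(3), key=lambda f: scores[f])
        calls.append(best if scores[best] >= threshold else None)
    return calls


def next_stop(seq: str, pos: int, frame: int) -> Optional[int]:
    """Start index of the first stop codon in `frame` at or after `pos`, if any."""
    i = first_codon(pos, frame)
    while i + 3 <= len(seq):
        if seq[i:i + 3] in STOP_CODONS:
            return i
        i += 3
    return None


def frameshift_candidates(seq: str, table: Dict[str, float], window: int = 60,
                          step: int = 30, threshold: float = 5.0
                          ) -> List[Tuple[int, int, int, int, int]]:
    """Report (region_start, region_end, old_frame, new_frame, stop_pos) tuples.

    A hit requires (1) a switch of the predicted coding frame between adjacent
    windows and (2) a stop codon in the old frame within the two windows.
    """
    seq = seq.upper()
    calls = coding_frames(seq, table, window, step, threshold)
    hits = []
    for k in range(1, len(calls)):
        old, new = calls[k - 1], calls[k]
        if old is None or new is None or old == new:
            continue  # no confident coding call, or no frame switch
        region_start, region_end = (k - 1) * step, k * step + window
        stop = next_stop(seq, region_start, old)
        if stop is not None and stop + 3 <= region_end:
            hits.append((region_start, region_end, old, new, stop))
    return hits
```

For example, inserting a single extra base into a toy "gene" shifts the favored frame from 0 to 1 and exposes a TGA codon in the old frame:

```python
toy_table = {"GAT": 1.0}  # hypothetical log-odds table
shifted = "GAT" * 30 + "A" + "GAT" * 30
print(frameshift_candidates(shifted, toy_table))  # [(60, 150, 0, 1, 93)]
```

A real screen would replace the toy score with a proper coding-region model trained on the genome being checked, and would also examine the reverse strand.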

Footnotes

  • 4 Corresponding author.

  • E-MAIL claudine.medigue{at}snv.jussieu.fr; FAX 33 160 87 8301.

    • Received April 26, 1999.
    • Accepted September 1, 1999.