Drosophila Genomic Sequence Annotation Using the BLOCKS+ Database

Jorja G. Henikoff(1) and Steven Henikoff(1,2)

(2) Howard Hughes Medical Institute, (1) Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024 USA

Abstract

A simple and general homology-based method for gene finding was applied to the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the Genome Annotation Assessment Project (GASP). Each strand of the entire sequence was used as a query against the BLOCKS+ database of conserved regions of proteins. This led to functional assignments for more than one-third of the genes and two-thirds of the transposons. Considering the enormous size of the query, the fact that only two false-positive matches were reported emphasizes the high selectivity of protein family-based methods for gene finding. We used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our results confirm that protein family databases can be used effectively in automated sequence annotation efforts.

Footnotes

  • 1 Corresponding author.

  • E-MAIL steveh{at}fhcrc.org; FAX (206) 667-5889.

    • Received February 9, 2000.
    • Accepted February 28, 2000.