WebWise: Navigating the Human Genome Project

  1. Kim D. Pruitt
  1. National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA

This extract was created in the absence of an abstract.

The Human Genome Project has increased the rate of DNA sequence accumulation to the point where information management has become a formidable task. The central repositories for this avalanche of data, GenBank, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan), continue to accumulate DNA sequences at an unprecedented rate. For example, the total number of nucleotides stored in the GenBank database more than doubles every 18 months (Benson et al. 1997). The scientific community is clearly interested in supporting rapid access to high-quality DNA sequence. It is also interested in supporting the release of “unfinished” DNA sequence data generated by the sequencing centers, although this remains controversial (Adams and Venter 1996; Bentley 1996). (Unfinished DNA sequences generated from a cosmid, BAC, or P1 clone may include nucleotide errors and may consist of unordered or ordered contigs with one or more gaps.) Because the process of “finishing” a sequence (which includes resolving any ambiguous bases, contig assembly, gap closure, and annotation) proceeds at a much slower pace than the initial production of sequence, a considerable amount of unfinished sequence can accumulate at the sequencing centers.

Growing interest in timely dissemination of all the data, plus the perception that uneven access to the unfinished DNA sequences could confer an unfair advantage (or disadvantage) to research groups, resulted in increasing pressure on the sequencing centers and international databases to make …
