The Drosophila Gene Collection: Identification of Putative Full-Length cDNAs for 70% of D. melanogaster Genes

Table 1. cDNA Library and EST Characterization

                                          Riken embryo     Riken head       Adult testis
Libraries
  cDNA cloning vector                     pFLC1            pFLC1            pOTB7
  PolyA presence                          100% (n = 485)   92% (n = 352)    99% (n = 445)
  Inverted inserts                        0% (n = 454)     0% (n = 355)     0.7% (n = 706)
  Average insert length                   2.1 kb (n = 96)  1.6 kb (n = 96)  2.0 kb (n = 96)
  Chimeric inserts                        <1% (n = 488)    1.6% (n = 668)   2.8% (n = 313)
  Initial gene discovery rate             10%              9%               23%
ESTs
  Attempts                                71807            67870            29664
  Failed quality                          10181            11731            6146
  Contaminant                             1372             1224             303
  Total high quality                      60254            54915            23215
  Average high-quality read length (bp)   484              472              528
  • Determined by the presence of a polyA tract in the 5′-end sequence.

  • Determined by PCR amplification using primers in the cloning vector.

  • Clones whose 5′ and 3′ reads, aligned with Sim4, mapped to different chromosomal arms or >300 kb apart.

  • Originally determined by pairwise BLAST against all previous ESTs.

  • Reads of <150 bp after vector and quality trimming.

  • Reads discarded because of significant hits to the GenBank GB.vector dataset.

  • See Methods for details.

  • EST, expressed sequence tag.
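The EST rows of Table 1 are internally consistent: for each library, Total high quality equals Attempts minus Failed quality minus Contaminant (for the Riken embryo library, 71807 − 10181 − 1372 = 60254). The Python sketch below checks that identity and also encodes two read/clone criteria stated in the footnotes (the 150 bp trimmed-length cutoff and the chimeric-clone rule). The `Alignment` record, the `passes_length_filter` and `is_chimeric` helpers, and the choice to measure "apart" as the outer span covered by both reads are illustrative assumptions, not the authors' pipeline, which used Sim4 alignments (see Methods).

```python
"""Sketch: check the EST accounting in Table 1 and encode the
footnote criteria (illustrative only, not the authors' code)."""

from dataclasses import dataclass

# Counts copied from Table 1:
# (attempts, failed quality, contaminant, total high quality)
EST_COUNTS = {
    "Riken embryo": (71807, 10181, 1372, 60254),
    "Riken head": (67870, 11731, 1224, 54915),
    "Adult testis": (29664, 6146, 303, 23215),
}

MIN_TRIMMED_LENGTH_BP = 150  # footnote: reads of <150 bp after trimming are discarded
MAX_SPAN_BP = 300_000        # footnote: reads >300 kb apart mark a chimeric clone


def high_quality_reads(attempts: int, failed: int, contaminant: int) -> int:
    """Reads remaining after removing quality failures and vector contaminants."""
    return attempts - failed - contaminant


def passes_length_filter(trimmed_length_bp: int) -> bool:
    """Hypothetical helper: keep a read only if it is >=150 bp after trimming."""
    return trimmed_length_bp >= MIN_TRIMMED_LENGTH_BP


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one end read's genomic alignment."""
    arm: str    # chromosome arm, e.g. "2L" or "3R"
    start: int  # leftmost aligned coordinate (bp)
    end: int    # rightmost aligned coordinate (bp)


def is_chimeric(five_prime: Alignment, three_prime: Alignment) -> bool:
    """Flag a clone whose 5' and 3' reads hit different arms or lie >300 kb apart.

    Assumption: "apart" is measured as the outer span covered by both reads.
    """
    if five_prime.arm != three_prime.arm:
        return True
    span = max(five_prime.end, three_prime.end) - min(five_prime.start, three_prime.start)
    return span > MAX_SPAN_BP


if __name__ == "__main__":
    for library, (attempts, failed, contaminant, total) in EST_COUNTS.items():
        # The table's totals should follow from the other three EST rows.
        assert high_quality_reads(attempts, failed, contaminant) == total
        print(f"{library}: {total / attempts:.1%} of attempts were high quality")
```

Run as a script, it confirms the accounting for all three libraries and prints the high-quality yield per library, roughly 84%, 81%, and 78% of attempts for the embryo, head, and testis libraries respectively.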

Genome Res. 12: 1294-1300