From First Base: The Sequence of the Tip of the X Chromosome of Drosophila melanogaster, a Comparison of Two Sequencing Strategies

  1. Panayiotis V. Benos1,15,
  2. Melanie K. Gatt2,11,
  3. Lee Murphy3,
  4. David Harris3,
  5. Bart Barrell3,
  6. Concepcion Ferraz4,
  7. Sophie Vidal4,
  8. Christine Brun4,
  9. Jacques Demaille4,
  10. Edouard Cadieu5,
  11. Stephane Dreano5,
  12. Stéphanie Gloux5,
  13. Valerie Lelaure5,
  14. Stephanie Mottier5,
  15. Francis Galibert5,
  16. Dana Borkova6,
  17. Belen Miñana6,
  18. Fotis C. Kafatos6,
  19. Slava Bolshakov6,7,
  20. Inga Sidén-Kiamos7,
  21. George Papagiannakis7,
  22. Lefteris Spanos7,
  23. Christos Louis7,8,
  24. Encarnación Madueño9,
  25. Beatriz de Pablos9,
  26. Juan Modolell9,
  27. Annette Peter10,
  28. Petra Schöttler10,
  29. Meike Werner10,
  30. Fotini Mourkioti10,
  31. Nicole Beinert10,
  32. Gordon Dowe10,
  33. Ulrich Schäfer10,
  34. Herbert Jäckle10,
  35. Alain Bucheton4,
  36. Debbie Callister11,
  37. Lorna Campbell11,
  38. Nadine S. Henderson11,
  39. Paul J. McMillan11,
  40. Cathy Salles11,
  41. Evelyn Tait11,
  42. Phillipe Valenti11,
  43. Robert D.C. Saunders11,12,
  44. Alain Billaud13,
  45. Lior Pachter14,
  46. David M. Glover2,11, and
  47. Michael Ashburner1,2,16
  1. 1EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; 2Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK; 3Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; 4Montpellier University Medical School, IGH-Institut de Génétique Humaine-CNRS, 34396 Montpellier Cedex 5, France; 5UPR 41, CNRS, Recombinaisons Génétiques, Faculte de Medecine, 35043 Rennes Cedex, France; 6European Molecular Biology Laboratory (EMBL), D-69117 Heidelberg, Germany; 7Institute of Molecular Biology and Biotechnology, FORTH, GR-71110 Heraklion, Greece; 8Department of Biology, University of Crete, 71409 Heraklion, Crete, Greece; 9Centro de Biología Molecular Severo Ochoa, CSIC and Universidad Autónoma de Madrid, 28049 Madrid, Spain; 10Max-Planck-Institut für biophysikalische Chemie, Department of Molecular Developmental Biology, D-37070 Göttingen, Germany; 11Department of Anatomy and Physiology, CRC Cell Cycle Genetics Group, University of Dundee, Dundee, DD1 4HN, UK; 12Department of Biological Sciences, The Open University, Milton Keynes, MK7 6AA, UK; 13Fondation Jean Dausset-CEPH (Centre d'Etude du Polymorphisme Humain), 75010 Paris, France; 14Department of Mathematics, University of California at Berkeley, California 94720-3840, USA

Abstract

We present the sequence of a contiguous 2.63 Mb of DNA extending from the tip of the X chromosome of Drosophila melanogaster. Within this sequence, we predict 277 protein coding genes, of which 94 had been sequenced already in the course of studying the biology of their gene products, and examples of 12 different transposable elements. We show that an interval between bands 3A2 and 3C2, believed in the 1970s to show a correlation between the number of bands on the polytene chromosomes and the 20 genes identified by conventional genetics, is predicted to contain 45 genes from its DNA sequence. We have determined the insertion sites of P-elements from 111 mutant lines, about half of which are in a position likely to affect the expression of novel predicted genes, thus representing a resource for subsequent functional genomic analysis. We compare the European Drosophila Genome Project sequence with the corresponding part of the independently assembled and annotated Joint Sequence determined through “shotgun” sequencing. Discounting differences in the distribution of known transposable elements between the strains sequenced in the two projects, we detected three major sequence differences, two of which are probably explained by errors in assembly; the origin of the third major difference is unclear. In addition, there are eight sequence gaps within the Joint Sequence. At least six of these eight gaps are likely to be sites of transposable elements; the other two are complex. Of the 275 genes in common to both projects, 60% are identical within 1% of their predicted amino-acid sequence and 31% show minor differences such as in choice of translation initiation or termination codons; the remaining 9% show major differences in interpretation.

[All of the sequences analyzed in this paper have been deposited in the EMBL-Bank database under the following accession nos.: AL009146, AL009147, AL009171, AL009188–AL009196, AL021067, AL021086, AL021106–AL021108, AL021726, AL021728, AL022017, AL022018, AL022139, AL023873, AL023874, AL023893, AL024453, AL024455–AL024457, AL024485, AL030993, AL030994, AL031024–AL031028, AL031128, AL031173, AL031366, AL031367, AL031581–AL031583, AL031640, AL031765, AL031883, AL031884, AL034388, AL034544, AL035104, AL035105, AL035207, AL035245, AL035331, AL035632, AL049535, AL050231, AL050232, AL109630, AL121804, AL121806, AL132651, AL132792, AL132797, AL133503–AL133506, AL138678, AL138971, AL138972, and Z98269. A single file (FASTA format) of the 2.6-Mb contig is available from ftp://ftp.ebi.ac.uk/pub/databases/edgp/contigs/contig_1.fa.]

Footnotes

  • 15 Present address: Department of Genetics, School of Medicine, Washington University, 4566 Scott Avenue, St. Louis, MO 63110, USA.

  • 16 Corresponding author.

  • E-MAIL m.ashburner{at}gen.cam.ac.uk; FAX 44-1223-333992.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.173801.

    • Received December 10, 2000.
    • Accepted February 16, 2001.