Multiplex Sequencing of 1.5 Mb of the Mycobacterium leprae Genome
- Douglas R. Smith1,2,
- Peter Richterich2,
- Marc Rubenfield2,
- Philip W. Rice2,
- Carol Butler2,
- Hong-Mei Lee2,
- Susan Kirst2,
- Kristin Gundersen2,
- Kari Abendschan2,
- Qinxue Xu2,
- Maria Chung2,
- Craig Deloughery2,
- Tyler Aldredge2,
- James Maher2,
- Ronald Lundstrom2,
- Craig Tulig2,
- Kathleen Falls2,
- Joan Imrich2,
- Dana Torrey2,
- Marcy Engelstein2,
- Gary Breton2,
- Deepika Madan2,
- Raymond Nietupski2,
- Bruce Seitz2,
- Steven Connelly2,
- Steven McDougall2,
- Hershel Safer2,
- Rene Gibson2,
- Lynn Doucette-Stamm2,
- Karin Eiglmeier5,
- Staffan Bergh5,
- Stewart T. Cole5,
- Keith Robison4,
- Laura Richterich4,
- Jason Johnson4,
- George M. Church1,3,4, and
- Jen-i Mao2
- 2Genome Therapeutics Corporation, Collaborative Research Division, Waltham, Massachusetts 02154; 3Howard Hughes Medical Institute and 4Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115; 5Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 75724 Paris CEDEX 15, France
Abstract
The nucleotide sequence of 1.5 Mb of genomic DNA from Mycobacterium leprae was determined using computer-assisted multiplex sequencing technology. This brings the 2.8-Mb M. leprae genome sequence to ∼66% completion. The sequences, derived from 43 recombinant cosmids, contain 1046 putative protein-coding genes, 44 repetitive regions, 3 rRNAs, and 15 tRNAs. The gene density of one per 1.4 kb is slightly lower than that of Mycoplasma (1.2 kb). Of the protein-coding genes, 44% have significant matches to genes with well-defined functions. Comparison of 1157 M. leprae and 1564 Mycobacterium tuberculosis proteins shows a complex mosaic of homologous genomic blocks with up to 22 adjacent proteins in conserved map order. Matches to known enzymatic, antigenic, membrane, cell wall, cell division, multidrug resistance, and virulence proteins suggest therapeutic and vaccine targets. Unusual features of the M. leprae genome include large polyketide synthase (pks) operons, inteins, and highly fragmented pseudogenes.
[The sequence data described in this paper have been submitted to GenBank under accession nos. L78811–L78829, U00010–U00023, U15180–U15184, U15186, U15187, L01095, L01536, L04666, and L01263. On-line supplementary information for Table 1 is available at http://www.cshl.org/gr.]
Footnotes
- 1 Corresponding authors.
- E-MAIL church{at}salt2.med.harvard.edu; smith{at}cric.com; FAX (617) 432-7663.
- Received February 13, 1997.
- Accepted June 10, 1997.
- Cold Spring Harbor Laboratory Press











