A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry

  1. Akhilesh Pandey 1,15,17
  1. McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry, Johns Hopkins University, Baltimore, Maryland 21205, USA;
  2. Institute of Bioinformatics, International Tech Park, Bangalore 560066, India;
  3. School of Biotechnology, Amrita Vishwa Vidyapeetham University, Amritapuri 690525, India;
  4. Centre of Excellence in Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry 605014, India;
  5. Manipal University, Manipal 576104, India;
  6. Department of Neurochemistry, National Institute of Mental Health and Neurosciences, Bangalore 560006, India;
  7. Rajiv Gandhi University of Health Sciences (RGUHS), Bangalore 560041, Karnataka, India;
  8. National Institute of Malaria Research, Field Station, Goa 403001, India;
  9. World Health Organization, South-East Asia office, Mahatma Gandhi Marg, New Delhi 110002, India;
  10. Cell and Molecular Biology Department, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom;
  11. Thermo Fisher Scientific (Bremen) GmbH, 28199 Bremen, Germany;
  12. Department of Molecular Microbiology, Johns Hopkins Malaria Research Institute, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA;
  13. Department of Natural Sciences, University of Maryland Eastern Shore, Princess Anne, Maryland 21853, USA;
  14. Department of Tropical Medicine, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana 70112, USA;
  15. Departments of Pathology and Oncology, Johns Hopkins University, Baltimore, Maryland 21205, USA
    • 16 These authors contributed equally to this work.

    Abstract

    Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search–specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.

    Footnotes

    • 17 Corresponding authors.

      E-mail nkumar{at}tulane.edu.

      E-mail pandey{at}jhmi.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.127951.111.

    • Received April 8, 2011.
    • Accepted July 1, 2011.