A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry

  1. Akhilesh Pandey1,11
  1. 1 Johns Hopkins University;
  2. 2 Institute of Bioinformatics;
  3. 3 National Institute of Mental Health and Neurosciences;
  4. 4 National Institute of Malaria Research;
  5. 5 Imperial College London;
  6. 6 Thermo Fisher Scientific;
  7. 7 Johns Hopkins Malaria Research Institute;
  8. 8 Amrita Vishwa Vidyapeetham University;
  9. 9 University of Maryland Eastern Shore;
  10. 10 Tulane University School of Public Health and Tropical Medicine
  1. * Corresponding author; email: pandey{at}jhmi.edu

Abstract

Anopheles gambiae is a major mosquito vector responsible for malaria transmission, and its genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the ~13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have not yet been validated by any other method. To identify protein-coding genes of Anopheles gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6,000 gene annotations, including 80 novel gene structures and ~500 translational start sites. Additional validation by RT-PCR and cDNA sequencing was successful for 105 selected genes. Our proteogenomic analysis led to the identification of 2,682 genome search-specific peptides. We documented numerous cases of protein-coding sequence in regions annotated as intergenic, intronic, or untranslated. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is the first report to annotate the Anopheles gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
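
To illustrate the idea of a genome search-specific peptide, the following is a minimal sketch and not the authors' pipeline. It assumes that such a peptide is one that matches a six-frame translation of the genome but no annotated protein sequence. The function names, the toy sequences, and the use of exact substring matching (in place of real spectrum-to-peptide database searching) are all assumptions made for illustration.

```python
# Illustrative sketch only: exact string matching stands in for a real
# mass spectrometry database search. All names here are hypothetical.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate three reading frames on each strand (six in total)."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def is_genome_search_specific(peptide, genome, annotated_proteins):
    """True if the peptide occurs in a genome translation but in no annotated protein."""
    in_genome = any(peptide in frame for frame in six_frame_translation(genome))
    in_annotation = any(peptide in protein for protein in annotated_proteins)
    return in_genome and not in_annotation


# Toy genome: the coding sequence for MKTAYR sits in the third forward frame.
toy_genome = "CC" + "ATGAAAACTGCTTATCGT" + "TAA"
print(is_genome_search_specific("KTAYR", toy_genome, ["MSSLLK"]))
```

In this toy case the peptide is found only through the genome translation, which is the kind of evidence that could point to an unannotated coding region. A real analysis would also need to handle stop codons within frames, isobaric residues such as leucine and isoleucine, and false discovery rate control.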

  • Received June 18, 2011.
  • Accepted July 1, 2011.
ACCEPTED MANUSCRIPT
