An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics


Figure 5.

Protein evidence for single amino acid variations (SAAVs). (A) Genomic region encoding an ABC transporter (BH_RS05910). RefSeq and Ensembl annotate it as a pseudogene and Genoscope as a fragmented pseudogene, while Prodigal and ChemGenome predict two CDSs. The reference genome (below the gray virtual genome bar; NCBI RefSeq track) differs from the MQB277 assembly (MQB277 track above the virtual genome) by an 81-bp insertion and a 1-bp deletion (red boxes). The 1-bp deletion causes a frameshift, as shown by the lack of protein expression downstream from it (spectral count below the virtual genome; scaled from 0 to 800) and by transcriptomic data (reads mapped to the reference genome all support the insertion; lower panel). In contrast, the protein encoded by MQB277_12040 in the assembly is expressed over almost its entire length (class 1a peptides; one peptide identified by seven PSMs spans the frameshift region), which is also supported by transcriptomic reads that map without any mismatch (Supplemental Fig. S7). (B) Evidence for an SNV causing a nonsynonymous SAAV in the CDS of transcription elongation factor GreA. Four peptides (identified by two, four, eight, and 39 PSMs) confirming this SAAV (glycine in the reference genome, glutamic acid in our assembly) map to this position in MQB277.
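The two mechanisms in panels A and B can be illustrated with a minimal sketch, assuming a made-up 18-bp toy CDS (`INTACT`). This is not the BH_RS05910 or GreA sequence, and the helper names `translate`, `delete_base`, and `introduce_snv` are hypothetical. The codon table is the standard genetic code, whose codon-to-amino-acid assignments are shared by the bacterial code (NCBI table 11, which differs only in alternative start codons). A 1-bp deletion shifts the reading frame so that a premature stop codon appears, while a single G→A substitution turns a glycine codon (GGA) into a glutamic acid codon (GAA).

```python
# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO))

# Hypothetical toy CDS: ATG AAA CTG ACT GGA TAA -> M K L T G (stop)
INTACT = "ATGAAACTGACTGGATAA"


def translate(seq: str, to_stop: bool = True) -> str:
    """Translate frame 0 codon by codon; optionally stop at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(seq: str, pos: int) -> str:
    """Return seq with the single base at 0-based index pos removed (1-bp deletion)."""
    return seq[:pos] + seq[pos + 1:]


def introduce_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based index pos replaced by alt (an SNV)."""
    return seq[:pos] + alt + seq[pos + 1:]


if __name__ == "__main__":
    print(translate(INTACT))                          # full-length product
    print(translate(delete_base(INTACT, 6)))          # frameshift -> early stop
    print(translate(introduce_snv(INTACT, 13, "A")))  # GGA (Gly) -> GAA (Glu)
```

Running it prints `MKLTG`, `MK`, and `MKLTE`: the deletion truncates the product after two residues (no peptides expected downstream), and the SNV changes only the final residue from G to E, the kind of change a single SAAV-spanning peptide can confirm.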

Genome Res. 27: 2083-2095

Preprint Server