PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations

Bo Wen; Xiaojing Wang; Bing Zhang

doi:10.1101/gr.235028.118

¹Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030, USA;
²Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA

Corresponding author: bing.zhang{at}bcm.edu

Abstract

Massively parallel or second-generation sequencing-based genomic studies continuously identify new genomic alterations that may lead to novel protein sequences, which are attractive candidates for disease biomarkers and therapeutic targets after proteomic validation. Integrative proteogenomic methods have been developed to use mass spectrometry (MS)-based proteomics data for such validation. These methods replace the reference sequence database in proteomic database searching with a customized protein database that incorporates sample- or disease-specific sequences derived from DNA or RNA sequencing, thus enabling the identification of novel protein sequences. Although useful, this spectrum-centric approach requires a full evaluation of all possible spectrum-peptide pairs, which is time-consuming, error-prone, and difficult to apply. Here, we present PepQuery, a peptide-centric approach that focuses on only novel DNA or protein sequences of interest. PepQuery allows quick and easy proteomic validation of genomic alterations without customized database construction. We demonstrated the sensitivity and specificity of the approach in validating completely novel proteins, novel splice junctions, and single amino acid variants using simulations and experimental data. Notably, enabling unrestricted modification searching in PepQuery reduced false positives by up to 95%. We implemented PepQuery as both web-based and stand-alone applications. The web version provides direct access to more than half a billion MS/MS spectra from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteomic studies. The stand-alone version supports batch analysis and user-provided MS/MS data. PepQuery will increase the usage of proteogenomics beyond the proteomics community and will broaden the application of proteogenomics in personalized medicine.
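The peptide-centric idea described above can be illustrated with a minimal Python sketch. Instead of searching every spectrum against a whole customized database, it starts from one novel peptide of interest, keeps only spectra whose precursor m/z matches that peptide, and scores those spectra by fragment-ion matches. This is an illustration under stated assumptions, not PepQuery's implementation. The shared-peak-count score, the tolerances (10 ppm precursor, 0.02 Da fragment), the `min_matches` threshold, the function names, the spectrum dictionary layout, and the peptide `PEPTIDEK` are all hypothetical. The sketch also omits any statistical assessment and the unrestricted modification searching that the abstract credits with reducing false positives.

```python
"""Hypothetical sketch of a peptide-centric spectrum query (not PepQuery's code)."""
from __future__ import annotations

# Monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def fragment_ions(seq: str) -> list[float]:
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        ions.extend((b, y))
    return ions


def shared_peaks(theoretical: list[float], observed: list[float], tol: float = 0.02) -> int:
    """Count theoretical ions lying within `tol` Da of at least one observed peak."""
    return sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))


def query_peptide(seq: str, spectra: list[dict], ppm: float = 10.0,
                  min_matches: int = 3) -> list[tuple[str, int]]:
    """Return (spectrum id, score) pairs for spectra that plausibly match `seq`."""
    theoretical = fragment_ions(seq)
    hits = []
    for spec in spectra:
        target = precursor_mz(seq, spec["charge"])
        # Precursor filter: only spectra within `ppm` of the peptide's m/z are scored.
        if abs(spec["precursor_mz"] - target) / target * 1e6 > ppm:
            continue
        score = shared_peaks(theoretical, spec["peaks"])
        if score >= min_matches:
            hits.append((spec["id"], score))
    # Best-scoring spectra first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    novel = "PEPTIDEK"  # hypothetical novel peptide
    spectra = [
        # Synthetic spectrum carrying six of the peptide's own fragment ions.
        {"id": "scan1", "charge": 2, "precursor_mz": precursor_mz(novel, 2),
         "peaks": fragment_ions(novel)[:6]},
        # Wrong precursor m/z: removed by the precursor filter.
        {"id": "scan2", "charge": 2, "precursor_mz": 500.0, "peaks": [100.0, 200.0]},
        # Right precursor m/z but unrelated peaks: scores below min_matches.
        {"id": "scan3", "charge": 2, "precursor_mz": precursor_mz(novel, 2),
         "peaks": [100.0, 200.0, 300.0]},
    ]
    print(query_peptide(novel, spectra))  # prints [('scan1', 6)]
```

Because only spectra near the novel peptide's precursor m/z are scored, the work scales with the number of sequences of interest rather than with the size of a full customized database. A realistic search would also need modified residues, multiple fragment charge states, and a score that accounts for random matches; the shared-peak count here only keeps the example short.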

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.235028.118.

Received January 24, 2018.
Accepted December 28, 2018.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
