Nematode gene annotation by machine-learning-assisted proteotranscriptomics enables proteome-wide evolutionary analysis

  Falk Butter 1,2

  • 1 Institute of Molecular Biology (IMB), 55128 Mainz, Germany
  • 2 These authors contributed equally to this work.

  • 3 Present address: Wellcome Trust/Cancer Research UK Gurdon Institute and Department of Genetics, University of Cambridge, Cambridge CB2 1QN, UK

  • Corresponding authors: m.levin{at}imb.de, f.butter{at}imb.de
  Abstract

    Nematodes encompass more than 24,000 described species, which have been found in almost every ecological habitat and make up >80% of metazoan taxonomic diversity in soils. The last common ancestor of nematodes is believed to date back ∼650–750 million years, giving rise to a large and phylogenetically diverse group to be explored. However, for most species, high-quality gene annotations are incomplete or missing. Combining short-read RNA sequencing with mass spectrometry–based proteomics and machine-learning quality control in an approach called proteotranscriptomics, we improve gene annotations for nine genome-sequenced nematode species and provide new gene annotations for three additional species without genome assemblies. Demonstrating the sensitivity of our methodology, we provide evidence for two hitherto undescribed genes in the model organism Caenorhabditis elegans. Extensive phylogenetic systems analysis using this comprehensive proteome annotation provides new insights into the evolutionary processes of this metazoan group.

    Footnotes

    • Received June 28, 2022.
    • Accepted November 18, 2022.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
