Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq

  Ana Conesa (1)

  (1) Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain;
  (2) Department of Computer Science, Universitat de València, Valencia 46100, Spain;
  (3) Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, 10315 Berlin, Germany;
  (4) Berlin Center for Genomics in Biodiversity Research, 14195 Berlin, Germany;
  (5) Department of Physiological Sciences, Center for Environmental and Human Toxicology, University of Florida, Gainesville, Florida 32611, USA
  (6) These authors contributed equally to this work.

  • Corresponding author: ana.conesa{at}csic.es
  • Abstract

    Although long-read sequencing has made draft genome assembly more accessible, the annotation of these new genomes has not advanced at the same pace. Long-read RNA sequencing offers a promising solution for enhancing gene annotation. In this study, we explore how the sequencing platform, Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences (PacBio) Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to the existing short-read-based annotation. At the locus level, the two annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
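    The gap between locus-level and transcript-level agreement can be illustrated with a minimal sketch. It is not the comparison procedure used in the study. It assumes that two loci agree when their genomic spans overlap on the same strand, and that two multi-exon transcripts agree when their intron chains are identical. The function names, the data layout, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]                    # (start, end), 1-based inclusive
Transcript = Tuple[str, str, List[Exon]]  # (chrom, strand, exons sorted by start)
Annotation = Dict[str, List[Transcript]]  # locus id -> transcripts at that locus


def intron_chain(tx: Transcript):
    """Return (chrom, strand, introns); introns are the gaps between exons."""
    chrom, strand, exons = tx
    introns = tuple(
        (exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)
    )
    return chrom, strand, introns


def locus_span(transcripts: List[Transcript]):
    """Return (chrom, strand, start, end) covering all transcripts of a locus."""
    chrom, strand, _ = transcripts[0]
    start = min(exons[0][0] for _, _, exons in transcripts)
    end = max(exons[-1][1] for _, _, exons in transcripts)
    return chrom, strand, start, end


def spans_overlap(a, b) -> bool:
    """True if two spans share chromosome and strand and overlap by >= 1 bp."""
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def locus_agreement(query: Annotation, reference: Annotation) -> float:
    """Fraction of query loci overlapping at least one reference locus."""
    ref_spans = [locus_span(txs) for txs in reference.values()]
    hits = sum(
        any(spans_overlap(locus_span(txs), ref) for ref in ref_spans)
        for txs in query.values()
    )
    return hits / len(query)


def transcript_agreement(query: Annotation, reference: Annotation) -> float:
    """Fraction of multi-exon query transcripts with an identical intron chain."""
    ref_chains = {
        intron_chain(tx)
        for txs in reference.values()
        for tx in txs
        if len(tx[2]) > 1
    }
    multi_exon = [tx for txs in query.values() for tx in txs if len(tx[2]) > 1]
    hits = sum(intron_chain(tx) in ref_chains for tx in multi_exon)
    return hits / len(multi_exon)


# Toy annotations (hypothetical coordinates, not data from the study).
long_read = {
    "geneA": [
        ("chr1", "+", [(100, 200), (300, 400)]),
        ("chr1", "+", [(100, 200), (350, 400)]),  # alternative 3' splice site
    ],
    "geneB": [("chr1", "-", [(1000, 1100), (1200, 1300)])],
}
short_read = {
    "locus1": [("chr1", "+", [(100, 200), (300, 400)])],
    "locus2": [("chr1", "-", [(1000, 1100), (1250, 1300)])],
}

locus_level = locus_agreement(long_read, short_read)
transcript_level = transcript_agreement(long_read, short_read)
```

    In this toy case both long-read loci overlap a short-read locus, but only one of the three long-read transcripts has an exactly matching intron chain. A single shifted splice site is enough to break transcript-level agreement while leaving locus-level agreement intact, which is one way two annotations can agree on where genes are yet differ on their isoforms.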

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279864.124.

    • Freely available online through the Genome Research Open Access option.

    • Received July 31, 2024.
    • Accepted December 12, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
