Method

Evaluation of strategies for evidence-driven genome annotation using long-read RNA-seq

    • 1 Institute for Integrative Systems Biology, Spanish National Research Council, Paterna 46980, Spain;
    • 2 Department of Computer Science, Universitat de València, Valencia 46100, Spain;
    • 3 Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, 10315 Berlin, Germany;
    • 4 Berlin Center for Genomics in Biodiversity Research, 14195 Berlin, Germany;
    • 5 Department of Physiological Sciences, Center for Environmental and Human Toxicology, University of Florida, Gainesville, Florida 32611, USA
    • 6 These authors contributed equally to this work.
Published December 23, 2024. https://doi.org/10.1101/gr.279864.124

Abstract

Although long-read sequencing has made producing a draft genome more accessible, the annotation of these new genomes has not advanced at the same pace. Long-read RNA sequencing offers a promising solution for enhancing gene annotation. In this study, we explore how the sequencing platform (Oxford Nanopore R9.4.1 chemistry or Pacific Biosciences [PacBio] Sequel II CCS) and the data processing method influence evidence-driven genome annotation using long reads. An annotation pipeline incorporating PacBio transcripts significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to the existing short-read-based annotation. At the locus level, the two annotations were highly concordant, with 90% agreement. At the transcript level, however, the agreement was only 35%. We identified 4906 novel loci, represented by 5707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
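
The gap between locus-level and transcript-level agreement can be made concrete with a small sketch. The abstract does not state how agreement was computed, so the definitions below are assumptions for illustration only: two transcripts share a locus if they overlap on the same chromosome and strand, and they share a structure if they additionally have identical intron chains. The `Transcript` type, the helper names, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical minimal transcript model (1-based, inclusive coordinates)."""
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # sorted by start position


def intron_chain(t):
    """Introns implied by each pair of consecutive exons."""
    return tuple((left_end + 1, right_start - 1)
                 for (_, left_end), (right_start, _) in zip(t.exons, t.exons[1:]))


def same_locus(a, b):
    """Assumed locus-level match: same chromosome and strand, overlapping spans."""
    if (a.chrom, a.strand) != (b.chrom, b.strand):
        return False
    return a.exons[0][0] <= b.exons[-1][1] and b.exons[0][0] <= a.exons[-1][1]


def same_structure(a, b):
    """Assumed transcript-level match: same locus and identical intron chain."""
    return same_locus(a, b) and intron_chain(a) == intron_chain(b)


def agreement(query, reference, match):
    """Fraction of query transcripts with at least one match in reference."""
    if not query:
        return 0.0
    hits = sum(any(match(q, r) for r in reference) for q in query)
    return hits / len(query)


# Toy data, not taken from the study.
long_read = [
    Transcript("chr1", "+", ((100, 200), (300, 400))),
    Transcript("chr1", "+", ((100, 200), (350, 400))),  # alternative acceptor site
    Transcript("chr2", "-", ((50, 150),)),              # locus absent from reference
]
short_read = [
    Transcript("chr1", "+", ((100, 200), (300, 400))),
]

locus_agreement = agreement(long_read, short_read, same_locus)
transcript_agreement = agreement(long_read, short_read, same_structure)
print(f"locus-level: {locus_agreement:.2f}, transcript-level: {transcript_agreement:.2f}")
```

In this toy case two of the three long-read transcripts fall in a locus that the reference also annotates, but only one reproduces a reference intron chain. Two annotations can therefore agree on where genes are while disagreeing on most isoform structures. Dedicated comparison tools apply more refined criteria than this sketch.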
