Combining DNA and protein alignments to improve genome annotation with LiftOn
- Kuan-Hao Chao (1,2),
- Jakob M. Heinz (3),
- Celine Hoh (1,2),
- Alan Mao (1,2,4),
- Alaina Shumate (2,4),
- Mihaela Pertea (1,2,4) and
- Steven L. Salzberg (1,2,4,5)
- 1 Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
- 2 Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA;
- 3 Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA;
- 4 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA;
- 5 Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21211, USA
Abstract
As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies when they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honeybee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and Drosophila erecta.
Footnotes
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279620.124.
- Received May 24, 2024.
- Accepted December 19, 2024.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.