Genome-scale coestimation of species and gene trees

Vincent Daubin (2)
(1) UC Berkeley
(2) Université de Lyon; Université Lyon 1; CNRS; INRIA; UMR 5558, LBBE
* Corresponding author; email: boussau{at}gmail.com

Abstract

Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historical record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes, based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.
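To make concrete what "comparing a gene tree with a species tree" yields, the sketch below counts duplications and losses by mapping each gene-tree node to the lowest common ancestor (LCA) of its species in the species tree. Note that this is the classical parsimony reconciliation, not the probabilistic joint model described in the abstract. The nested-tuple tree encoding, the `species_of` gene-to-species table, and the function names are hypothetical choices for illustration; both trees are assumed rooted and binary.

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# A tree is either a leaf label or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def species_clades(tree: Tree, depth: int = 0,
                   out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    """Map every species-tree clade (as its leaf set) to its depth below the root."""
    if out is None:
        out = {}
    out[leaves(tree)] = depth
    if not isinstance(tree, str):
        for child in tree:
            species_clades(child, depth + 1, out)
    return out


def reconcile(gene_tree: Tree, species_tree: Tree,
              species_of: Dict[str, str]) -> Tuple[int, int]:
    """Return (duplications, losses) under LCA mapping (parsimony reconciliation)."""
    clades = species_clades(species_tree)

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Clades containing `species` form a chain; the deepest one is the LCA.
        return max((c for c in clades if species <= c), key=lambda c: clades[c])

    dups = 0
    losses = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal dups, losses
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        left, right = walk(node[0]), walk(node[1])
        here = lca(left | right)
        # A node is a duplication if it maps to the same species node as a child.
        is_dup = here in (left, right)
        dups += int(is_dup)
        for child in (left, right):
            # Species-tree branches skipped between this node and a child imply losses.
            losses += clades[child] - clades[here] - (0 if is_dup else 1)
        return here

    walk(gene_tree)
    return dups, losses


if __name__ == "__main__":
    species = (("human", "chimp"), "mouse")
    species_of = {"h1": "human", "h2": "human", "c1": "chimp", "m1": "mouse"}
    # One duplication at the root, then one copy lost in mouse and one in chimp.
    print(reconcile((("h1", "c1"), ("h2", "m1")), species, species_of))  # (1, 2)
```

A gene tree whose topology matches the species tree, such as `(("h1", "c1"), "m1")`, reconciles with no duplications or losses, `(0, 0)`. The parsimony count shown here takes the gene tree as fixed. A probabilistic model of the kind the abstract describes can instead weigh sequence evidence against the plausibility of duplication and loss events when choosing among gene-tree topologies.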

  • Received April 19, 2012.
  • Accepted October 22, 2012.

This manuscript is Open Access.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.

ACCEPTED MANUSCRIPT