Improving quartet graph construction for scalable and accurate species tree estimation from gene trees

  1. Erin K. Molloy1,2
  1. 1Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA;
  2. 2University of Maryland Institute for Advanced Computer Studies, College Park, Maryland 20742, USA
  • Corresponding author: ekmolloy{at}umd.edu
  • Abstract

    Summary methods are widely used to estimate species trees from genome-scale data. However, they can fail to produce accurate species trees when the input gene trees are highly discordant because of estimation error and biological processes, such as incomplete lineage sorting. Here, we introduce TREE-QMC, a new summary method that offers accuracy and scalability under these challenging scenarios. TREE-QMC builds upon weighted Quartet Max Cut, which takes weighted quartets as input and then constructs a species tree in a divide-and-conquer fashion, at each step forming a graph and seeking its max cut. The wQMC method has been successfully leveraged in the context of species tree estimation by weighting quartets by their frequencies in the gene trees; we improve upon this approach in two ways. First, we address accuracy by normalizing the quartet weights to account for “artificial taxa” introduced during the divide phase so subproblem solutions can be combined during the conquer phase. Second, we address scalability by introducing an algorithm to construct the graph directly from the gene trees; this gives TREE-QMC a time complexity of Formula, where n is the number of species and k is the number of gene trees, assuming the subproblem decomposition is perfectly balanced. These contributions enable TREE-QMC to be highly competitive in terms of species tree accuracy and empirical runtime with the leading quartet-based methods, even outperforming them on some model conditions explored in our simulation study. We also present the application of these methods to an avian phylogenomics data set.

    Footnotes

    • Received December 23, 2022.
    • Accepted May 4, 2023.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    | Table of Contents

    Preprint Server