Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes
- George W Armstrong1,
- Kalen Cantrell1,
- Shi Huang1,
- Daniel McDonald1,
- Niina Haiminen2,
- Anna Paola Carrieri3,
- Qiyun Zhu4,
- Antonio Gonzalez1,
- Imran McGrath1,
- Kristen Beck5,
- Daniel Hakim1,
- Aki S Havulinna6,
- Guillaume Méric7,
- Teemu Niiranen6,
- Leo Lahti8,
- Veikko Salomaa6,
- Mohit Jain1,
- Michael Inouye9,
- Austin D Swafford1,
- Ho-Cheol Kim5,
- Laxmi Parida2,
- Yoshiki Vázquez-Baeza1 and
- Rob Knight1,10
Abstract
The number of publicly available microbiome samples is continually growing. As dataset size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a widely used phylogenetic alpha diversity metric that has so far failed to scale effectively to trees with millions of vertices. Stacked Faith's Phylogenetic Diversity (SFPhD) enables calculation of this metric at a much larger scale by implementing a computationally efficient algorithm. Compared with previous approaches, the algorithm requires fewer computational resources, making the software more accessible and reducing its carbon footprint. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
- Received May 18, 2021.
- Accepted September 1, 2021.
- Published by Cold Spring Harbor Laboratory Press
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
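For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of Faith's PD as it is commonly defined: the total length of the branches that connect a sample's observed tips to the root of the phylogeny. It is a naive reference illustration, not the SFPhD algorithm. The function name `faith_pd`, the parent-pointer tree representation, and the toy tree are hypothetical, and the sketch assumes the convention that paths extend all the way to the root.

```python
def faith_pd(parent, length, observed_tips):
    """Naive Faith's PD: sum each branch on the tip-to-root paths once.

    parent: maps each non-root node to its parent (hypothetical layout).
    length: maps each non-root node to the length of the branch above it.
    observed_tips: tips present in the sample.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root, stopping at the root itself or at a
        # branch that an earlier tip has already counted.
        while node in parent and node not in visited:
            visited.add(node)
            total += length[node]
            node = parent[node]
    return total


# Toy tree (illustrative only): root -> A, root -> d; A -> b, A -> c
parent = {"A": "root", "d": "root", "b": "A", "c": "A"}
length = {"A": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}

pd_bc = faith_pd(parent, length, ["b", "c"])  # branches b, c, and A
pd_bd = faith_pd(parent, length, ["b", "d"])  # branches b, A, and d
```

Because every sample repeats its own walk up the tree, this direct approach does redundant work when many samples share one large phylogeny, which is the kind of scaling cost the abstract refers to.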