Alignathon: A competitive assessment of whole genome alignment methods
- Dent Earl1,
- Ngan K Nguyen1,
- Glenn Hickey1,
- Robert S. Harris2,
- Stephen Fitzgerald3,
- Kathryn Beal3,
- Igor Seledtsov4,
- Vladimir Molodtsov4,
- Brian Raney1,
- Hiram Clawson1,
- Jaebum Kim5,
- Carsten Kemena6,
- Jia-Ming Chang6,
- Ionas Erb6,
- Alexander Poliakov7,
- Minmei Hou8,
- Javier Herrero3,
- Victor Solovyev4,
- Aaron E. Darling9,
- Jian Ma10,
- Cedric Notredame6,
- Michael Brudno11,
- Inna Dubchak7,
- David Haussler1 and
- Benedict Paten1,12
- 1 University of California, Santa Cruz;
- 2 The Pennsylvania State University;
- 3 European Bioinformatics Institute;
- 4 Softberry Inc.;
- 5 Konkuk University;
- 6 Centre For Genomic Regulation;
- 7 Joint Genome Institute;
- 8 Northern Illinois University;
- 9 University of Technology Sydney;
- 10 University of Illinois;
- 11 University of Toronto
- * Corresponding author; email: benedict{at}soe.ucsc.edu
Abstract
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets exist for protein and, to a lesser extent, global nucleotide MSAs, but less effort has been made to establish benchmarks for the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments, and assessments were then performed collectively after all submissions had been received. Three datasets were used: two were simulated, based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in alignment quality across differently annotated regions, and found that few tools aligned the duplications analysed. Many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all datasets, submissions, and assessment programs for further study, and, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
- Received March 6, 2014.
- Accepted September 30, 2014.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.