Alignathon: a competitive assessment of whole-genome alignment methods

  1. Benedict Paten (1, 2)

  1. Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA;
  2. Biomolecular Engineering Department, University of California Santa Cruz, Santa Cruz, California 95064, USA;
  3. School of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada;
  4. Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16801, USA;
  5. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom;
  6. Softberry Inc., Mount Kisco, New York 10549, USA;
  7. Department of Animal Biotechnology, Konkuk University, Seoul 143-701, Korea;
  8. Centre For Genomic Regulation (CRG), 08003 Barcelona, Spain;
  9. Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain;
  10. Westfalian Wilhelms University, Institute of Evolution and Biodiversity, 48149 Muenster, Germany;
  11. Institute of Human Genetics (IGH), UPR 1142, CNRS, Montpellier, France;
  12. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA;
  13. Department of Computer Science, Northern Illinois University, DeKalb, Illinois 60115, USA;
  14. The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, United Kingdom;
  15. ithree Institute, University of Technology Sydney, NSW 2007, Australia;
  16. Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Illinois 61801, USA;
  17. Department of Computer Science and the Donnelly Centre, University of Toronto, Toronto, ON M5S 3G4, Canada;
  18. Centre for Computational Medicine and the Genetics and Genome Biology Program, Hospital for Sick Children, Toronto, ON M5G 1X8, Canada;
  19. Lawrence Berkeley National Laboratory, Berkeley, California 94710, USA;
  20. Howard Hughes Medical Institute, Chevy Chase, Maryland 20815-6789, USA

  Corresponding author: benedict{at}soe.ucsc.edu

Abstract

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Footnotes

  • [Supplemental material is available for this article.]

  • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.174920.114.

    Freely available online through the Genome Research Open Access option.

  • Received March 6, 2014.
  • Accepted September 30, 2014.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0.
