Assemblathon 1: A competitive assessment of de novo short read assembly methods
- Dent A. Earl1,
- Keith Bradnam2,
- John St. John3,
- Aaron Darling2,
- Dawei Lin2,
- Joseph Faas2,
- Hung On Ken Yu2,
- Vince Buffalo2,
- Daniel R. Zerbino3,
- Mark Diekhans3,
- Ngan Nguyen3,
- Pramila Nuwantha Ariyaratne4,
- Wing-Kin Sung4,
- Zemin Ning5,
- Matthias Haimel6,
- Jared T. Simpson5,
- Nuno A. Fonseca7,
- İnanç Birol8,
- T. Roderick Docking8,
- Isaac Y. Ho9,
- Daniel S Rokhsar10,
- Rayan Chikhi11,
- Dominique Lavenier11,
- Guillaume Chapuis11,
- Delphine Naquin12,
- Nicolas Maillet12,
- Michael C. Schatz13,
- David R. Kelley14,
- Adam M. Phillippy14,
- Sergey Koren14,
- Shiaw-Pyng Yang15,
- Wei Wu15,
- Wen-Chi Chou16,
- Anuj Srivastava16,
- Timothy I. Shaw16,
- J. Graham Ruby17,
- Peter Skewes-Cox17,
- Miguel Betegon17,
- Michelle T. Dimon17,
- Victor Solovyev18,
- Petr Kosarev18,
- Denis Vorobyev18,
- Ricardo Ramirez-Gonzalez19,
- Richard Leggett20,
- Dan MacLean20,
- Fangfang Xia21,
- Ruibang Luo22,
- Zhenyu Li22,
- Yinlong Xie22,
- Binghang Liu22,
- Sante Gnerre23,
- Iain MacCallum23,
- Dariusz Przybylski23,
- Filipe J. Ribeiro23,
- Shuangye Yin23,
- Ted Sharpe23,
- Giles Hall23,
- Paul J. Kersey6,
- Richard Durbin24,
- Shaun D. Jackman25,
- Jarrod A. Chapman9,
- Xiaoqiu Huang26,
- Joseph L. DeRisi17,
- Mario Caccamo27,
- Yingrui Li22,
- David B. Jaffe28,
- Richard Green3,
- David Haussler3,
- Ian Korf2 and
- Benedict Paten3,29
- 1 UCSC;
- 2 UC Davis;
- 3 UC Santa Cruz;
- 4 Agency for Science, Singapore;
- 5 Wellcome Trust Sanger;
- 6 EMBL-EBI;
- 7 CRACS Portugal;
- 8 Genome Sciences Centre, BC Cancer Agency;
- 9 DOE JGI;
- 10 UC Berkeley;
- 11 Symbiose, IRISA;
- 12 CNRS/Symbiose, IRISA;
- 13 CSHL;
- 14 U MD;
- 15 Monsanto;
- 16 U of Georgia;
- 17 UCSF;
- 18 Softberry;
- 19 GAC, Norwich;
- 20 Sainsbury Lab;
- 21 U of Chicago;
- 22 BGI-Shenzhen;
- 23 Broad;
- 24 Sanger;
- 25 BC Cancer Genome Sciences Centre;
- 26 Iowa State;
- 27 GAC, Sainsbury Lab & Wellcome Trust;
- 28 Broad Institute
- * Corresponding author; email: benedict{at}soe.ucsc.edu
Abstract
Low-cost short-read sequencing technology has revolutionised genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq dataset of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods.
- Received May 20, 2011.
- Accepted September 8, 2011.
- Copyright © 2011, Cold Spring Harbor Laboratory Press
This manuscript is Open Access.











