RT Journal
A1 Jiao, Wen-Biao
A1 Accinelli, Gonzalo Garcia
A1 Hartwig, Benjamin
A1 Kiefer, Christiane
A1 Baker, David
A1 Severing, Edouard
A1 Willing, Eva-Maria
A1 Piednoel, Mathieu
A1 Woetzel, Stefan
A1 Madrid-Herrero, Eva
A1 Huettel, Bruno
A1 Hümann, Ulrike
A1 Reinhard, Richard
A1 Koch, Marcus A.
A1 Swan, Daniel
A1 Clavijo, Bernardo
A1 Coupland, George
A1 Schneeberger, Korbinian
T1 Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data
JF Genome Research
JO Genome Research
YR 2017
FD May 01
VO 27
IS 5
SP 778
OP 786
DO 10.1101/gr.213652.116
UL http://genome.cshlp.org/content/27/5/778.abstract
AB Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated Pacific Biosciences (PacBio) long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm-levels. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these misjoints were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres were fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences.