MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome

  1. Terrance P Snutch (1,3)
  1 University of British Columbia
  2 University of California Santa Cruz
  • * Corresponding author; email: snutch{at}msl.ubc.ca
  • Abstract

    Advances in long-read single-molecule sequencing have opened new possibilities for 'benchtop' whole-genome sequencing. The Oxford Nanopore Technologies MinION is a portable device that uses nanopore technology to sequence DNA molecules directly. MinION single-molecule long reads are well suited for de novo assembly of complex genomes because they facilitate the construction of highly contiguous physical genome maps, obviating the need for labor-intensive physical genome mapping. Long reads can also be used to delineate complex chromosomal rearrangements, such as those that occur in tumour cells, which can confound analysis using short reads. Here, we assessed the feasibility of using MinION long-read sequences for: 1) the de novo assembly of a large complex genome and 2) the elucidation of complex rearrangements. The genomes of two Caenorhabditis elegans strains, a wild-type strain and a strain containing two complex rearrangements, were sequenced with MinION. Up to 42-fold coverage was obtained from a single flowcell, and the best pooled-data assembly produced a highly contiguous wild-type C. elegans genome containing 48 contigs (N50 contig length = 3.99 Mb) covering >99% of the 100,286,401-base reference genome. Further, the MinION-derived genome assembly expanded the C. elegans reference genome by >2 Mb owing to a more accurate determination of repetitive sequence elements, and assembled the complete genomes of two co-extracted bacteria. MinION long-read sequence data also facilitated the elucidation of complex rearrangements in a mutagenized strain. The sequence accuracy of the MinION long-read contigs (~98%) was improved by using Illumina-derived sequence data to polish the final genome assembly to 99.8% nucleotide accuracy relative to the reference assembly.
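
    The abstract reports two summary statistics, fold coverage and N50 contig length, that are easy to misread. The sketch below illustrates how each is conventionally computed. It is not the authors' pipeline: the function names `n50` and `fold_coverage` and the toy contig lengths are assumptions made for illustration, and only the 100,286,401-base reference size and the 42-fold figure come from the abstract.

```python
# Illustrative sketch only; not the authors' analysis code.

REFERENCE_SIZE = 100_286_401  # C. elegans reference size in bases, from the abstract


def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(total_read_bases, genome_size=REFERENCE_SIZE):
    """Average depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_contigs = [8, 6, 4, 3, 2, 1]
toy_n50 = n50(toy_contigs)

# Read bases that would correspond to 42-fold coverage of the reference.
bases_for_42x = 42 * REFERENCE_SIZE
```

    For the toy assembly, the two longest contigs (8 and 6) already contain 14 of the 24 total units, so the N50 is 6. By the same arithmetic, 42-fold coverage of the reference corresponds to roughly 4.2 Gb of read data.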

    • Received January 30, 2017.
    • Accepted December 19, 2017.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.

    ACCEPTED MANUSCRIPT
