RT Journal
A1 Kamath, Govinda M
A1 Shomorony, Ilan
A1 Xia, Fei
A1 Courtade, Thomas
A1 Tse, David N
T1 HINGE: Long-read assembly achieves optimal repeat resolution
JF Genome Research
JO Genome Research
YR 2017
FD March 20
DO 10.1101/gr.216465.116
SP gr.216465.116
UL http://genome.cshlp.org/content/early/2017/03/20/gr.216465.116.abstract
AB Long-read sequencing technologies have potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce mis-assemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that achieves optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding "hinges" to reads for constructing an overlap graph where only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with repeat-resolution capabilities of de Bruijn graph assemblers. HINGE was evaluated on the long-read datasets from the NCTC project. Besides producing more finished assemblies than the manual pipeline of NCTC based on the HGAP assembler and Circlator, HINGE allows us to identify 40 datasets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches either fragment the assembly or resolve the ambiguity arbitrarily.