Dirk D. Dolle; Zhicheng Liu; Matthew Cotten; Jared T. Simpson; Zamin Iqbal; Richard Durbin; Shane A. McCarthy; Thomas M. Keane

Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes

Figure 1.

Sequencing reads used for the 1000 Genomes Project (1000GP) population BWT: 922 billion reads (87.1 Tbp) from 2705 individuals in 26 populations, drawn from low-coverage whole-genome and exome sequencing. Reads were first error-corrected using a Cortex graph (Iqbal et al. 2012). The error-corrected reads were then trimmed to either 100 or 73 bp, unique sequences were identified on the forward strand, quality values were discarded, and the metadata were stored in a separate database. This yielded 53 billion nonredundant reads totaling 4.9 Tbp.
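
The trimming and collapsing steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the tuple layout of a read, and the in-memory dictionary standing in for the separate metadata database are all hypothetical. The trimming rule (cut to 100 bp if the read is long enough, otherwise to 73 bp, otherwise drop it) is an assumption, since the caption only gives the two lengths. Error correction and any strand handling are omitted.

```python
from collections import defaultdict

TRIM_LENGTHS = (100, 73)  # the two lengths named in the caption, longest first


def trim(seq):
    """Trim to the longest allowed length that fits (assumed rule).

    Returns None for reads shorter than every allowed length.
    """
    for n in TRIM_LENGTHS:
        if len(seq) >= n:
            return seq[:n]
    return None


def collapse_reads(reads):
    """Collapse reads to unique trimmed sequences.

    reads: iterable of (read_id, sample_id, sequence, quality) tuples
    (a hypothetical layout). Quality values are discarded. The returned
    dict maps each unique sequence to its (sample_id, read_id) pairs and
    stands in for the separate metadata database.
    """
    metadata = defaultdict(list)
    for read_id, sample_id, seq, _quality in reads:
        trimmed = trim(seq)
        if trimmed is None:
            continue  # too short for either length; dropped in this sketch
        metadata[trimmed].append((sample_id, read_id))
    unique_sequences = sorted(metadata)
    return unique_sequences, dict(metadata)
```

Only the unique sequences would go on to be indexed, while the per-read sample information is kept alongside them. Going by the caption's figures, this kind of collapsing reduced the data roughly 17-fold, from 922 billion to 53 billion reads.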

Genome Res. 2017. 27: 300-309. Published in Advance December 16, 2016, doi: 10.1101/gr.211748.116
