Searching journal content for articles similar to Hsi-Yang Fritz et al. 21 (5): 734.

Displaying results 1-9 of 9
  1. ...compression to the data blocks. Various advanced compression algorithms have been proposed for high-throughput DNA sequence data (Quip [Jones et al. 2012], Samcomp [Bonfield and Mahoney 2013], HUGO [Li et al. 2014], etc.). Among larger data sets (e.g., The 1000 Genomes Project Consortium...
  2. ...designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific...
  3. ...evidence for 0.13% of reads coming from nonhuman DNA (Tae et al. 2014). To expand this to the full set of samples, we downloaded 257,943 viral sequences from the CoreNucleotide division of GenBank and used the Kraken classifier (Wood and Salzberg 2014) to define a set of 102.6M virus-specific 31-mers (see...
  4. ...-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can...
  5. ...of the strobemers are discussed in detail in the Methods section. The name strobemers is inspired by strobe sequencing technology (an early Pacific Biosciences sequencing protocol), which would produce multiple subreads from a single contiguous fragment of DNA in which the subreads were separated by “dark...
  6. ...nearly half a petabyte of temporary storage and 1.05 million CPU hours on a new cluster (Catalyst) designed for data intensive computing (see Methods). Table 2 shows a count of distinct human labeled 20-mers added from each source of human genomic DNA: the reference assembly (LMAT-Ref), GenBank (LMAT...
  7. ...assembly using long-read data from third-generation sequencing is a viable strategy for overcoming reference bias and assembling through highly repetitive loci (Rhoads and Au 2015; Jain et al. 2018). However, the high-molecular-weight gDNA input requirement relative to second-generation sequencing (∼5000...
  8. .... Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res 21: 734–740. The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486: 207–214. Illumina. 2015. HiSeq X series of sequencing systems. Vol...
    OPEN ACCESS ARTICLE
  9. ...-satellite repeats in a manner proportional to that observed in the initial read database, but the long-range ordering of repeats is inferred. In contrast to the remainder of the chromosome sequence, in which each underlying clone component represents the actual haplotype of its source DNA, the modeled sequence...