Rose Hoberman; Joana Dias; Bing Ge; Eef Harmsen; Michael Mayhew; Dominique J. Verlaan; Tony Kwan; Ken Dewar; Mathieu Blanchette; Tomi Pastinen

A probabilistic approach for SNP discovery in high-throughput human resequencing data


Figure 1.

Graphical overview of the four steps in the prediction pipeline. (1) Sequencing: Target regions are amplified by LR-PCR; amplicons are sequenced using a 454 GS-FLX sequencer. A set of sequence reads is generated by the 454 GS-FLX base-caller. (2) Alignment: Reads are aligned to the reference sequence and combined into a multiple sequence alignment (MSA). (3) Feature extraction: Numerical features are computed from the MSA for each site in the target region. (4) Training: Given a training set of sites with known genotypes from the HapMap database, we train a classifier to identify heterozygous sites from sequencing data. This classifier is then applied to novel data sets to identify novel SNPs.
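As an illustration of step (3), the following is a minimal sketch of how per-site numerical features might be computed from an MSA. It is not the authors' implementation: the function name `site_features`, the use of `-` to mark uncovered positions, and the three features chosen (depth, reference fraction, top alternate fraction) are all assumptions made for this example. The caption does not specify which features the pipeline actually uses.

```python
from collections import Counter


def site_features(msa, reference):
    """Compute simple per-site features from aligned reads (illustrative only).

    msa: list of equal-length strings; '-' marks a position a read does not
         cover (an assumed convention, not taken from the paper).
    reference: reference sequence with the same length as each aligned read.
    """
    features = []
    for pos, ref_base in enumerate(reference):
        # Collect the bases observed at this site, skipping uncovered reads.
        column = [read[pos] for read in msa if read[pos] != "-"]
        counts = Counter(column)
        depth = len(column)
        ref_count = counts.get(ref_base, 0)
        # Count of the most frequent non-reference base, if any.
        alt_counts = [c for base, c in counts.items() if base != ref_base]
        alt_count = max(alt_counts) if alt_counts else 0
        features.append({
            "pos": pos,
            "depth": depth,
            "ref_fraction": ref_count / depth if depth else 0.0,
            "alt_fraction": alt_count / depth if depth else 0.0,
        })
    return features


if __name__ == "__main__":
    reference = "ACGT"
    msa = ["ACGT", "ACAT", "-CAT", "ACG-"]
    for row in site_features(msa, reference):
        print(row)
```

In this toy input, the third site shows an even split between the reference base and one alternate base, which is the kind of signal a classifier trained on sites with known genotypes (step 4) could learn to associate with heterozygosity.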

Genome Res. 2009. 19: 1542-1552. Published in Advance July 15, 2009, doi: 10.1101/gr.092072.109
