SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing


Figure 6.

Description of the core assembly algorithm. (A) Pseudocode overview of the steps in assembling a single contig. The parameter o_min controls the stringency of the algorithm, and r denotes the read length. (B) Illustration of the elongation step. Contig C is to be elongated to the right. Read R, found in the data set of reads, is a candidate for elongation because its prefix (gray) matches the end of C perfectly. The suffix of R (white) is the potential extension E for contig C. The length of the check region M is the sum of the read length r and the length of the extension E. Substrings of M and of its reverse complement are used to search the data set for reads with matching prefixes. C is extended by E only if all of these reads match M exactly.
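
The elongation step in panel (B) can be illustrated with a minimal Python sketch. It is not the published implementation, and the caption leaves several details open, so the following are assumptions made only for illustration: the function names (`try_extend`, `region_is_consistent`, `reads_with_prefix`) are hypothetical; the check region M is built from the last r bases of C followed by E; the substrings of M used for searching have the length of the stringency parameter (`o_min` in the code); overlaps are tried from longest to shortest; a read that runs past the end of M is compared only on the part that overlaps M; and a plain linear scan over the reads stands in for whatever index the real program uses.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def reads_with_prefix(reads, prefix):
    """Brute-force lookup of all reads that start with `prefix` (assumed helper)."""
    return [read for read in reads if read.startswith(prefix)]


def region_is_consistent(region, reads, o_min):
    """Check that every read seeded by a substring of `region` agrees with it.

    Each substring of length o_min is used as a search key; any read whose
    prefix equals that key must match the region exactly wherever the two overlap.
    """
    for start in range(len(region) - o_min + 1):
        seed = region[start:start + o_min]
        for read in reads_with_prefix(reads, seed):
            window = region[start:start + len(read)]
            if read[:len(window)] != window:
                return False
    return True


def try_extend(contig, reads, o_min):
    """Try to elongate `contig` to the right by one verified extension E.

    Returns the extended contig, or the unchanged contig if no candidate
    read passes the check on M and on its reverse complement.
    """
    r = len(reads[0])  # assumes all reads share the same length r
    for overlap in range(r - 1, o_min - 1, -1):
        if overlap > len(contig):
            continue
        tail = contig[-overlap:]
        for read in reads_with_prefix(reads, tail):
            extension = read[overlap:]              # E: the suffix of R
            check_region = contig[-r:] + extension  # M (assumed construction)
            if region_is_consistent(check_region, reads, o_min) and \
               region_is_consistent(reverse_complement(check_region), reads, o_min):
                return contig + extension
    return contig
```

Calling `try_extend` repeatedly until the contig stops growing gives a rough right-hand elongation loop. In this sketch a single read that disagrees with M is enough to block the extension, which mirrors the caption's requirement that all matching reads agree with M exactly.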

This Article

  1. Genome Res. 17: 1697-1706
