Assembling Puzzles from Preassembled Blocks

Pavel A. Pevzner
Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0114, USA

This extract was created in the absence of an abstract.

Assembling large jigsaw puzzles is difficult, and most of us haven't even seen a 10,000-piece puzzle on sale in a toy store. Such puzzles require enormous dedication, and most children (not to mention adults) are not willing to put the time and effort into their assembly. Moreover, assembly is feasible only for multi-feature compositions like “Garden of Pleasures” by Hieronymus Bosch (one of the best-selling large puzzles), with its hundreds of people and animals. When Celera assembled their first million-piece puzzle (Myers et al. 2000), the Drosophila melanogaster genome, the Public Human Genome Project did not have a program that would reliably assemble even thousand-piece puzzles without errors. Surprisingly enough, there is still no such program in the public domain today.

Not to worry: Kent and Haussler (2001) “saved” the Human Genome Project with their GigAssembler. GigAssembler is very different from the Celera assembler: It assembles a million-piece puzzle (genome) from thousands of preassembled blocks (BAC contigs). Each such preassembled block may be composed of thousands of the original pieces (reads). The idea is simple: If you see a blue eye in one preassembled block from the “Garden of Pleasures”, then you are likely to find one more blue eye in another preassembled block. These two blocks should go together and help in the puzzle assembly. There are plenty of “pairs of eyes” in the genome: paired plasmid ends, BAC end pairs, parts of mRNAs or ESTs, and others. The difficulty, however, is (once again!) in repeats: What if there are many blue-eyed people (or animals) in the puzzle? Another problem is that assembly errors in preassembled blocks lead to complications. Unless such incorrectly assembled blocks are broken into correct parts, …
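The “pairs of eyes” idea can be illustrated with a toy sketch. This is not GigAssembler's actual algorithm; the function name `link_blocks`, the `min_support` threshold, and the input layout are assumptions made purely for illustration. Each paired feature (for example, the two ends of a plasmid) is looked up in the blocks where it aligns. A pair supports a link between two blocks only when each mate is placed in exactly one block; pairs with a mate that aligns to several blocks play the role of the many blue-eyed figures and are set aside as ambiguous.

```python
from collections import defaultdict


def link_blocks(placements, pairs, min_support=2):
    """Toy linking of preassembled blocks by paired features.

    placements: dict mapping a feature id to the list of blocks it aligns to.
    pairs: list of (feature_a, feature_b) known to lie near each other.
    min_support: hypothetical threshold on the number of supporting pairs.
    Returns (links, ambiguous), where links maps a sorted block pair to
    its number of supporting pairs.
    """
    support = defaultdict(int)
    ambiguous = []
    for a, b in pairs:
        blocks_a = placements.get(a, [])
        blocks_b = placements.get(b, [])
        if len(blocks_a) != 1 or len(blocks_b) != 1:
            # A mate that aligns to several blocks looks like a repeat.
            if blocks_a and blocks_b:
                ambiguous.append((a, b))
            continue
        block_a, block_b = blocks_a[0], blocks_b[0]
        if block_a != block_b:
            support[tuple(sorted((block_a, block_b)))] += 1
    # Keep only links backed by enough independent pairs.
    links = {edge: n for edge, n in support.items() if n >= min_support}
    return links, ambiguous


# Made-up example data: four paired features and three blocks.
placements = {
    "r1/f": ["blockA"], "r1/r": ["blockB"],
    "r2/f": ["blockA"], "r2/r": ["blockB"],
    "r3/f": ["blockB"], "r3/r": ["blockC"],
    "r4/f": ["blockA", "blockC"], "r4/r": ["blockB"],  # repeat-like mate
}
pairs = [("r1/f", "r1/r"), ("r2/f", "r2/r"), ("r3/f", "r3/r"), ("r4/f", "r4/r")]

links, ambiguous = link_blocks(placements, pairs)
print(links)
print(ambiguous)
```

In this made-up example, blocks A and B are joined by two unambiguous pairs, the single pair between B and C falls below the threshold, and the pair whose mate aligns to two blocks is reported as ambiguous rather than used. The sketch does not attempt to detect or break misassembled blocks.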
