Adaptive seeds tame genomic sequence comparison

Szymon M Kielbasa; Raymond Wan; Kengo Sato; Paul Horton; Martin Frith

doi:10.1101/gr.113985.110

¹ Max Planck Institute for Molecular Genetics;
² National Institute of Advanced Industrial Science and Technology;
³ University of Tokyo

* Corresponding author; email: martin{at}cbrc.jp

Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA datasets. The difficulty is caused by the non-uniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen for their rareness rather than for a fixed length. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily non-uniform composition.
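The idea of choosing seeds by rareness can be illustrated with a minimal sketch. It is not the LAST implementation: the function names `find_occurrences` and `adaptive_seeds` and the `max_hits` parameter are hypothetical, and the brute-force substring search stands in for the index structure a practical tool would need. Starting at each query position, the match is lengthened until it occurs at most `max_hits` times in the target.

```python
def find_occurrences(target, word):
    """Return all start positions of `word` in `target` (overlaps allowed)."""
    hits = []
    pos = target.find(word)
    while pos != -1:
        hits.append(pos)
        pos = target.find(word, pos + 1)
    return hits


def adaptive_seeds(query, target, max_hits=1):
    """Toy adaptive seeding (illustrative assumption, not LAST's code).

    For each query start position, return the shortest match that occurs
    at most `max_hits` times in `target`, as (start, length, target_positions).
    """
    seeds = []
    for start in range(len(query)):
        for end in range(start + 1, len(query) + 1):
            hits = find_occurrences(target, query[start:end])
            if not hits:
                break  # no match of this length, so no longer match exists
            if len(hits) <= max_hits:
                seeds.append((start, end - start, hits))
                break  # rare enough: stop lengthening
    return seeds


target = "ACACACGT"
query = "ACGT"
seeds = adaptive_seeds(query, target, max_hits=1)
for start, length, positions in seeds:
    print(query[start:start + length], positions)
```

In this toy input, the repetitive prefix forces a longer seed (`ACG`), while the rarer letters `G` and `T` already qualify at length 1. Because each query position contributes at most `max_hits` matches, the total number of matches is bounded by `max_hits` times the query length, which is the linear growth the abstract describes.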

Received August 13, 2010.
Accepted December 13, 2010.

This manuscript is Open Access.
