RT Journal
A1 Ning, Zemin
A1 Cox, Anthony J.
A1 Mullikin, James C.
T1 SSAHA: A Fast Search Method for Large DNA Databases
JF Genome Research 
JO Genome Research 
YR 2001 
FD October 01 
VO 11 
IS 10 
SP 1725 
OP 1729 
DO 10.1101/gr.194201 
UL http://genome.cshlp.org/content/11/10/1725.abstract 
AB We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the “hits” for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects.