Review

Hash functions in nucleotide sequence analysis

    • 1Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
    • 2Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
    • 3Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
    • 4 These authors contributed equally to this work.
Download PDF Cite Article Permissions Share
cover of Genome Research Vol 36 Issue 5
Current Issue:

Abstract

Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite their widespread use in genomics, there is no comprehensive review of the types of hash functions used and their various applications. In this survey intended for bioinformatic methods developers, we divide hash functions into four categories: scattering hash functions, permutations, minimum perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-use hash functions that have been applied in nucleotide sequence analysis and hash functions that have been designed specifically for nucleotide sequence analysis. We highlight their salient properties, commonalities, differences, and application areas.

Loading
Loading
Back to top