Hash functions in nucleotide sequence analysis

Ke Chen; Xiang Li; Qian Shi; Mingfu Shao; Paul Medvedev

doi:10.1101/gr.281453.125

Review

- ¹Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- ²Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA;
- ³Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- ⁴These authors contributed equally to this work.

Published May 4, 2026. https://doi.org/10.1101/gr.281453.125

Current Issue:

June 2026, Vol. 36, No. 6

Abstract

Randomness is a powerful tool in the design and analysis of algorithms and data structures for nucleotide sequence data. Nucleotide sequences are not themselves random but are often randomized using hash functions. Despite the widespread use of hash functions in genomics, there is no comprehensive review of the types of hash functions used and their applications. In this survey, intended for developers of bioinformatics methods, we divide hash functions into four categories: scattering hash functions, permutations, minimal perfect hash functions, and locality-sensitive hash functions. For each category, we provide examples of both general-purpose hash functions that have been applied in nucleotide sequence analysis and hash functions designed specifically for it. We highlight their salient properties, commonalities, differences, and application areas.
