A simple method for finding related sequences by adding probabilities of alternative alignments

Martin C. Frith

doi:10.1101/gr.279464.124

Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan; Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8568, Japan; Computational Bio Big Data Open Innovation Laboratory, AIST, Tokyo 169-8555, Japan

Corresponding author: mcfrith@edu.k.u-tokyo.ac.jp

Abstract

The main way of analyzing genetic sequences is by finding sequence regions that are related to each other. There are many methods to do this, usually based on the following idea: find an alignment of two sequence regions that would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell if an alignment is likely to exist by chance. Also, the precise alignment of related regions is uncertain, so one alignment does not hold all the evidence that they are related. We should consider alternative alignments too. This is rarely done, because we lack a simple and fast method that fits easily into practical sequence-search software. Described here is the simplest-conceivable change to standard sequence alignment, which sums probabilities of alternative alignments and makes it easier to tell if a similarity is likely to occur by chance. This approach is better than standard alignment at finding distant relationships, at least in a few tests. It can be used in practical sequence-search software, with minimal increase in implementation difficulty or run time. It generalizes to different kinds of alignment, for example, DNA-versus-protein with frameshifts. Thus, it can widely contribute to finding subtle relationships between sequences.
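
The contrast between scoring one best alignment and summing over alternative alignments can be illustrated with a small sketch. This is not the method's actual implementation, only a generic illustration under stated assumptions: it uses a simplified linear gap cost, made-up match/mismatch scores, and a scale parameter `t` that converts a score into a weight `exp(score / t)`. The function names `pair_score`, `best_local_score`, and `summed_local_weight` are hypothetical. The first function is ordinary local alignment (maximum over alignments); the second keeps the same recurrence but replaces the maximum with a sum of weights.

```python
import math

def pair_score(a, b, match=1.0, mismatch=-1.0):
    # Illustrative scores only (an assumption, not taken from the article).
    return match if a == b else mismatch

def best_local_score(x, y, gap=2.0):
    """Standard local alignment: maximum score over all local alignments."""
    prev = [0.0] * (len(y) + 1)
    best = 0.0
    for a in x:
        cur = [0.0]
        for j, b in enumerate(y, start=1):
            h = max(0.0,                              # start a new alignment
                    prev[j - 1] + pair_score(a, b),   # extend with an aligned pair
                    prev[j] - gap,                    # gap in y
                    cur[j - 1] - gap)                 # gap in x
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best

def summed_local_weight(x, y, gap=2.0, t=1.0):
    """Same recurrence with max replaced by sum and scores by exp(score / t)."""
    g = math.exp(-gap / t)
    prev = [1.0] * (len(y) + 1)   # 1.0 is the weight of the empty alignment
    total = 0.0
    for a in x:
        cur = [1.0]
        for j, b in enumerate(y, start=1):
            f = (1.0
                 + prev[j - 1] * math.exp(pair_score(a, b) / t)
                 + prev[j] * g
                 + cur[j - 1] * g)
            cur.append(f)
            total += f - 1.0      # non-empty alignments ending at this cell
        prev = cur
    return total

x, y = "ACGTACGT", "ACGTTCGT"
print(best_local_score(x, y))
print(math.log(summed_local_weight(x, y)))
```

In this sketch the summed weight always contains the term contributed by the best alignment, so `t * log(sum)` cannot fall below the maximum score whenever that score is positive. The two computations fill the same table in the same order, which is one way to see why summing need not cost much more than standard alignment.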

Footnotes

[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279464.124.

Received April 13, 2024.
Accepted August 14, 2024.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.