Unraveling the palindromic and nonpalindromic motifs of retroviral integration site sequences by statistical mixture models

  1. Jiri Hejnar1,3
  1 Institute of Molecular Genetics of the Czech Academy of Sciences;
  2 Institute of Information Theory and Automation of the CAS
  • * Corresponding author; email: hejnar{at}img.cas.cz
  • Abstract

    A weak palindromic nucleotide motif is the hallmark of retroviral integration site alignments. Given that the majority of target sequences are not palindromic, the current model explains the symmetry by an overlap of the nonpalindromic motif present on one of the half-sites of the sequences. Here, we show that the implementation of multicomponent mixture models allows for different interpretations consistent with the existence of both palindromic and nonpalindromic submotifs in the sets of integration site sequences. We further demonstrate that the weak palindromic motifs result from freely combined site-specific submotifs restricted to only a few positions proximal to the site of integration. The submotifs are formed by either palindrome-forming nucleotide preference or nucleotide exclusion. Using the mixture models, we also identified HIV-1-favored palindromic sequences in Alu repeats that serve as local hotspots for integration. The application of this novel statistical approach provides deeper insight into the selection of retroviral integration sites and may prove to be a valuable tool in the analysis of any type of DNA motif.
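
    The "current model" mentioned in the abstract, in which a palindromic alignment arises from nonpalindromic sequences, can be illustrated with a minimal Python sketch. It is not the authors' mixture-model method; the toy sequences and all function names are invented for illustration. It only shows that a set of sequences pooled with their reverse complements yields a reverse-complement-symmetric count profile even though no individual sequence is palindromic.

```python
# Illustrative sketch only: toy sequences and helper names are assumptions,
# not data or code from the article.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def position_counts(seqs):
    """Per-position nucleotide counts for a list of equal-length sequences."""
    length = len(seqs[0])
    counts = [{base: 0 for base in "ACGT"} for _ in range(length)]
    for seq in seqs:
        for i, base in enumerate(seq):
            counts[i][base] += 1
    return counts


def is_symmetric_profile(counts):
    """True if counts at position i mirror the complementary counts at L-1-i."""
    length = len(counts)
    return all(
        counts[i][base] == counts[length - 1 - i][COMPLEMENT[base]]
        for i in range(length)
        for base in "ACGT"
    )


# Invented nonpalindromic sequences standing in for one strand orientation.
one_orientation = ["GTAACC", "GTCAGA"]
# Pool them with their reverse complements (the other orientation).
pooled = one_orientation + [reverse_complement(s) for s in one_orientation]

none_palindromic = not any(is_palindromic(s) for s in pooled)
pooled_symmetric = is_symmetric_profile(position_counts(pooled))
single_symmetric = is_symmetric_profile(position_counts(one_orientation))
```

    In this toy case `none_palindromic` and `pooled_symmetric` are both true while `single_symmetric` is false, which is the sense in which alignment-level symmetry alone cannot tell palindromic from nonpalindromic submotifs apart.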

    • Received January 13, 2023.
    • Accepted July 12, 2023.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International license), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    ACCEPTED MANUSCRIPT

    Genome Res. gr.277694.123 Published by Cold Spring Harbor Laboratory Press
