Automated annotation of human centromeres with HORmon



Figure 2.

HORmon pipeline. Given the nucleotide sequence Centromere and a consensus alpha satellite sequence Monomer, HORmon iteratively launches StringDecomposer (Dvorkina et al. 2020) to partition Centromere into monomer blocks. After each launch of StringDecomposer, HORmon launches CentromereArchitect (Dvorkina et al. 2021) to cluster similar monomer blocks into monomers, identify hybrid monomers (represented by a single hybrid D/E of monomers D and E), and transform Centromere into the monocentromere Centromere*. Afterward, HORmon uses the generated monocentromere to construct a monomer graph (red edges connect the hybrid monomer D/E with the rest of the monomer graph). To comply with the centromere evolution postulate, HORmon performs split/merge transformations and dehybridizations on the initial monomer set. The orange dotted undirected edge connects similar monomers A and B to indicate that they represent candidates for merging. The breakable monomer D is shown as a dotted vertex to indicate that it is a candidate for splitting into monomers D′ and D′′. The dehybridization substitutes the hybrid vertex D′/E by a single red edge that connects the prefix of D′ with the suffix of E. Split, merge, and dehybridization operations result in a new monomer set and transform Centromere into the monocentromere Centromere**. The black cycle in the monomer graph of Centromere* represents the HOR; the purple edge connecting monomers G and C is a low-frequency chord in this cycle. HORmon uses this HOR to generate the HOR decomposition of Centromere** into the canonical (cF, cC), partial (p(A + B)C, pFG, pCE), and auxiliary (the single block D′/E) HORs. cF and cC refer to traversing the (canonical) HOR starting from monomers F and C, respectively. p(A + B)C, pFG, and pCE refer to partial traversals of the HOR from monomer A + B to C, from F to G, and from C to E, respectively.
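The two graph-related steps in this caption can be illustrated with a toy sketch. It assumes a monocentromere is simply a list of monomer labels and that the HOR is already known as a cyclic list of labels. The function names `monomer_graph` and `decompose` are hypothetical and are not part of HORmon; the greedy scan below is a simplification for illustration, not the published algorithm, and it does not model split, merge, or dehybridization.

```python
from collections import Counter


def monomer_graph(monocentromere):
    """Count how often monomer u is immediately followed by monomer v.

    The result maps each directed edge (u, v) to its multiplicity,
    a simplified stand-in for the monomer graph in the figure.
    """
    return Counter(zip(monocentromere, monocentromere[1:]))


def decompose(monocentromere, hor):
    """Greedily split a monocentromere into runs that follow the HOR cycle.

    Returns a list of (kind, monomers) pairs where kind is:
      'canonical' - a full traversal of the HOR from any starting monomer,
      'partial'   - a shorter run of consecutive HOR monomers,
      'auxiliary' - a single monomer that is not part of the HOR.
    Assumes the labels in `hor` are distinct.
    """
    # Successor of each monomer along the HOR cycle.
    nxt = {hor[i]: hor[(i + 1) % len(hor)] for i in range(len(hor))}
    blocks = []
    i, n = 0, len(monocentromere)
    while i < n:
        m = monocentromere[i]
        if m not in nxt:
            blocks.append(("auxiliary", [m]))
            i += 1
            continue
        j = i
        # Extend the run while it follows the cycle and is shorter than the HOR.
        while (j + 1 < n
               and j + 1 - i < len(hor)
               and monocentromere[j + 1] == nxt[monocentromere[j]]):
            j += 1
        run = monocentromere[i:j + 1]
        kind = "canonical" if len(run) == len(hor) else "partial"
        blocks.append((kind, run))
        i = j + 1
    return blocks


if __name__ == "__main__":
    seq = list("ABCABCABXCABC")  # made-up monocentromere; X is outside the HOR
    print(monomer_graph(seq).most_common())
    for kind, run in decompose(seq, ["A", "B", "C"]):
        print(kind, "".join(run))
```

In the caption's notation, a canonical block that starts at monomer C would be written cC, and a partial block from F to G would be written pFG; the sketch returns the run itself rather than these labels.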

This Article

  1. Genome Res. 32: 1137-1151
