Markup | Genome Research

Figure 4.

The unitig merging step on the unitigs from Figure 3. Each pair (ending, unitig index) is sorted by ending, and indexes of unitigs that share the same ending are joined in a tuple. Each such tuple is assigned to a bucket corresponding to one of its unsealed end point indices chosen at random (solid arrows in the figure). For the other end point index of the tuple (dashed arrow), we put a placeholder in its corresponding bucket (in gray). Then, inside each bucket, pairs sharing the same unitig index are joined to form larger tuples. If an ending cannot be joined and does not have a corresponding placeholder, then is it marked as sealed and is not selected anymore for bucket assignment. For example, in the first step, in $B_{i d_{1}}$ the pair (id₁, id₂) is sealed at id₁ because there is no placeholder for id₁; however, in $B_{i d_{2}}$ the pair (id₂, id₃) is not sealed at id₂ because id₂ has a placeholder. The steps are repeated until no more tuples can be joined. For the noncanonical case, we can merge the pairs if one extremity is at the end and the other one at the beginning of the pairs. For canonical k-mers, we also have to keep track of the direction of the k-mer before joining them; for more details, see the section “Construction correctness.”