Gaps and complex structurally variant loci in phased genome assemblies


Figure 1.

Comparison and evaluation of phased assemblies. (A) Assembly metrics evaluated in this study. (i) Contig alignment ends are defined as terminal contig alignments such that the total alignment size does not exceed the actual contig size by >5%. When this requirement is not met, multiple contig end alignments are reported. (ii) Simple contig ends are defined as the first and last alignments of each contig to the reference (T2T-CHM13 v1.1) with at least 25 kbp aligned. (iii) Contig discontinuities are defined as alignment gaps between subsequent pieces of a single contig <1 Mbp. (iv) Detection of regions with coverage greater than 1n, the coverage expected for a haploid genome. (B) Cumulative contig size distribution colored by assembly technology. Each line represents a single haploid assembly (HGSVC-FLYE-CLR, n = 60; HGSVC-PEREG-CCS, n = 28; HGSVC-HIFIASM-CCS, n = 28; HPRC-HIFIASM-CCS, n = 94). The median total assembly length per assembly technology is marked by a horizontal dotted line. (C) Contig N50 values colored by assembly technology as in B. Each dot represents a single haploid assembly. The median N50 value per assembly technology is marked by a horizontal dotted line. (D) Tracks from top to bottom: regions corresponding to known genomic disorders between 15q11.2 and 15q13.3, followed by the annotation of SDs in this region colored by sequence identity. The main track shows contig alignments for 10 random samples from trio-free CLR assemblies (left) in comparison to trio-based HPRC assemblies (right). Contig alignments are colored by sample superpopulation (AFR, African; SAS, South Asian; EAS, East Asian; EUR, European; AMR, American). White spaces between contig alignments represent boundaries between subsequent contigs. Spaces filled with gray represent unaligned portions of a single contig with respect to the reference (T2T-CHM13) and likely represent structural variation (black arrowhead). The last track summarizes the extent of assembly gaps (between contigs; white space) and contig gaps (within contigs; gray rectangles) as a coverage plot.
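
The quantities plotted in panels B and C can be illustrated with a minimal sketch. It assumes the contig lengths of one haploid assembly are available as a list of integers (base pairs) and uses the standard N50 definition; the function names are hypothetical, and this is not the authors' evaluation pipeline.

```python
from itertools import accumulate


def cumulative_sizes(lengths):
    """Running total of contig lengths, longest contig first.

    This is the kind of curve drawn per assembly in a cumulative
    contig size plot (illustrative helper, not from the article).
    """
    ordered = sorted(lengths, reverse=True)
    return list(accumulate(ordered))


def contig_n50(lengths):
    """Standard N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    for length, running in zip(ordered, accumulate(ordered)):
        if running >= half:
            return length


# Toy contig lengths (made-up values, for illustration only)
toy = [80, 70, 50, 40, 30, 20]
print(cumulative_sizes(toy))
print(contig_n50(toy))
```

Sorting from longest to shortest before accumulating means a few very long contigs dominate the early part of the curve, which is why assemblies with higher contiguity rise faster in a plot of this kind.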

This Article

  1. Genome Res. 33: 496-510
