Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation

Figure 3.

Pseudodiploid human annotation metrics. (A) The number and fraction of genes comparatively annotated from GENCODE V27 in each assembly. GENCODE biotypes are simplified into protein coding, lncRNA, ncRNA, pseudogene, and other. Other includes processed transcripts, nonsense-mediated decay, and immune-related genes. (B) Frame-shifting insertions, frame-shifting deletions, and multiple-of-three indels that do not shift the reading frame are reported for each assembly. Consistent with the great ape genomes, there is a systematic overrepresentation of coding deletions in Falcon assemblies, despite these assemblies coming from haploid cell lines. 10x Genomics Supernova assemblies exhibit similar properties. (C) Split gene analysis reports how often paralog-resolved transcript projections end up on different contigs, which can serve as a measure of assembly gene-level contiguity. PacBio assemblies, especially CHM1, are the most contiguous.
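The three indel categories in panel B can be illustrated with a minimal Python sketch. This is not CAT's implementation; the function names `classify_indel` and `summarize`, the category labels, and the input format (pairs of reference and alternate allele lengths in bases) are assumptions made for illustration only. The sketch shows the underlying arithmetic: an indel whose net length change is a multiple of three preserves the reading frame, and any other length change shifts it.

```python
from collections import Counter


def classify_indel(ref_len: int, alt_len: int) -> str:
    """Classify a coding indel by its net length change (hypothetical helper)."""
    delta = alt_len - ref_len
    if delta == 0:
        raise ValueError("not an indel: reference and alternate lengths match")
    if delta % 3 == 0:
        # Whole codons gained or lost; the reading frame is preserved.
        return "multiple_of_three"
    # Any other length change shifts the reading frame.
    return "frameshift_insertion" if delta > 0 else "frameshift_deletion"


def summarize(indels):
    """Count indels per category for one assembly (hypothetical helper)."""
    return Counter(classify_indel(ref_len, alt_len) for ref_len, alt_len in indels)
```

Calling `summarize` on a list of `(ref_len, alt_len)` pairs for each assembly would give per-category counts of the kind plotted in panel B.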

This Article

  1. Genome Res. 28: 1029-1038
