Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease

Figure 5.

Watershed-SV prioritizes symptom-relevant functional rare SVs from the UDN LR-GS data set. (A) Swarmplot of the number of gene-SV pairs prioritized per individual in the UDN LR-GS data set under different sets of combined filters. There are four filter categories, listed in increasing order of stringency because more types of filters are jointly applied: LR-GS-only filters, LR-GS + HPO filters, LR-GS + RNA filters, and LR-GS + RNA + HPO filters. The red dot represents the mean number of gene-SV pairs across individuals, and the red horizontal line represents the standard deviation; the x-axis is on a log2 scale. The bar plot on the right shows the number of samples with significant prioritizations. (B) UpSet plot depicting the number of gene-SV pairs prioritized by Watershed-SV (posterior > 0.6) and by CADD-SV (score > 10), and whether each SV is uniquely identified using LR-GS. (C,E) Case example 1, rare TREs shared by both siblings, and case example 2, rare compound heterozygous deletions in siblings. Lollipop plots show which filter sets include the candidate diagnostic gene-SV pair (triangle) and which do not (circle); the height of each lollipop represents the number of gene-SV pairs prioritized, on a log2 scale. (D) Panels depict the TR copy numbers of the siblings and of the unaffected parent with the less-expanded allele. The TRE locus is in the 5′ UTR of FAM193B. Both Watershed-SV and CADD-SV can prioritize this locus, but the WGS-only baseline model cannot. Both siblings have extremely high overexpression Z-scores. (F) Panels depict the compound heterozygous deletions phased onto both alleles of FAM177A1, causing loss of function (LOF) of the gene and thereby underexpression outliers. Only Watershed-SV succeeded in prioritizing both variants.
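Panel B combines three memberships per gene-SV pair: Watershed-SV posterior > 0.6, CADD-SV score > 10, and whether the SV is uniquely identified by LR-GS. As an illustration only, the sketch below shows how such UpSet-style counts could be tallied. It assumes a hypothetical `GeneSVPair` record and hypothetical helpers `membership` and `upset_counts` (none of these come from the article). It also assumes the usual UpSet convention of exclusive intersections, where each pair is counted once, under the exact combination of sets it belongs to.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class GeneSVPair:
    """Hypothetical record for one gene-SV pair (field names are illustrative)."""
    gene: str
    sv_id: str
    watershed_posterior: float  # Watershed-SV posterior probability
    cadd_sv_score: float        # CADD-SV score
    lr_gs_unique: bool          # True if the SV is identified only by LR-GS


def membership(pair: GeneSVPair) -> tuple[str, ...]:
    """Return the sets a pair belongs to, using the thresholds from panel B.

    Labels are appended in a fixed order so identical combinations
    always produce identical tuples.
    """
    labels = []
    if pair.watershed_posterior > 0.6:
        labels.append("Watershed-SV")
    if pair.cadd_sv_score > 10:
        labels.append("CADD-SV")
    if pair.lr_gs_unique:
        labels.append("LR-GS-unique")
    return tuple(labels)


def upset_counts(pairs: list[GeneSVPair]) -> Counter:
    """Count pairs per exact set combination (exclusive UpSet intersections)."""
    counts = Counter(membership(p) for p in pairs)
    counts.pop((), None)  # pairs in no set are not shown as a bar
    return counts
```

For example, with toy values (not data from the study), a pair with posterior 0.8, CADD-SV score 15, and an LR-GS-unique SV is counted only under the three-way combination, not under each single set.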

Genome Res. 35: 914-928