Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease



Figure 1.

Undiagnosed patient cohort description and pipeline overview. (A) Cohort description: patients were recruited from the Undiagnosed Diseases Network (UDN) for a long-read genome sequencing (LR-GS) study. The cohort included 57 affected individuals and 11 unaffected family members spanning a wide range of primary symptom categories, including neurology, musculoskeletal, and cardiology. All patients had previous short-read genetic testing with Illumina that was negative or inconclusive. (B) Long-read pipeline overview: individuals were sequenced on R9.4 flowcells on the ONT PromethION. Consensus SVs were called by merging SVs across individual callers and keeping those with multialgorithm support. Merging the UDN genomes with the Stanford ADRC population reference of 579 nanopore genomes allowed robust estimation of SV allele frequencies. SVs were filtered to rare variants and intersected with overlapping genome annotations for input into Watershed. Vamos was used on a catalog of polymorphic tandem repeats to genotype tandem repeat copy numbers, and an outlier-calling method based on mean neighbor distance was used to define extreme repeat expansions. (C) RNA sequencing expression outlier pipeline: transcriptome data from the UDN were processed by quantifying expression, combining with tissue-matched controls from GTEx, normalizing for library size and composition bias, and correcting for batch effects and hidden factors. Expression outliers from the normalized data were input into Watershed. (D) Watershed-SV integrates signals from rare SVs and overlapping genome annotations to predict variants with large functional effects. High-scoring Watershed variants are prioritized and curated per patient for disease relevance.
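The consensus and rarity steps in panel B can be illustrated with a minimal sketch. This is not the authors' implementation: the caption does not give the merging rule or any cutoffs, so the reciprocal-overlap criterion, the `min_callers`, `min_overlap`, and `max_af` thresholds, and the caller names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    caller: str  # hypothetical caller label, not a tool named in the caption


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    if a.end <= a.start or b.end <= b.start:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def consensus_calls(calls, min_callers=2, min_overlap=0.5):
    """Greedily cluster calls, then keep clusters seen by >= min_callers callers.

    Both thresholds are assumed values, not taken from the paper.
    """
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for cluster in clusters:
            # Compare against the first call of each cluster (its representative).
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]


def allele_frequency(alt_alleles: int, n_genomes: int) -> float:
    """Allele frequency in a merged diploid population of n_genomes individuals."""
    return alt_alleles / (2 * n_genomes)


def is_rare(af: float, max_af: float = 0.01) -> bool:
    """Rarity filter; the 1% cutoff is an assumed example value."""
    return af <= max_af


calls = [
    SVCall("chr1", 1000, 2000, "DEL", "caller_a"),
    SVCall("chr1", 1050, 1990, "DEL", "caller_b"),
    SVCall("chr1", 5000, 6000, "DEL", "caller_a"),
]
supported = consensus_calls(calls)
```

Here only the first two deletions form a cluster with support from two callers; the third call is dropped because a single caller reports it. A production pipeline would also need to handle insertions and breakpoint uncertainty, which this sketch ignores.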

This Article

  1. Genome Res. 35: 914-928
