Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease

  Alexis Battle (2,12,13)

  1. Department of Genetics, Stanford University, Stanford, California 94305, USA;
  2. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  3. Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA;
  4. Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA;
  5. Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33136, USA;
  6. Department of Pediatrics, Division of Medical Genetics, Stanford University School of Medicine, Stanford, California 94304, USA;
  7. Department of Pediatrics, Stanford University School of Medicine, Stanford, California 94304, USA;
  8. Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California 94305, USA;
  9. Department of Pathology, Stanford University, Stanford, California 94305, USA;
  10. Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA;
  11. GREGoR Stanford Site, Stanford University, Stanford, California 94305, USA;
  12. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA;
  13. Department of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21218, USA

  (14) These authors contributed equally to this work.

  Corresponding authors: smontgom@stanford.edu, mschatz@cs.jhu.edu, wheelerm@stanford.edu, ajbattle@jhu.edu

  Abstract

    Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to detect and interpret accurately. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) in whom prior short-read sequencing had not identified diagnostic mutations. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected an average of 716 rare (MAF < 0.01) SV alleles per genome from long reads, a 2.4-fold increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression measured in blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched near expression outliers (log odds ratio [LOR] = 0.46). We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. These included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
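    The rarity filter and the enrichment statistic summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that an SV counts as rare when its allele frequency among the 571 diploid control genomes is below 0.01, and that enrichment is expressed as a natural-log odds ratio from a 2×2 table of genes (expression outlier or not, with or without a nearby rare SV), with a 0.5 pseudocount added to every cell. The function names, the counting scheme, and the pseudocount are hypothetical illustrations, not the study's pipeline; the authors' frequency estimation and enrichment model (which may, for example, use regression with covariates) can differ.

```python
import math

# Hypothetical helpers illustrating (1) a rarity filter, MAF < 0.01 against
# 571 control genomes, and (2) a log odds ratio (LOR) enrichment statistic.
# These are illustrative assumptions, not the study's actual pipeline.

N_CONTROLS = 571   # control long-read genomes (number from the abstract)
MAF_CUTOFF = 0.01  # rarity threshold (value from the abstract)


def control_allele_frequency(alt_allele_count: int, n_controls: int = N_CONTROLS) -> float:
    """Fraction of control alleles carrying the SV, assuming diploid genomes."""
    return alt_allele_count / (2 * n_controls)


def is_rare(alt_allele_count: int, n_controls: int = N_CONTROLS,
            maf_cutoff: float = MAF_CUTOFF) -> bool:
    """Assumed rule: the SV allele is rare if its control frequency is below the cutoff."""
    return control_allele_frequency(alt_allele_count, n_controls) < maf_cutoff


def log_odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Natural-log odds ratio from a 2x2 table of genes.

    a: expression outliers with a nearby rare SV
    b: expression outliers without a nearby rare SV
    c: non-outliers with a nearby rare SV
    d: non-outliers without a nearby rare SV
    Adding a pseudocount to every cell (Haldane-Anscombe correction)
    avoids division by zero when a cell is empty.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


if __name__ == "__main__":
    print(is_rare(11), is_rare(12))  # True False
    # Made-up counts, for illustration only:
    print(round(log_odds_ratio(30, 70, 200, 900), 3))
```

    For example, `is_rare(11)` returns `True` because 11/1142 ≈ 0.0096, whereas `is_rare(12)` returns `False` because 12/1142 ≈ 0.0105. A positive LOR means that genes with a nearby rare SV are overrepresented among expression outliers relative to non-outliers.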

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279323.124.

    • Freely available online through the Genome Research Open Access option.

    • Received March 15, 2024.
    • Accepted January 6, 2025.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.


    Genome Res. 35: 914–928 © 2025 Jensen et al.; Published by Cold Spring Harbor Laboratory Press
