Leveraging the T2T assembly to resolve rare and pathogenic inversions in reference genome gaps

  1. Anna Lindstrand1,3
  1. Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden;
  2. Science for Life Laboratory, Karolinska Institutet, 171 65 Solna, Sweden;
  3. Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden;
  4. Pacific Northwest Research Institute, Seattle, Washington 98122, USA;
  5. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
  6. Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA;
  7. Center for Precision Health, McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA;
  8. Texas Children's Hospital, Houston, Texas 77030, USA;
  9. Cain Pediatric Neurology Research Laboratories, Jan and Dan Duncan Neurological Research Institute, Houston, Texas 77030, USA;
  10. Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030, USA;
  11. Department of Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA;
  12. McNair Medical Institute, The Robert and Janice McNair Foundation, Houston, Texas 77024, USA;
  13. Baylor Genetics Laboratory, Baylor College of Medicine, Houston, Texas 77021, USA;
  14. Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden;
  15. Department of Computer Science, Rice University, Houston, Texas 77251, USA;
  16. Department of Laboratory Medicine, University of Gothenburg, 413 45 Gothenburg, Sweden;
  17. Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45 Gothenburg, Sweden
  • Corresponding authors: jesper.eisfeldt{at}scilifelab.se, anna.lindstrand{at}ki.se
  Abstract

    Chromosomal inversions (INVs) are particularly challenging to detect because they are copy-number neutral and often associated with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can cause disease by disrupting genes or by altering regulatory regions of dosage-sensitive genes in cis. Short-read genome sequencing (srGS) resolves only ∼70% of cytogenetically visible inversions referred to clinical diagnostic laboratories, likely because their breakpoints lie in repetitive regions. Here, we study 12 inversions by long-read genome sequencing (lrGS) (n = 9) or srGS (n = 3) and resolve nine of them. In four cases, the inversion breakpoint region was missing from at least one of the human reference genomes (GRCh37, GRCh38, T2T-CHM13), and a reference-agnostic analysis was needed. In one of these cases, an INV9 that could be mapped only in de novo assembled lrGS data using T2T-CHM13 disrupts EHMT1, consistent with a Mendelian diagnosis (Kleefstra syndrome 1; MIM#610253). Next, by pairwise comparison between T2T-CHM13, GRCh37, and GRCh38, as well as the chimpanzee and bonobo genomes, we show that hundreds of megabases of sequence are missing from at least one human reference, highlighting that primate genomes contribute to genomic diversity. Aligning population genomic data to these regions indicated that they vary between individuals. Our analysis emphasizes that T2T-CHM13 is necessary to maximize the value of lrGS for optimal inversion detection in clinical diagnostics. These results underscore the importance of leveraging diverse and comprehensive reference genomes to resolve unsolved molecular cases in rare diseases.
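    To illustrate what "sequence missing from a reference" means in a pairwise comparison, here is a minimal sketch that counts the bases of one contig left uncovered by alignment blocks from another assembly. It assumes the alignments have already been reduced to 0-based, half-open (start, end) blocks on the contig's own coordinates (for example, parsed from a whole-genome aligner's output). The function names and input format are hypothetical, and this is not the authors' pipeline. A real comparison would also need to handle contig naming, divergent but aligned sequence, and repeats.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: 0-based, half-open [start, end) on a single contig.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_intervals(contig_length: int, aligned: Iterable[Interval]) -> List[Interval]:
    """Return the parts of [0, contig_length) not covered by any aligned block."""
    # Clip blocks to the contig and drop any that become empty.
    clipped = [(max(s, 0), min(e, contig_length)) for s, e in aligned]
    clipped = [(s, e) for s, e in clipped if s < e]

    gaps: List[Interval] = []
    cursor = 0  # first position not yet accounted for
    for start, end in merge_intervals(clipped):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_length:
        gaps.append((cursor, contig_length))
    return gaps


def missing_bases(contig_length: int, aligned: Iterable[Interval]) -> int:
    """Total number of contig bases with no aligned counterpart."""
    return sum(end - start for start, end in uncovered_intervals(contig_length, aligned))


if __name__ == "__main__":
    blocks = [(0, 30), (25, 60), (80, 90)]  # toy alignment blocks, not real data
    print(uncovered_intervals(100, blocks))  # [(60, 80), (90, 100)]
    print(missing_bases(100, blocks))        # 30
```

A genome-wide total would simply sum `missing_bases` over contigs. Running the comparison in both directions (A against B, then B against A) is what reveals sequence present in one reference but absent from the other.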

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279346.124.

    • Freely available online through the Genome Research Open Access option.

    • Received March 15, 2024.
    • Accepted September 12, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
