Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA

  1. Guillaume Bourque (affiliations 4, 5, 6)
  1. Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada;
  2. McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada;
  3. Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA;
  4. McGill University, Human Genetics, Montréal, Québec H3A 0C7, Canada;
  5. Canadian Center for Computational Genomics, McGill University, Montréal, Québec H3A 2R7, Canada;
  6. Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
  • Corresponding authors: tpastinen{at}cmh.edu, guil.bourque{at}mcgill.ca
  • Abstract

    Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding the number of known CpGs by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 10^6 novel SV-CpGs, followed by centromeric satellites (6.57 × 10^5), simple repeats (5.40 × 10^5), Alu elements (5.07 × 10^5), satellites (2.17 × 10^5), LINE-1s (1.83 × 10^5), and SVA (SINE-VNTR-Alu) elements (1.50 × 10^5). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore whether SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures and the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly-based analyses of human traits.
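The proportions reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader-side sanity check, not part of the authors' analysis; the variable names are illustrative, and the implied reference CpG count is an inference from the stated 43.6% expansion rather than a figure given in the text.

```python
# Figures copied from the abstract; all variable names are illustrative.
genotyped_sv_cpgs = 4.06e6      # SV-CpGs genotyped across 435 methylomes
methylated_sv_cpgs = 3.93e6     # SV-CpGs methylated at least once
sv_associated_bins = 230_464    # methylation bins associated with common SVs
sv_lead_bins = 65_659           # bins where the leading QTL variant is an SV
novel_sv_cpgs = 14.6e6          # CpGs absent from the CHM13v2 reference
expansion_fraction = 0.436      # stated expansion (43.6%)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


methylated_pct = percent(methylated_sv_cpgs, genotyped_sv_cpgs)
lead_sv_pct = percent(sv_lead_bins, sv_associated_bins)

# Assumption: the 43.6% expansion is relative to the reference CpG count,
# so the reference count is inferred here, not quoted from the abstract.
implied_reference_cpgs = novel_sv_cpgs / expansion_fraction

print(f"Methylated SV-CpGs: {methylated_pct}%")
print(f"Bins led by an SV: {lead_sv_pct}%")
print(f"Implied reference CpGs: {implied_reference_cpgs / 1e6:.1f} million")
```

Both recomputed percentages agree with the values quoted in the abstract (96.8% and 28.5%) to one decimal place.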

    Footnotes

    • Received March 17, 2024.
    • Accepted February 11, 2025.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
