Complete genetic and epigenetic architecture of D4Z4 macrosatellites in FSHD, BAMS, and reference cohorts with D4Z4End2End

  1. Quentin Gouil (1,2,6,7)

  1. The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia;
  2. Department of Medical Biology, The University of Melbourne, Parkville, VIC 3052, Australia;
  3. Department of Biological Sciences, National University of Singapore, 117558, Singapore;
  4. Laboratory of Human Genetics and Therapeutics, King Abdullah University of Science and Technology, Thuwal 23955, Kingdom of Saudi Arabia;
  5. Marseille Medical Genetics, Aix-Marseille University, INSERM, Marseille 13005, France;
  6. Olivia Newton-John Cancer Research Institute, Heidelberg, VIC 3084, Australia;
  7. School of Cancer Medicine, La Trobe University, Bundoora, VIC 3086, Australia
  • Corresponding author: quentin.gouil@onjcri.org.au
  • Abstract

    The D4Z4 locus is a macrosatellite array on Chromosome 4q normally comprising 8 to >100 3.3-kb repeat units. Its size and repetitiveness render it refractory to most sequencing technologies; consequently, its genetic and epigenetic architectures remain incompletely understood despite their relevance to facioscapulohumeral muscular dystrophy (FSHD). Current FSHD molecular testing relies on complex, multistep and low-resolution assays, which aim to identify contractions on permissive haplotypes (FSHD type 1) or epigenetic reactivation due to pathogenic variants in the epigenetic machinery, most often in SMCHD1 (FSHD type 2). Recent guideline updates highlight the need for more accurate and comprehensive diagnostic approaches. Here, we leverage ultra-long whole-genome and Cas9-targeted sequencing to develop a fast and accurate workflow, D4Z4End2End, for comprehensive genetic and methylation analysis of D4Z4 alleles. We apply it to samples from two controls, four FSHD1 patients, four FSHD2 patients, and two patients with Bosma arhinia microphthalmia syndrome (BAMS) caused by SMCHD1 variants, as well as publicly available data from 30 B-lymphoblastoid cell lines from the 1000 Genomes Project and Human Pangenome Reference Consortium. We attain high-depth sequencing of full-length D4Z4 arrays of up to 40 repeat units (∼132 kb), accurately capture contracted arrays, genetic mosaicism, and pathogenic SMCHD1 variants, and generate consensus sequences of all D4Z4 alleles. We identify new allelic variants, analyze complex D4Z4 rearrangements including in-cis duplications, and reveal length- and SMCHD1-dependent methylation patterns across the D4Z4 array. Our findings offer insights into D4Z4 genetics and epigenetics, and demonstrate the potential of long-read nanopore sequencing to accelerate FSHD research and diagnostics.
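    The array sizes quoted in the abstract follow from simple arithmetic: the number of repeat units multiplied by the 3.3-kb unit length, so that 40 units correspond to roughly 132 kb. A minimal Python sketch of that conversion is given here for illustration. It assumes every unit is exactly 3.3 kb and ignores any flanking sequence; the function name `d4z4_array_kb` is hypothetical and is not part of D4Z4End2End.

```python
# Nominal D4Z4 repeat-unit length stated in the abstract, in kb.
REPEAT_UNIT_KB = 3.3


def d4z4_array_kb(repeat_units: int, unit_kb: float = REPEAT_UNIT_KB) -> float:
    """Approximate array length in kb for a given number of repeat units.

    Assumption: all units have the same nominal length `unit_kb`;
    flanking sequence is not counted.
    """
    if repeat_units < 0:
        raise ValueError("repeat_units must be non-negative")
    # Round to one decimal place to hide floating-point noise.
    return round(repeat_units * unit_kb, 1)


if __name__ == "__main__":
    # Largest full-length arrays reported in the abstract: 40 repeat units.
    print(d4z4_array_kb(40))
```

    Real repeat units can differ slightly in length, so this gives only a nominal size.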

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.280907.125.

    • Freely available online through the Genome Research Open Access option.

    • Received May 7, 2025.
    • Accepted January 1, 2026.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
