Strong bias in long-read sequencing prevents assembly of Drosophila melanogaster Y-linked genes

  1. Fabiana Uno1
  1. 1Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro 21941-617, Brazil;
  2. 2Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA
  • Corresponding author: bernardo1963{at}gmail.com
  • Abstract

    Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are generally considered free from sequence composition bias, a key factor, alongside read length, that explains their success in producing high-quality genome assemblies. Indeed, there had been very few reports of bias, the clearest one against GA-rich repeats in the human genome. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art Nanopore (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 67× coverage. We find that these missing exons are consistently located near simple satellite sequences, in which sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and basecalling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move toward telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y Chromosome.

    Footnotes

    • Received February 23, 2025.
    • Accepted September 11, 2025.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    This Article

    1. Genome Res. © 2025 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press

    Article Category

    ORCID

    Share

    Preprint Server