Abstract
Genome structural variants (SVs) comprise a sizable portion of functionally important genetic variation, yet many evade discovery with short-read sequencing. Although long-read sequencing can reveal hidden SVs, their contribution to organismal trait variation remains unclear. To address this gap, we investigate the molecular basis of 50 classical phenotypes in 11 Drosophila melanogaster strains using highly contiguous de novo genome assemblies generated with Oxford Nanopore Technologies long reads. These assemblies enable construction of a pangenome graph with nucleotide-resolution maps of SVs, including complex rearrangements such as the interchromosomal inverted duplication Dp(2;4)eyD and large tandem duplications at the Bar locus. We uncover new candidate causal mutations for 15 phenotypes and new molecular alleles for two mutations; these comprise tandem duplications, transposable element (TE) insertions, and indels. For example, the wing-vein phenotype plexus (px1) is linked to a 1.5 kb partial tandem gene duplication, and the century-old Curved (c1) wing phenotype is linked to a 7.5 kb DM412 retrotransposon insertion disrupting the coding sequence of the muscle protein gene Strn-Mlck. We also identify a candidate intergenic enhancer for AblpeyD, supported by CRISPR-Cas9 editing, and uncover eight SV alleles of previously identified causal genes, including uncharacterized SVs underlying the extensively studied white and yellow phenotypes. Overall, 67.4% of genes causing phenotypic changes harbor candidate SVs >100 bp, whereas only 28% would be expected based on euchromatic SVs. Together, our results indicate that SVs are strongly enriched among the mutations underlying this class of large-effect, deleterious visible phenotypes in Drosophila.