Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity

  Meru J. Sadhu¹

  ¹Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  ²NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
  ³Department of Human Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA;
  ⁴Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, California 90095, USA;
  ⁵Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, California 90095, USA;
  ⁶Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, California 90095, USA;
  ⁷Department of Computational Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA

  ⁸These authors contributed equally to this work.

  • Corresponding author: meru.sadhu{at}nih.gov
  • Abstract

    Understanding the genetic causes of trait variation is a primary goal of genetic research. One way that individuals can vary genetically is through variable pangenomic genes: genes that are only present in some individuals in a population. The presence or absence of entire genes could have large effects on trait variation. However, variable pangenomic genes can be missed in standard genotyping workflows, owing to reliance on aligning short-read sequencing to reference genomes. A popular method for studying the genetic basis of trait variation is linkage mapping, which identifies quantitative trait loci (QTLs), regions of the genome that harbor causative genetic variants. Large-scale linkage mapping in the budding yeast Saccharomyces cerevisiae has found thousands of QTLs affecting myriad yeast phenotypes. To enable the resolution of QTLs caused by variable pangenomic genes, we used long-read sequencing to generate highly complete de novo genome assemblies of 16 diverse yeast isolates. With these assemblies, we resolved QTLs for growth on maltose, sucrose, and raffinose and for growth under oxidative stress to specific genes that are absent from the reference genome but present in the broader yeast population at appreciable frequency. Copies of genes also duplicate onto chromosomes where they are absent in the reference genome, and we found that these copies generate additional QTLs whose resolution requires pangenome characterization. Our findings show the need for highly complete genome assemblies to identify the genetic basis of trait variation.

    Footnotes

    • ⁹A complete list of NISC Comparative Sequencing Program members appears at the end of this paper.

    • Present addresses: ¹⁰Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD 20892, USA; ¹¹Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.277515.122.

    • Received November 16, 2022.
    • Accepted April 26, 2023.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
