Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity

Cory A. Weller; Ilya Andreev; Michael J. Chambers; Morgan Park; NISC Comparative Sequencing Program; Joshua S. Bloom; Meru J. Sadhu; Beatrice B. Barnabas; Sean Black; Gerard G. Bouffard; Shelise Y. Brooks; Juyun Crawford; Holly Marfani; Lyudmila Dekhtyar; Joel Han; Shi-Ling Ho; Richelle Legaspi; Quino L. Maduro; Catherine A. Masiello; Jennifer C. McDowell; Casandra Montemayor; James C. Mullikin; Morgan Park; Nancy L. Riebow; Karen Schandler; Brian Schmidt; Christina Sison; Sirintorn Stantripop; James W. Thomas; Pamela J. Thomas; Meghana Vemulapalli; Alice C. Young

doi:10.1101/gr.277515.122

Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity

¹Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
²NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
³Department of Human Genetics, University of California, Los Angeles, Los Angeles, California 90095, USA;
⁴Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, California 90095, USA;
⁵Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, California 90095, USA;
⁶Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, California 90095, USA;
⁷Department of Computational Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA;
¹²NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA

↵8 These authors contributed equally to this work.

Corresponding author: meru.sadhu{at}nih.gov

Next Section

Abstract

Understanding the genetic causes of trait variation is a primary goal of genetic research. One way that individuals can vary genetically is through variable pangenomic genes: genes that are only present in some individuals in a population. The presence or absence of entire genes could have large effects on trait variation. However, variable pangenomic genes can be missed in standard genotyping workflows, owing to reliance on aligning short-read sequencing to reference genomes. A popular method for studying the genetic basis of trait variation is linkage mapping, which identifies quantitative trait loci (QTLs), regions of the genome that harbor causative genetic variants. Large-scale linkage mapping in the budding yeast Saccharomyces cerevisiae has found thousands of QTLs affecting myriad yeast phenotypes. To enable the resolution of QTLs caused by variable pangenomic genes, we used long-read sequencing to generate highly complete de novo genome assemblies of 16 diverse yeast isolates. With these assemblies, we resolved QTLs for growth on maltose, sucrose, raffinose, and oxidative stress to specific genes that are absent from the reference genome but present in the broader yeast population at appreciable frequency. Copies of genes also duplicate onto chromosomes where they are absent in the reference genome, and we found that these copies generate additional QTLs whose resolution requires pangenome characterization. Our findings show the need for highly complete genome assemblies to identify the genetic basis of trait variation.

The individual members of a species will have myriad trait differences, including in disease phenotypes, agricultural traits, traits that are subject to natural or sexual selection, and fundamental biological processes. Trait variation is often caused by genetic differences between individuals, and understanding how genetic variation is tied to phenotypic variation has been a central goal of genetics research since its inception under Gregor Mendel. The budding yeast Saccharomyces cerevisiae is a popular model organism in genetics research. It was the first eukaryotic species to have a sequenced reference genome (Goffeau et al. 1996), and, to date, most genes found in the reference genome have been functionally characterized (Engel et al. 2022). This was aided by the remarkable ease of testing the effects of genetic changes in yeast (Giaever and Nislow 2014), which has accelerated even further with CRISPR technology. Beyond the reference genome, S. cerevisiae presents substantial genetic and phenotypic diversity, making yeast a premier model for understanding the genetic basis of trait differences between diverse individuals. S. cerevisiae has been isolated from around the world and from diverse environments (Peter et al. 2018), including domestic fermentative environments, patient samples, and wild forests (Lee et al. 2022). Whole-genome sequencing of diverse yeast isolates has revealed that many strains’ genomes contain genes not seen in the reference genome. Although the reference genome contains approximately 6000 genes, the “pangenome” composed of all genes in the global yeast population is thought to contain approximately 8000 genes, 3000 of which are variably present across strains (Peter et al. 2018). The presence or absence of variable genes could have large effects on yeast traits. For instance, some strains have extra nonreference genes that allow them to metabolize xylose or melibiose (Naumov et al. 1990; Wenger et al. 2010). Highly diverged alleles of reference genes, as seen for the GAL genes, or copy number variation, as seen for CUP1, can also dramatically affect phenotypes (Zhao et al. 2014; Boocock et al. 2021). But pangenomic variation in strains’ genomes is not well annotated, as it is generally found in repetitive subtelomeric regions that recombine frequently (Louis et al. 1994) and are difficult to genotype (Yue et al. 2017). Thus, the nature of yeast pangenomic variation is still being understood.

Of the thousands of genetic differences spread throughout genomes, including single-nucleotide polymorphisms (SNPs) and variable pangenomic genes, only some affect a given trait of interest. In yeast, an effective strategy to locate relevant genetic differences is linkage mapping on a biparental cross (Ehrenreich et al. 2009). Two haploid yeast strains are mated to produce a diploid strain, which undergoes meiosis to produce haploid progeny strains known as segregants. Researchers determine genotypes and phenotypes for hundreds to thousands of segregants and then identify genomic loci in which segregants with one parental genotype have a significantly different phenotype from those with the other genotype. These mapped loci, known as quantitative trait loci (QTLs), are usually several kilobases in size and encompass multiple genes owing to genetic linkage between causal variants and neighboring variants, requiring further study to identify the causal variants. QTLs can be mapped even if the parental genome sequences are not completely known, if genetic variants linked to the causal variant are genotyped. Complete genome sequences are desired, however, for confidently identifying QTLs’ causal genetic variants. To date, yeast genome sequences have predominantly been assembled from high-throughput short-read DNA sequencing, using millions of reads <600 bp long (Peter et al. 2018). The resulting inferred genome sequences are often fragmented and incomplete, with particular assembly challenges arising from repetitive sequences and structural variants (SVs) such as translocations, copy number variants, and pangenomic insertions and deletions. In contrast, the advent of long-read sequencing, which generates reads of ≥10 kb (Koren and Phillippy 2015), allows for highly complete yeast genome assembly, as repetitive or diverged DNA sequences are generally short enough that a 10-kb-long read can span them completely (Istace et al. 2017; Yue et al. 2017; Bendixsen et al. 2021; Lee et al. 2022; O'Donnell et al. 2022; Saada et al. 2022). Highly complete genome sequences can significantly aid the resolution of QTLs to their true underlying causal genetic factors, especially those caused by pangenomic, repetitive, or structural variation, as seen for humans, rice, and the tomato (Qin et al. 2021; Wu et al. 2022; Zhou et al. 2022).

A recent study performed linkage mapping on crosses between 16 diverse yeast strains (Bloom et al. 2019). Each parent strain was crossed to two other parent strains, for 16 total crosses. Approximately 900 segregants were isolated for each cross, and the segregants’ growth was measured in 38 environments, including different nutritional sources, treatment with toxic compounds, and temperature and pH stress. Across the 16 biparental crosses and 38 traits, Bloom et al. (2019) mapped 7751 QTLs with a median size of 31.7 kb (12.7 QTLs per cross per trait), which were responsible for most of the genetic component of the phenotypic variance. Here, we investigated the contribution of pangenomic variation to growth on maltose, oxidative stress, sucrose, and raffinose using highly complete genome sequences generated with long-read sequencing technology for the 16 parental strains.

Previous Section Next Section

Results

Long-read genome sequences

To determine their pangenome content, we generated de novo genome assemblies for the 16 QTL-mapped strains using Pacific Biosciences (PacBio) HiFi technology, which produces reads ∼10–20 kb in length with very high sequence accuracy (Wenger et al. 2019). The final assemblies all contained precisely 17 contigs that corresponded to 16 nuclear chromosomes and the mitochondrial DNA (Supplemental Table S1). For 11 strains, every chromosome sequence started and ended with telomeric TG_1–3 repeats, and across the 16 strains, 505 of 512 nonmitochondrial contig ends had either TG_1–3 repeats or telomere-associated Y′ or X elements (Fig. 1A; Wellinger and Zakian 2012), indicating that the genome assemblies were largely or entirely complete.

View larger version:

Download as PowerPoint Slide

Figure 1.

Long-read genomes. (A) Detection of telomeric TG_1–3 repeats and telomere-associated Y′ or X element sequences at ends of assembled contigs. TG_1–3 repeats were detected as more than 20 TG dinucleotides in the final 100 bp of a contig, whereas Y′ and X elements were detected by Liftoff. Dark, medium, and light gray shadings indicate the presence of TG_1–3 repeats, Y′ elements, or X elements on a contig end, respectively, whereas white indicates the feature was not detected. Orange and purple highlighting mark chromosome arms affected by reciprocal translocations. (B) Linkage analysis of markers on reference Chr X and XV in the CBS2888 × CLIB219 cross. The existence of strong cross-chromosome correlation between Chr X and XV markers is consistent with a translocation between these chromosomes in strain CBS2888. (C) Gene and ORF content in assembled genomes. Genes and gene copies with >30% sequence identity to reference genes were classified as retained reference genes or diverged homologs using a threshold of 95% sequence identity to the reference gene.

Genome assemblies of four strains had major chromosomal rearrangements relative to the reference genome (Fig. 1A; Supplemental Fig. S1). Three of these strains’ reciprocal translocations were previously known: Chromosomes (Chr) VIII and XVI in strains M22 and YJM981 and Chr V and XIV in strain YJM454 (Hou et al. 2014). As seen by Hou et al. (2014), the translocation between Chr VIII and XVI led to unequal transmission of alleles to segregants (Supplemental Fig. S2). We also detected a reciprocal translocation between Chr X and XV in strain CBS2888, which has not been described previously. We checked for evidence of this translocation in the genotypes of the biparental cross between CBS2888 and CLIB219 generated by Bloom et al. (2019), which resulted in 943 recombinant haploid progeny. In these segregants, we observed linkage between genetic markers corresponding to reference Chr X and XV, confirming the observed translocation in CBS2888 (Fig. 1B) and indicating our assembled genomes reflect the true chromosomal architecture. We annotated SVs >500 bp in the 16 genomes using MUMandCo (O'Donnell and Fischer 2020). The most common SVs were deletion and insertion of repetitive sequences (Supplemental Fig. S1). Of interest, we detected the recently described recurrent inversion of 24 kb on Chr XIV (Salzberg et al. 2022); two strains had the reference orientation, and 14 strains had the inversion. We also detected a novel tandem repeat specific to strain YJM454 that amplified the uncharacterized open reading frame (ORF) YGR201C to 27 copies (Supplemental Fig. S3). YGR201C is predicted to be the sole eEF1Bγ gene family member without a C-terminal domain in the S. cerevisiae reference genome, but up to 15 have been found in other fungal species’ genomes (Renou et al. 2022); their function remains unknown. We detected an enrichment of SVs at chromosome ends (Supplemental Fig. S4).

We annotated the genes present in the assembled genomes using a combined strategy of finding genes that have homology with existing gene annotations in the S288C reference genome and of predicting ORFs de novo. The annotation tool Liftoff (Shumate and Salzberg 2021) was used to annotate genes that have sequence homology with known genes in the reference genome. To annotate any genes missed by Liftoff, we used ORFfinder (Sayers et al. 2011) to annotate ORFs longer than 100 codons. Gene identity and content varied widely among the 16 strains. Up to 96 reference genes per strain were replaced by diverged homologs, which could reflect introgressions from other species or balancing selection within S. cerevisiae (D'Angiolo et al. 2020; Boocock et al. 2021), and up to 84 reference genes were absent without replacement (Fig. 1C). We found additional copies of reference genes, with 190 in strain YJM981 owing to its highly repetitive subtelomeres, which are thought to reflect a deficiency in telomere maintenance (Bergström et al. 2014). We found up to 47 additional distant homologs of reference genes and 73 ORFs that did not have homology with reference genes per strain. Variable genes were strongly enriched at subtelomeres (Supplemental Fig. S4), as seen previously (Yue et al. 2017). Encouragingly, the strain BY had very few differences from the reference genome, as expected owing to its close relation to S288C, the reference genome strain (Brachmann et al. 1998). The most major difference arose from the BY genome assembly containing nine copies of the CUP1 tandem repeat locus (Supplemental Fig. S3), which is represented in condensed form in the reference genome with two copies (Zhao et al. 2014). One hundred sixty-six nonreference genes from the yeast pangenome determined from 1011 strains (Peter et al. 2018) were present in our 16 genomes (Supplemental Fig. S5), representing ∼10% of the nonreference pangenome with <2% of the strains.

To determine how differences in pangenome content between strains contribute to trait variation, we analyzed our annotated genomes in conjunction with previously mapped growth QTLs (Bloom et al. 2019). As variable pangenomic genes were predominantly localized at subtelomeres (Supplemental Fig. S4), we noted with interest that growth on maltose, paraquat, sucrose, and raffinose each had multiple large-effect subtelomeric QTLs. Below, we investigated how pangenomic variation contributes to these traits.

Highly variable maltose loci

Maltose is a disaccharide composed of two glucose molecules. It is a major component of brewer's wort (He et al. 2014) and is fermented by yeast to produce beer. Metabolism of maltose in S. cerevisiae involves genes from the MALR, MALT, and MALS gene families, which encode transcription factors, maltose transporters, and maltase enzymes, respectively (Brown et al. 2010). MALR transcription factors can activate MALT and MALS expression upon detection of maltose. Genes from these families are typically found clustered together in the genome to form MAL loci, often alongside IMA genes that encode enzymes that digest isomaltose (Teste et al. 2010). Previous analysis has found MAL loci variably present on five chromosomes (Charron et al. 1989); for instance, the reference genome encodes MAL loci on Chr II and VII.

Bloom et al. (2019) mapped QTLs for maltose growth in the 16-isolate panel. A recurrent QTL at the right end of Chr VII was the primary contributor to variable maltose growth (Fig. 2A), appearing in 12 of the 16 crosses with LOD scores from 38 to 173. As this coincides with the position of a MAL locus in the reference genome, we hypothesized that these QTLs could result from variation in MALR, MALT, or MALS. We examined the MAL genes on Chr VII across our 16 high-quality genomes. In contrast to the reference genome, which has a single copy of each gene, the size of the Chr VII MAL locus varied from two to 10 MAL and IMA genes, with zero to four MALRs, one to three MALTs, zero or one MALSs, and zero to two IMAs (Fig. 2B). This variation was not because of recent gene duplication events, as homologs had low sequence identity to each other. For instance, no pair of Chr VII MALRs in a strain had >85% DNA identity (Fig. 2C). Whereas diverse MALR genes have been found previously on different chromosomes (Charron et al. 1989), we found that Chr VII alone harbored homologs of all previously identified MALR genes except MAL33 and YPR196W (Fig. 2C). We checked the placement of these genes on Chr VII using linkage mapping genotype data. We observed that segregant genotypes on the right arm of Chr VII were strongly correlated with the presence of short reads matching variable MALRs, confirming they reside on Chr VII (Supplemental Figs. S6–S8). Similar to MALR diversity, we observe up to three distinct MALT genes in a single MAL cluster, which correspond to three distinct maltose transporters that have been previously characterized in S. cerevisiae (Han et al. 1995; Dietvorst et al. 2005).

View larger version:

Download as PowerPoint Slide

Figure 2.

Diverse, highly complex MAL loci on Chr VII. (A) QTL mapping of maltose growth in 16 biparental S. cerevisiae crosses (Bloom et al. 2019). Each plot shows linkage (calculated logarithm of the odds [LOD] score), in one biparental cross, between segregant genotype and growth on 2% maltose, plotted against genomic coordinates. Twelve of 16 crosses have a QTL with LOD score greater than 35 at the right end of Chr VII, highlighted in peach; plots are labeled with which parent provided the Chr VII allele associated with superior maltose growth, if any. (B) Diagram of the Chr VII MAL locus in each strain. MALR genes are labeled by homology group, as determined in panel C. (C) DNA percentage identity between MALR genes found on Chr VII (McWilliam et al. 2013). Previously identified MALR genes are included for reference, in bold (Gibson et al. 1997). MALRs with >97% sequence identity are sorted into homology groups, labeled at the bottom. As seen by Song et al. (2015), we observed a homology group of MALRs with low sequence identity to any named MALR, which we labeled MAL83 (assigned systematic name YSC0072 at the Saccharomyces Genome Database [SGD; https://www.yeastgenome.org/]). (D, top) Maltose growth for maltose-poor strains BY, YJM978, YPS1009, Y10, PW5, and CLIB219 with plasmids carrying MALR_YJM145 (blue) or empty vector (gray). n = 5 biological replicates, except for YJM978 + empty vector, which had four biological replicates. (Bottom) Growth was quantified as the area under the growth curve (AUC) for maltose normalized to the AUC for glucose (Supplemental Fig. S10). Maltose growth was significantly higher with MALR_YJM145 than empty vector for strains BY, YJM978, YPS1009, Y10, and PW5. (*) P < 0.005, Tukey's HSD following one-way ANOVA.

Previous studies on diverse MALRs found they differ in which sugars they sense, and only some support growth on maltose. MAL23, MAL43, and MAL63 support growth on maltose, whereas MAL13, MAL33, MAL64, and YPR196W do not (Gibson et al. 1997; Brown et al. 2010). In contrast, the three characterized S. cerevisiae MALT genes all transport maltose (Han et al. 1995; Dietvorst et al. 2005). To test whether Chr VII MAL genotypes associated with poor maltose growth lacked a maltose-proficient MALR, we introduced MALR_YJM145, which was the sole Chr VII MALR in maltose-proficient strain YJM145 and has 99% protein identity with the known maltose-proficient Mal63, into six maltose-poor strains using a CEN plasmid. MALR_YJM145 significantly improved maltose growth for five strains (Fig. 2D). For one of these strains, YPS1009, the improved maltose growth was still slow, likely because its sole maltose transporter is a homolog of MTY1 (Supplemental Fig. S9), which has low maltose affinity (Salema-Oom et al. 2005). The remaining strain, CLIB219, lacked MALS (Fig. 2B); segregants of the CLIB219 × M22 cross grew on maltose if they inherited the MALS gene on M22's Chr II (Supplemental Fig. S9). Thus, variation in all three MAL genes affected strains’ ability to grow on maltose, with a major contribution from the presence or absence of maltose-proficient MALRs.

The RM and CLIB413 genome assemblies contained MAL genes in the subtelomeric region on the right end of Chr XI (Fig. 3A,B), which we confirmed by linkage analysis (Supplemental Figs. S11, S12). We hypothesized these MAL loci would benefit maltose growth, as the RM and CLIB413 Chr XI MALR genes encoded chimeras between Mal23 and Mal43 (Fig. 3C), which are maltose-proficient (Gibson et al. 1997). Indeed, all crosses involving RM and CLIB413 had maltose QTLs at the right end of Chr XI, with segregants that inherited MALR on Chr XI having superior maltose growth (Figs. 2A, 3D,E), as previously observed for the BY × RM cross (Bloom et al. 2013). As maltose-proficient MALR was sufficient for BY and YJM978 maltose growth (Fig. 2D), we concluded the extra MALRs caused the Chr XI maltose QTLs. Notably, our high-quality genome assemblies allowed us to dissect QTLs to causal genes that are entirely absent from Chr XI in the reference genome.

View larger version:

Download as PowerPoint Slide

Figure 3.

Maltose QTLs on Chr XI are caused by nonreference MAL loci. (A,B) Diagrams of MAL loci on Chr XI of RM and CLIB413, respectively, along with the MAL-less loci in the strains they were crossed to for QTL mapping. (C) Amino acid sequence comparison of Chr XI MALRs from RM and CLIB413 to Mal23 and Mal43. (D,E) Scatterplots of maltose growth of segregants from crosses involving strains RM and CLIB413, respectively, partitioned by segregant genotype at Chr XI and VII. Colony radiuses were measured by Bloom et al. (2019) after 48-h growth on maltose plates, normalized to colony radiuses on glucose plates, and then scaled to have a mean of zero and a variance of one. The mean phenotype of each genotypic grouping of segregants is shown with a blue line. (NS) Not significant, (*) P < 0.05, one-tailed Student's t-test, with the alternative hypothesis that the extra MAL genes improve maltose growth.

SGE1 copy-number variation affects paraquat resistance

Paraquat, a widely used herbicide, is a cationic small molecule that catalyzes the formation of reactive oxygen species, especially superoxide, from cellular electron donors (Fukushima et al. 2002). To explore natural variation in oxidative stress tolerance, Bloom et al. (2019) performed QTL mapping for paraquat resistance in the 16 strains. Eleven crosses had a paraquat-resistance QTL at the right end of Chr XVI with LOD score greater than 10, which was the strongest paraquat QTL for six crosses (Fig. 4A). The Chr XVI subtelomeric region contains SGE1, encoding a transporter that extrudes cationic compounds. Natural variation in SGE1 underlies variation in tolerance of ionic liquids used in biofuel production (Higgins et al. 2018). Genome-wide screens have found that SGE1 overexpression confers paraquat resistance (Österberg et al. 2006) and SGE1 deletion causes paraquat sensitivity (Helsen et al. 2020). In our genome sequences, we observed that paraquat-sensitive Chr XVI loci had one of three SGE1 alleles: SGE1-P282S/L284S, SGE1-D429E, or SGE1-E332Q/K467E (Fig. 4B). We tested the effect of these three SGE1 alleles on paraquat resistance in BY sge1Δ, along with the ancestral allele (Fig. 4C). Ancestral SGE1 provided the most paraquat resistance, SGE1-D429E provided intermediate resistance, and SGE1-P282S/L284S and SGE1-E332Q/K467E were indistinguishable from empty vector (Fig. 4C). Thus, we concluded that single-nucleotide variation in SGE1 is a major contributor to differences in paraquat resistance.

View larger version:

Download as PowerPoint Slide

Figure 4.

Diversity in SGE1 underlies widespread variation in paraquat tolerance. (A) QTL mapping of growth on 0.75 mM paraquat (Bloom et al. 2019), as in Figure 2A. The site of a recurrent QTL on Chr XVI is highlighted in peach. (B) Diagram of the SGE1 alleles on Chr XVI, with derived nonsynonymous variants labeled. (C) Growth over time in liquid YPD media with 3.5 mM paraquat for BY sge1Δ with plasmids carrying SGE1 from strains BY, YJM145, YPS1009, I14, or 273614 (orange) or empty vector (gray). n = 10 biological replicates. The last panel quantifies strain growth as the AUC for growth in paraquat normalized to the AUC for growth without paraquat (Supplemental Fig. S13). SGE1 alleles from strains BY, YJM145, and YPS1009 provided significantly less protection than the ancestral allele from strain I14. (*) P < 0.005, Tukey's HSD following one-way ANOVA. In addition, SGE1 alleles from strains YJM145, I14, and 273614 provided significantly greater protection than empty vector ([†] P < 0.001, Tukey's HSD), whereas alleles from strains BY and YPS1009 did not. (D–F) Additional SGE1 genes were found in strains CLIB219, 273614, and RM, respectively. Scatterplots below show growth on paraquat for segregants in crosses involving those strains, partitioned by SGE1 genotype. Plots are shown as in Figure 3, with normalization to growth on plates without paraquat. Mean phenotypes are shown with orange lines. (*) P < 0.05, one-tailed Student's t-test, with the alternative hypothesis that the extra SGE1 copy improves growth on paraquat.

Though the reference genome only has SGE1 on Chr XVI, we found extra copies of SGE1 on other chromosomes in three strains: RM, 273614, and CLIB219 (Fig. 4D–F). Linkage analysis indicated these SGE1 copies were correctly placed (Supplemental Figs. S14–S16). We examined the paraquat growth of segregants in crosses involving these strains. Segregants with the extra ancestral SGE1 on Chr VII from CLIB219 had substantially improved paraquat growth (Fig. 4D). Segregants with the extra SGE1-D429E on Chr XI from strain 273614 had moderately improved growth, especially if their Chr XVI locus had the inferior SGE1-P282S/L284S allele from YJM981 (Fig. 4E). The extra SGE1-P282S/L284S on Chr XI from strain RM corresponded to slightly improved growth in the BY × RM cross and no detectable benefit in the RM × YPS163 cross (Fig. 4F). These results mirrored our SGE1 allele replacement results, in which the ancestral SGE1 was best, SGE1-D429E was moderate, and SGE1-P282S/L284S was poor. Thus, we concluded that, through duplication, SGE1 causes additional QTLs on chromosomes other than where it is normally known to reside. As with the maltose Chr XI QTLs, resolving these QTLs would not have been possible without characterization of pangenomic variation. Furthermore, long-read sequencing facilitated correct genotyping of SGE1 SNPs, a prerequisite for QTL resolution, in strains with SGE1 copies on multiple chromosomes.

SUC gene content affects sucrose and raffinose growth

Sucrose, a disaccharide of glucose and fructose, is a common plant sugar found in fruits and tree sap (Liu et al. 2006; Fink et al. 2018). In yeast, sucrose is hydrolyzed by the enzyme invertase (Marques et al. 2016) to produce glucose and fructose, which then enter standard carbon metabolism pathways. Invertase is encoded in the S. cerevisiae reference genome by SUC2 on Chr IX. Some strains carry additional nonreference subtelomeric SUC loci; nine have been identified (Carlson et al. 1985; Naumov and Naumova 2010a,b). We found subtelomeric SUC genes on eight chromosomes (Fig. 5A), including three novel SUC loci, supported by linkage data (Supplemental Figs. S17–S19). The subtelomeric SUC genes had high sequence identity to each other and low sequence identity to SUC2 (Fig. 5B; Naumova et al. 2014), indicating they recently spread across yeast chromosomes through inter-telomeric recombination (Carlson et al. 1985) long after diverging from SUC2. There was likely recent recombination between subtelomeric SUC genes and SUC2, as the predominant subtelomeric SUC allele had alternating regions of low and high sequence identity to SUC2 (Fig. 5C), and some strains had chimeric SUC genes between the two (Fig. 5B,D). SUC2 and the subtelomeric SUC progenitor may have split before S. cerevisiae’s speciation, as their highest diverged regions are more dissimilar than SUC2 from S. cerevisiae and its sister species S. paradoxus (Fig. 5C), although subtelomeric SUC genes have not been found in S. paradoxus (Bozdag and Greig 2014; Naumova et al. 2014; Yue et al. 2017).

View larger version:

Download as PowerPoint Slide

Figure 5.

Subtelomeric copies of genes encoding invertase. (A) Invertase genes in the 16 genomes, with subtelomeric SUC loci and the SUC2 locus marked with wedges and a vertical bar, respectively. The SUC9 locus on Chr XIV (Naumov and Naumova 2010b) was not mapped to a specific arm, so we labeled both arms as SUC9. We did not observe SUC genes on Chr VIII, XIII, or XVI (SUC7, SUC4, and SUC10, respectively) (Naumova et al. 2014). (B) Gene percentage identity between SUC genes. (C) Gene sequence comparison of SUC2 from strain BY to the representative subtelomeric SUC gene from strain 273614 (top) and SUC2 from S. paradoxus (bottom). (D) Gene sequence comparison of the Chr VII SUC gene from strain YPS1009 to the subtelomeric SUC gene from strain 273614 and SUC2 from strain BY. (E,F) Scatterplots of growth on 2% sucrose or raffinose, respectively, of segregants of the 273614 × YJM981 cross, partitioned by SUC genotype at Chr IV and IX, as in Figure 3. Mean phenotypes are shown with magenta lines. (*) P < 0.05, one-tailed Student's t-test, with the alternative hypothesis that the extra SUC copy improves growth.

Previous studies have found that subtelomeric SUC genes allow strains lacking SUC2 to grow on sucrose (Carlson and Botstein 1983), which we also observe. Strain YJM981 has a natural frameshift loss-of-function mutation in SUC2, whereas strain 273614 has both a functional SUC2 and a subtelomeric SUC on Chr IV. In the 273614 × YJM981 cross, segregants with YJM981's suc2-fs allele grew substantially better on sucrose if they inherited the Chr IV subtelomeric SUC from strain 273614 (Fig. 5E). In contrast, segregants with the functional SUC2 allele from strain 273614 had no detectable boost to their sucrose growth from the Chr IV subtelomeric SUC. This was intriguing, as it suggested that although subtelomeric SUCs have maintained invertase function, growth on sucrose may not select for that function for the majority of the yeast population carrying functional SUC2.

Invertase also digests other sugars, which we conjectured could have selected for subtelomeric SUC gene invertase activity. Raffinose is a trisaccharide composed of galactose, glucose, and fructose. Invertase hydrolyzes raffinose to produce fructose and melibiose, a disaccharide of glucose and galactose. Invertase levels are more likely to be limiting for growth in raffinose than sucrose: Invertase has lower activity on raffinose (Gascón et al. 1968; Sainz-Polo et al. 2013), and raffinose hydrolysis produces less energy than sucrose, as most yeast strains can only use the resulting fructose and not the melibiose. Indeed, in contrast to sucrose, segregants with functional SUC2 in the 273614 × YJM981 cross grew significantly better on raffinose with the additional subtelomeric SUC gene (Fig. 5F). Thus, the extra invertase activity provided by subtelomeric SUC genes may be particularly beneficial in complex sugar environments.

Previous Section Next Section

Discussion

There has been increasing interest in describing species genetically not only through a single reference genome but with a population-wide pangenome that accounts for genes variably present in individuals. We used long-read sequencing to generate highly complete genome assemblies for 16 diverse strains of S. cerevisiae and to annotate pangenomic variation. To ascertain the effects of the uncovered genetic variation, we combined our long-read-derived genome sequences with linkage mapping data and identified QTLs caused by variable pangenomic genes. By leveraging linkage mapping data, we were also able to validate the long-read genome sequences, confirming a translocation and the placement of variable pangenomic genes through linkage to nearby genetic markers. Thus, combining linkage mapping and long-read sequencing strengthens our understanding of both the sequenced genomes and mapped QTLs.

In budding yeast, variable pangenomic genes are mostly found at subtelomeric locations (Yue et al. 2017). This could be related to yeast subtelomeres recombining between chromosomes. Inter-chromosomal recombination can lead to addition or loss of genes in the daughter cells following cell division (Louis et al. 1994). The genes we examined here, MALR, SGE1, and SUC, are all subtelomeric. Consistent with the subtelomeres experiencing inter-chromosomal recombination, copies of all three genes were found at the end of more than one chromosome. In some cases, we found identical gene copies on different chromosomes, suggesting their recombination events occurred recently. For SGE1, three distinct alleles seen on Chr XVI are also seen on other chromosomes, implying three independent recombination events, further underscoring the frequency of inter-chromosomal recombination.

In addition to subtelomeres, pangenomic variation is occasionally found at interior chromosomal regions in S. cerevisiae. We searched genome-wide for recurrent large-effect QTLs whose confidence intervals contained structural variation >250 bp (Supplemental Table S2). Although most recovered loci were subtelomeric, we also identified previously known copy number variants in CUP1 and the ENA locus, affecting copper and lithium sensitivity, respectively (Dunn et al. 2012; Zhao et al. 2014). Thus, interior loci can also harbor recurrent and significant trait-affecting pangenomic variation.

Increased gene dosage of limiting gene products can increase fitness, and we observed that increasing from one functional gene to two improved fitness through MALR for maltose, SGE1 for paraquat, and SUC for raffinose. The variable presence of highly identical sequences on different chromosomes is particularly challenging to detect using reference genomes and short-read genotyping. Correctly determining the genomic locations of variable genes is especially important for resolving the genetic basis of mapped QTLs, which requires precise knowledge of the sequence differences in the QTL. QTLs were caused by genes in locations absent from the reference genome for all three traits examined here.

We predict that variable pangenomic genes have many more effects on yeast phenotypic diversity than the traits we studied. The variable component of the yeast pangenome comprises thousands of genes (Peter et al. 2018), and each of those genes may contribute to very particular traits, such as metabolism of specific nutrients or growth in specific environments. For instance, several homologs of MALR were widespread in our 16 genomes but did not support growth on maltose, including a previously uncharacterized MALR we termed MAL83, raising the possibility that they function in a different sugar environment. With the variable pangenomic content annotated here for a large, genotyped mapping population, studying a wide variety of phenotypes in the panel's segregants could reveal the phenotypic effects of many other pangenomic genes in these genomes.

The design of the linkage mapping population, in which all individuals were direct progeny of 16 parental strains, meant that generating high-quality genome assemblies for just the parental strains was sufficient for resolving QTLs to specific pangenomic variation. This strategy of generating high-quality genome assemblies of select individuals can be used for linkage mapping studies on any group of closely related individuals, such as family studies in humans. As the cost and analytical constraints of generating long-read genome assemblies continues to decline (Jarvis et al. 2022), it will become possible to generate high-confidence de novo whole-genome sequences of every individual in a mapping study, including genome-wide association studies of unrelated humans. We expect that assembly of highly complete genomes using long-read sequencing will be vital for us to fully understand the genetic sources of trait variation.

Previous Section Next Section

Methods

Genome sequencing and assembly

DNA was isolated from the 16 parental strains using the Masterpure yeast DNA purification kit (Lucigen) and sequenced on a Sequel II sequencer (PacBio) to produce circular consensus sequence (CCS) reads, as detailed in the Supplemental Methods.

CCS reads from each sample were assembled using Canu v. 2.2 (parameters genomeSize = 12mb -pacbio-hifi) (Koren et al. 2017). When read coverage was greater than 50×, reads were down-sampled to 50× coverage using seqtk v. 1.2 before assembly (https://github.com/lh3/seqtk). Redundant contigs were identified from each initial draft assembly using purge_dups v. 1.0.1 (Guan et al. 2020). Mitochondrial contig sequences were visualized and manually circularized using Gepard v. 1.30 (Krumsiek et al. 2007). Further refinement of the contig sequences involved comparing the draft contigs against the S. cerevisiae S288C reference genome (ASM205763v1) using BLASTN (v. 2.2.24) to identify contigs representing telomeric ends that were not incorporated into the corresponding chromosome and overlapping chromosome contigs, particularly the two contigs representing Chr XII that were split at the rDNA cluster. Overlapping contigs were visualized using Gepard to identify the overlapping regions and manually joined using exported trimmed sequences. The structure of the rDNA cluster in the joined Chr XII contigs represents a collapsed version of this extended repeat region.

Genome annotation

All code and input data for performing genome annotation are hosted at Zenodo (http://doi.org/10.5281/zenodo.7846378). Genes were annotated in the strain FASTA files using a combined strategy of lifting over gene annotations from the reference S. cerevisiae genome and annotating ORFs de novo. For lifting over reference annotations, we used Liftoff v. 1.6.1 (Shumate and Salzberg 2021) with the parameters -p 4 -infer_transcripts -copies -s 0.3 -sc 0.3. We chose Liftoff owing to its process of annotating the products of a gene duplication as either original or copy genes. Reference annotations were downloaded from https://downloads.yeastgenome.org/latest/saccharomyces_cerevisiae.gff.gz on December 1, 2021.

To annotate ORFs de novo, we used ORFfinder v. 0.4.3 with the parameters -outfmt = 0 -ml 210. ORFs were checked for homology with reference genes using BLASTN. We removed highly repetitive ORFs, identified with TRF v. 4.09.1 (Benson 1999). Using custom R scripts (R. v. 4.1.2) (R Core Team 2023), ORFs were required to be >300 bp long, to begin with ATG, and to not overlap other genomic features (genes, transposable elements, centromeres, tRNAs, subtelomeric X and Y′ elements, and autonomously replicating sequences), which were annotated using Liftoff, and other ORFs.

Validation of assemblies using genetic maps

For a given cross, we used MUMmer (v. 4.0.0) (Marçais et al. 2018) to generate an alignment between chromosomes of interest from the two parent genome sequences and to call SNPs. Using custom scripts, we generated a final reference sequence that only included sites of SNPs at least 20 bp from an alignment mismatch, with invariant or poorly aligned regions replaced with Ns. Illumina short reads of the segregants, obtained from the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) (study SRP201965), were aligned with BWA-MEM (v. 0.7.17) (Li and Durbin 2009), and SNPs called with BCFtools (v. 1.16) (Li 2011). We called segregant genotypes in 1-kb windows, assigning parentage if at least 90% of SNPs agreed on parentage (otherwise, assigned N). Linkage disequilibrium and association mapping were performed with PLINK (v. 1.9) (Chang et al. 2015).

To test whether gene sequences of interest were assembled onto correct chromosomes, we performed linkage mapping in segregants of biparental crosses predicted to be heterozygous for the sequence. For each segregant, we quantified Illumina reads per million (RPM) matching the sequence using BWA with strict matching parameters: -B 40 -O 60 -E 10 -L 100. Linkage mapping was performed using segregant RPM as a quantitative phenotype.

Yeast growth assays

Strains, plasmids, and oligonucleotides are listed in Supplemental Tables S3, S4, and S5, respectively, with construction detailed in the Supplemental Methods. Plasmids carrying the MALR and SGE1 alleles were generated by Gibson assembly (New England Biolabs [NEB]) (Gibson et al. 2009). A lithium acetate transformation protocol (Becker and Lundblad 2001) was used to transform yeast strains with plasmids. Strain growth curves were measured in 96-well plates (Corning 3370) over 3–5 d on a SPECTROstar Omega microplate reader equipped with a plate stacker (BMG Labtech) at room temperature with 30 s of shaking at 600 RPM before each plate reading. Sample growth was quantified as the ratio of the AUC in maltose or paraquat media to the AUC in glucose media. Statistical analysis was performed using one-way ANOVA followed by Tukey's post-hoc HSD test.

Previous Section Next Section

Data access

The genome assembly data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/) under accession number PRJNA779940 (see Supplemental Table S1). Sequences of MAL83 generated in this study have been submitted to the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) under accession numbers OR078935–OR078941. The code used to generate our analyses is available as Supplemental Code, and code and data are available as a snapshot of our Git repository hosted on Zenodo (http://doi.org/10.5281/zenodo.7846378).

Previous Section Next Section

Competing interest statement

The authors declare no competing interests.

Previous Section Next Section

Acknowledgments

We thank Arang Rhie, Sergey Koren, Adam Phillippy, Bill Pavan, David Bodine, Frank Albert, and all members of the Sadhu laboratory for helpful discussion. We thank Trey Sato, Leonid Kruglyak, and Fred Cross for strains and plasmids. This work used the computational resources of the NIH High Performance Computing Biowulf cluster. This work was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (1ZIAHG200401, 1ZIBHG000196). J.S.B. was supported by the Howard Hughes Medical Institute.

Author contributions: M.J.S. conceived and supervised the project. J.S.B. and M.J.S. provided resources. I.A., C.A.W., M.J.C., and M.J.S. performed investigation and formal analysis. M.P. and the NISC Comparative Sequencing Program sequenced and assembled yeast genomes. M.J.S. wrote the original draft; all authors reviewed and edited the manuscript and approved the final version.

Previous Section Next Section

Footnotes

↵9 A complete list of NISC Comparative Sequencing Program members appears at the end of this paper.
↵Present addresses: ¹⁰Center for Alzheimer's and Related Dementias, National Institutes of Health, Bethesda, MD 20892, USA;
↵11 Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.277515.122.

Received November 16, 2022.
Accepted April 26, 2023.

© 2023 Weller et al.; Published by Cold Spring Harbor Laboratory Press

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

Previous Section

References

↵

Becker DM, Lundblad V. 2001. Introduction of DNA into yeast cells. In Current protocols in molecular biology (ed. Ausubel FM), Chapter 13, Unit13.7. Wiley, Hoboken, NJ. doi:10.1002/0471142727.mb1307s27

CrossRef Google Scholar
↵

Bendixsen DP, Gettle N, Gilchrist C, Zhang Z, Stelkens R. 2021. Genomic evidence of an ancient East Asian divergence event in wild Saccharomyces cerevisiae. Genome Biol Evol 13: evab001. doi:10.1093/gbe/evab001

CrossRef Google Scholar
↵

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573–580. doi:10.1093/nar/27.2.573

CrossRef Medline Google Scholar
↵

Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31: 872–888. doi:10.1093/molbev/msu037

CrossRef Medline Google Scholar
↵

Bloom JS, Ehrenreich IM, Loo WT, Lite T-L V, Kruglyak L. 2013. Finding the sources of missing heritability in a yeast cross. Nature 494: 234–237. doi:10.1038/nature11867

CrossRef Medline Google Scholar
↵

Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. 2019. Rare variants contribute disproportionately to quantitative trait variation in yeast. eLife 8: e49212. doi:10.7554/eLife.49212

CrossRef Medline Google Scholar
↵

Boocock J, Sadhu MJ, Durvasula A, Bloom JS, Kruglyak L. 2021. Ancient balancing selection maintains incompatible versions of the galactose pathway in yeast. Science 371: 415–419. doi:10.1126/science.aba0542

Abstract/FREE Full Text
↵

Bozdag GO, Greig D. 2014. The genetics of a putative social trait in natural populations of yeast. Mol Ecol 23: 5061–5071. doi:10.1111/mec.12904

CrossRef Google Scholar
↵

Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, Boeke JD. 1998. Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14: 115–132. doi:10.1002/(SICI)1097-0061(19980130)14:2<115::AID-YEA204>3.0.CO;2-2

CrossRef Medline Google Scholar
↵

Brown CA, Murray AW, Verstrepen KJ. 2010. Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr Biol 20: 895–903. doi:10.1016/j.cub.2010.04.027

CrossRef Medline Google Scholar
↵

Carlson M, Botstein D. 1983. Organization of the SUC gene family in Saccharomyces. Mol Cell Biol 3: 351–359. doi:10.1128/mcb.3.3.351-359.1983

Abstract/FREE Full Text
↵

Carlson M, Celenza JL, Eng FJ. 1985. Evolution of the dispersed SUC gene family of Saccharomyces by rearrangements of chromosome telomeres. Mol Cell Biol 5: 2894–2902. doi:10.1128/mcb.5.11.2894-2902.1985

Abstract/FREE Full Text
↵

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4: 7. doi:10.1186/s13742-015-0047-8

CrossRef Medline Google Scholar
↵

Charron MJ, Read E, Haut SR, Michels CA. 1989. Molecular evolution of the telomere-associated MAL loci of Saccharomyces. Genetics 122: 307–316. doi:10.1093/genetics/122.2.307

Abstract/FREE Full Text
↵

D'Angiolo M, De Chiara M, Yue J-X, Irizar A, Stenberg S, Persson K, Llored A, Barré B, Schacherer J, Marangoni R, et al. 2020. A yeast living ancestor reveals the origin of genomic introgressions. Nature 587: 420–425. doi:10.1038/s41586-020-2889-1

CrossRef Google Scholar
↵

Dietvorst J, Londesborough J, Steensma HY. 2005. Maltotriose utilization in lager yeast strains: MTT1 encodes a maltotriose transporter. Yeast 22: 775–788. doi:10.1002/yea.1279

CrossRef Medline Google Scholar
↵

Dunn B, Richter C, Kvitek DJ, Pugh T, Sherlock G. 2012. Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments. Genome Res 22: 908–924. doi:10.1101/gr.130310.111

Abstract/FREE Full Text
↵

Ehrenreich IM, Gerke JP, Kruglyak L. 2009. Genetic dissection of complex traits in yeast: insights from studies of gene expression and other phenotypes in the BY×RM cross. Cold Spring Harb Symp Quant Biol 74: 145–153. doi:10.1101/sqb.2009.74.013

Abstract/FREE Full Text
↵

Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, et al. 2022. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 220: iyab224. doi:10.1093/genetics/iyab224

CrossRef Google Scholar
↵

Fink D, Dobbelstein E, Barbian A, Lohaus G. 2018. Ratio of sugar concentrations in the phloem sap and the cytosol of mesophyll cells in different tree species as an indicator of the phloem loading mechanism. Planta 248: 661–673. doi:10.1007/s00425-018-2933-7

CrossRef Google Scholar
↵

Fukushima T, Tanaka K, Lim H, Moriyama M. 2002. Mechanism of cytotoxicity of paraquat. Environ Health Prev Med 7: 89–94. doi:10.1265/ehpm.2002.89

CrossRef Medline Google Scholar
↵

Gascón S, Neumann NP, Lampen JO. 1968. Comparative study of the properties of the purified internal and external invertases from yeast. J Biol Chem 243: 1573–1577. doi:10.1016/S0021-9258(18)93580-5

Abstract/FREE Full Text
↵

Giaever G, Nislow C. 2014. The yeast deletion collection: a decade of functional genomics. Genetics 197: 451–465. doi:10.1534/genetics.114.161620

Abstract/FREE Full Text
↵

Gibson AW, Wojciechowicz LA, Danzi SE, Zhang B, Kim JH, Hu Z, Michels CA. 1997. Constitutive mutations of the Saccharomyces cerevisiae MAL-activator genes MAL23, MAL43, MAL63, and mal64. Genetics 146: 1287–1298. doi:10.1093/genetics/146.4.1287

Abstract/FREE Full Text
↵

Gibson DG, Young L, Chuang R-Y, Venter JC, Hutchison CA 3rd, Smith HO. 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6: 343–345. doi:10.1038/nmeth.1318

CrossRef Medline Google Scholar
↵

Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, et al. 1996. Life with 6000 genes. Science 274: 563–567. doi:10.1126/science.274.5287.546

CrossRef Google Scholar
↵

Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. 2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36: 2896–2898. doi:10.1093/bioinformatics/btaa025

CrossRef Medline Google Scholar
↵

Han E, Cotty F, Sottas C, Jiang H, Michels CA. 1995. Characterization of AGT1 encoding a general α-glucoside transporter from Saccharomyces. Mol Microbiol 17: 1093–1107. doi:10.1111/j.1365-2958.1995.mmi_17061093.x

CrossRef Medline Google Scholar
↵

He Y, Dong J, Yin H, Zhao Y, Chen R, Wan X, Chen P, Hou X, Liu J, Chen L. 2014. Wort composition and its impact on the flavour-active higher alcohol and ester formation of beer – a review. J Inst Brew 120: 157–163. doi:10.1002/jib.145

CrossRef Google Scholar
↵

Helsen J, Voordeckers K, Vanderwaeren L, Santermans T, Tsontaki M, Verstrepen KJ, Jelier R. 2020. Gene loss predictably drives evolutionary adaptation. Mol Biol Evol 37: 2989–3002. doi:10.1093/molbev/msaa172

CrossRef Google Scholar
↵

Higgins DA, Young MKM, Tremaine M, Sardi M, Fletcher JM, Agnew M, Liu L, Dickinson Q, Peris D, Wrobel RL, et al. 2018. Natural variation in the multidrug efflux pump SGE1 underlies ionic liquid tolerance in yeast. Genetics 210: 219–234. doi:10.1534/genetics.118.301161

Abstract/FREE Full Text
↵

Hou J, Friedrich A, de Montigny J, Schacherer J. 2014. Chromosomal rearrangements as a major mechanism in the onset of reproductive isolation in Saccharomyces cerevisiae. Curr Biol 24: 1153–1159. doi:10.1016/j.cub.2014.03.063

CrossRef Medline Google Scholar
↵

Istace B, Friedrich A, d'Agata L, Faye S, Payen E, Beluche O, Caradec C, Davidas S, Cruaud C, Liti G, et al. 2017. de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer. GigaScience 6: 1–13. doi:10.1093/gigascience/giw018

CrossRef Medline Google Scholar
↵

Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, et al. 2022. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611: 519–531. doi:10.1038/s41586-022-05325-5

CrossRef Google Scholar
↵

Koren S, Phillippy AM. 2015. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 23: 110–120. doi:10.1016/j.mib.2014.11.014

CrossRef Medline Google Scholar
↵

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27: 722–736. doi:10.1101/gr.215087.116

Abstract/FREE Full Text
↵

Krumsiek J, Arnold R, Rattei T. 2007. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23: 1026–1028. doi:10.1093/bioinformatics/btm039

CrossRef Medline Google Scholar
↵

Lee TJ, Liu Y-C, Liu W-A, Lin Y-F, Lee H-H, Ke H-M, Huang J-P, Lu M-Y J, Hsieh C-L, Chung K-F, et al. 2022. Extensive sampling of Saccharomyces cerevisiae in Taiwan reveals ecology and evolution of predomesticated lineages. Genome Res 32: 864–877. doi:10.1101/gr.276286.121

Abstract/FREE Full Text
↵

Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993. doi:10.1093/bioinformatics/btr509

CrossRef Medline Google Scholar
↵

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. doi:10.1093/bioinformatics/btp324

CrossRef Medline Google Scholar
↵

Liu H-F, Wu B-H, Fan P-G, Li S-H, Li L-S. 2006. Sugar and acid concentrations in 98 grape cultivars analyzed by principal component analysis. J Sci Food Agric 86: 1526–1536. doi:10.1002/jsfa.2541

CrossRef Google Scholar
↵

Louis EJ, Naumova ES, Lee A, Naumov G, Haber JE. 1994. The chromosome end in yeast: its mosaic nature and influence on recombinational dynamics. Genetics 136: 789–802. doi:10.1093/genetics/136.3.789

Abstract/FREE Full Text
↵

Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol 14: e1005944. doi:10.1371/journal.pcbi.1005944

CrossRef Medline Google Scholar
↵

Marques WL, Raghavendran V, Stambuk BU, Gombert AK. 2016. Sucrose and Saccharomyces cerevisiae: a relationship most sweet. FEMS Yeast Res 16: fov107. doi:10.1093/femsyr/fov107

CrossRef Medline Google Scholar
↵

McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. 2013. Analysis tool Web Services from the EMBL-EBI. Nucleic Acids Res 41: W597–W600. doi:10.1093/nar/gkt376

CrossRef Medline Google Scholar
↵

Naumov GI, Naumova ES. 2010a. Comparative genetics of yeasts: a novel β-fructosidase gene SUC8 in Saccharomyces cerevisiae. Russ J Genet 46: 323–330. doi:10.1134/S1022795410030099

CrossRef Google Scholar
↵

Naumov GI, Naumova ES. 2010b. Polygenic control for fermentation of β-fructosides in the yeast Saccharomyces cerevisiae: new genes SUC9 and SUC10. Microbiology 79: 160–166. doi:10.1134/S0026261710020050

CrossRef Google Scholar
↵

Naumov G, Turakainen H, Naumova E, Aho S, Korhola M. 1990. A new family of polymorphic genes in Saccharomyces cerevisiae: α-galactosidase genes MEL1-MEL7. Mol Gen Genet 224: 119–128. doi:10.1007/BF00259458

CrossRef Medline Google Scholar
↵

Naumova ES, Sadykova AZ, Martynenko NN, Naumov GI. 2014. Molecular polymorphism of β-fructosidase SUC genes in the Saccharomyces yeasts. Mol Biol 48: 573–582. doi:10.1134/S0026893314040086

CrossRef Google Scholar
↵

O'Donnell S, Fischer G. 2020. MUM&Co: accurate detection of all SV types through whole-genome alignment. Bioinformatics 36: 3242–3243. doi:10.1093/bioinformatics/btaa115

CrossRef Medline Google Scholar
↵

O'Donnell S, Yue J-X, Saada OA, Agier N, Caradec C, Cokelaer T, de Chiara M, Delmas S, Dutreux F, Fournier T, et al. 2022. 142 telomere-to-telomere assemblies reveal the genome structural landscape in Saccharomyces cerevisiae. bioRxiv doi:10.1101/2022.10.04.510633

Abstract/FREE Full Text
↵

Österberg M, Kim H, Warringer J, Melén K, Blomberg A, von Heijne G. 2006. Phenotypic effects of membrane protein overexpression in Saccharomyces cerevisiae. Proc Natl Acad Sci 103: 11148–11153. doi:10.1073/pnas.0604078103

Abstract/FREE Full Text
↵

Peter J, De Chiara M, Friedrich A, Yue J-X, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, et al. 2018. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556: 339–344. doi:10.1038/s41586-018-0030-5

CrossRef Medline Google Scholar
↵

Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, et al. 2021. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184: 3542–3558.e16. doi:10.1016/j.cell.2021.04.046

CrossRef Google Scholar
↵

R Core Team. 2023. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/.

Google Scholar
↵

Renou J, Sormani R, Gelhaye E, Didierjean C, Morel-Rouhier M. 2022. Genomic and functional insights into the diversification of the elongation factor eEF1Bγ in fungi. Fungal Biol Rev 42: 74–84. doi:10.1016/j.fbr.2022.07.001

CrossRef Google Scholar
↵

Saada OA, Tsouris A, Large C, Friedrich A, Dunham MJ, Schacherer J. 2022. Phased polyploid genomes provide deeper insight into the multiple origins of domesticated Saccharomyces cerevisiae beer yeasts. Curr Biol 32: 1350–1361.e3. doi:10.1016/j.cub.2022.01.068

CrossRef Google Scholar
↵

Sainz-Polo MA, Ramírez-Escudero M, Lafraya A, González B, Marín-Navarro J, Polaina J, Sanz-Aparicio J. 2013. Three-dimensional structure of Saccharomyces invertase: role of a non-catalytic domain in oligomerization and substrate specificity. J Biol Chem 288: 9755–9766. doi:10.1074/jbc.M112.446435

Abstract/FREE Full Text
↵

Salema-Oom M, Valadão Pinto V, Gonçalves P, Spencer-Martins I. 2005. Maltotriose utilization by industrial Saccharomyces strains: characterization of a new member of the α-glucoside transporter family. Appl Environ Microbiol 71: 5044–5049. doi:10.1128/AEM.71.9.5044-5049.2005

Abstract/FREE Full Text
↵

Salzberg LI, Martos AAR, Lombardi L, Jermiin LS, Blanco A, Byrne KP, Wolfe KH. 2022. A widespread inversion polymorphism conserved among Saccharomyces species is caused by recurrent homogenization of a sporulation gene family. PLoS Genet 18: e1010525. doi:10.1371/journal.pgen.1010525

CrossRef Google Scholar
↵

Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, et al. 2011. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 39(Database issue): D38–D51. doi:10.1093/nar/gkq1172

CrossRef Medline Google Scholar
↵

Shumate A, Salzberg SL. 2021. Liftoff: accurate mapping of gene annotations. Bioinformatics 37: 1639–1643. doi:10.1093/bioinformatics/btaa1016

CrossRef Google Scholar
↵

Song G, Dickins BJA, Demeter J, Engel S, Gallagher J, Choe K, Dunn B, Snyder M, Cherry JM. 2015. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae. PLoS One 10: e0120671. doi:10.1371/journal.pone.0120671

CrossRef Medline Google Scholar
↵

Teste MA, François JM, Parrou JL. 2010. Characterization of a new multigene family encoding isomaltases in the yeast Saccharomyces cerevisiae, the IMA family. J Biol Chem 285: 26815–26824. doi:10.1074/jbc.M110.145946

Abstract/FREE Full Text
↵

Wellinger RJ, Zakian VA. 2012. Everything you ever wanted to know about Saccharomyces cerevisiae telomeres: beginning to end. Genetics 191: 1073–1105. doi:10.1534/genetics.111.137851

Abstract/FREE Full Text
↵

Wenger JW, Schwartz K, Sherlock G. 2010. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet 6: e1000942. doi:10.1371/journal.pgen.1000942

CrossRef Medline Google Scholar
↵

Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al. 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37: 1155–1162. doi:10.1038/s41587-019-0217-9

CrossRef Medline Google Scholar
↵

Wu Z, Li T, Jiang Z, Zheng J, Liu Y, Liu Y, Xie Z. 2022. Graph pangenome reveals functional, evolutionary, and phenotypic significance of human nonreference sequences. bioRxiv doi:10.1101/2022.09.05.506692

Abstract/FREE Full Text
↵

Yue J-X, Li J, Aigrain L, Hallin J, Persson K, Oliver K, Bergström A, Coupland P, Warringer J, Lagomarsino MC, et al. 2017. Contrasting evolutionary genome dynamics between domesticated and wild yeasts. Nat Genet 49: 913–924. doi:10.1038/ng.3847

CrossRef Medline Google Scholar
↵

Zhao Y, Strope PK, Kozmin SG, McCusker JH, Dietrich FS, Kokoska RJ, Petes TD. 2014. Structures of naturally evolved CUP1 tandem arrays in yeast indicate that these arrays are generated by unequal nonhomologous recombination. G3 (Bethesda) 4: 2259–2269. doi:10.1534/g3.114.012922

Abstract/FREE Full Text
↵

Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, Wu Y, Cheng L, Fang Y, Wu K, et al. 2022. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606: 527–534. doi:10.1038/s41586-022-04808-9

CrossRef Google Scholar

[1] ↵

Becker DM, Lundblad V. 2001. Introduction of DNA into yeast cells. In Current protocols in molecular biology (ed. Ausubel FM), Chapter 13, Unit13.7. Wiley, Hoboken, NJ. doi:10.1002/0471142727.mb1307s27

CrossRef Google Scholar

[2] ↵

Bendixsen DP, Gettle N, Gilchrist C, Zhang Z, Stelkens R. 2021. Genomic evidence of an ancient East Asian divergence event in wild Saccharomyces cerevisiae. Genome Biol Evol 13: evab001. doi:10.1093/gbe/evab001

CrossRef Google Scholar

[3] ↵

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573–580. doi:10.1093/nar/27.2.573

CrossRef Medline Google Scholar

[4] ↵

Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31: 872–888. doi:10.1093/molbev/msu037

CrossRef Medline Google Scholar

[5] ↵

Bloom JS, Ehrenreich IM, Loo WT, Lite T-L V, Kruglyak L. 2013. Finding the sources of missing heritability in a yeast cross. Nature 494: 234–237. doi:10.1038/nature11867

CrossRef Medline Google Scholar

[6] ↵

Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. 2019. Rare variants contribute disproportionately to quantitative trait variation in yeast. eLife 8: e49212. doi:10.7554/eLife.49212

CrossRef Medline Google Scholar

[7] ↵

Boocock J, Sadhu MJ, Durvasula A, Bloom JS, Kruglyak L. 2021. Ancient balancing selection maintains incompatible versions of the galactose pathway in yeast. Science 371: 415–419. doi:10.1126/science.aba0542

Abstract/FREE Full Text

[8] ↵

Bozdag GO, Greig D. 2014. The genetics of a putative social trait in natural populations of yeast. Mol Ecol 23: 5061–5071. doi:10.1111/mec.12904

CrossRef Google Scholar

[9] ↵

Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, Boeke JD. 1998. Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14: 115–132. doi:10.1002/(SICI)1097-0061(19980130)14:2<115::AID-YEA204>3.0.CO;2-2

CrossRef Medline Google Scholar

[10] ↵

Brown CA, Murray AW, Verstrepen KJ. 2010. Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr Biol 20: 895–903. doi:10.1016/j.cub.2010.04.027

CrossRef Medline Google Scholar

[11] ↵

Carlson M, Botstein D. 1983. Organization of the SUC gene family in Saccharomyces. Mol Cell Biol 3: 351–359. doi:10.1128/mcb.3.3.351-359.1983

Abstract/FREE Full Text

[12] ↵

Carlson M, Celenza JL, Eng FJ. 1985. Evolution of the dispersed SUC gene family of Saccharomyces by rearrangements of chromosome telomeres. Mol Cell Biol 5: 2894–2902. doi:10.1128/mcb.5.11.2894-2902.1985

Abstract/FREE Full Text

[13] ↵

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4: 7. doi:10.1186/s13742-015-0047-8

CrossRef Medline Google Scholar

[14] ↵

Charron MJ, Read E, Haut SR, Michels CA. 1989. Molecular evolution of the telomere-associated MAL loci of Saccharomyces. Genetics 122: 307–316. doi:10.1093/genetics/122.2.307

Abstract/FREE Full Text

[15] ↵

D'Angiolo M, De Chiara M, Yue J-X, Irizar A, Stenberg S, Persson K, Llored A, Barré B, Schacherer J, Marangoni R, et al. 2020. A yeast living ancestor reveals the origin of genomic introgressions. Nature 587: 420–425. doi:10.1038/s41586-020-2889-1

CrossRef Google Scholar

[16] ↵

Dietvorst J, Londesborough J, Steensma HY. 2005. Maltotriose utilization in lager yeast strains: MTT1 encodes a maltotriose transporter. Yeast 22: 775–788. doi:10.1002/yea.1279

CrossRef Medline Google Scholar

[17] ↵

Dunn B, Richter C, Kvitek DJ, Pugh T, Sherlock G. 2012. Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments. Genome Res 22: 908–924. doi:10.1101/gr.130310.111

Abstract/FREE Full Text

[18] ↵

Ehrenreich IM, Gerke JP, Kruglyak L. 2009. Genetic dissection of complex traits in yeast: insights from studies of gene expression and other phenotypes in the BY×RM cross. Cold Spring Harb Symp Quant Biol 74: 145–153. doi:10.1101/sqb.2009.74.013

Abstract/FREE Full Text

[19] ↵

Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, et al. 2022. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 220: iyab224. doi:10.1093/genetics/iyab224

CrossRef Google Scholar

[20] ↵

Fink D, Dobbelstein E, Barbian A, Lohaus G. 2018. Ratio of sugar concentrations in the phloem sap and the cytosol of mesophyll cells in different tree species as an indicator of the phloem loading mechanism. Planta 248: 661–673. doi:10.1007/s00425-018-2933-7

CrossRef Google Scholar

[21] ↵

Fukushima T, Tanaka K, Lim H, Moriyama M. 2002. Mechanism of cytotoxicity of paraquat. Environ Health Prev Med 7: 89–94. doi:10.1265/ehpm.2002.89

CrossRef Medline Google Scholar

[22] ↵

Gascón S, Neumann NP, Lampen JO. 1968. Comparative study of the properties of the purified internal and external invertases from yeast. J Biol Chem 243: 1573–1577. doi:10.1016/S0021-9258(18)93580-5

Abstract/FREE Full Text

[23] ↵

Giaever G, Nislow C. 2014. The yeast deletion collection: a decade of functional genomics. Genetics 197: 451–465. doi:10.1534/genetics.114.161620

Abstract/FREE Full Text

[24] ↵

Gibson AW, Wojciechowicz LA, Danzi SE, Zhang B, Kim JH, Hu Z, Michels CA. 1997. Constitutive mutations of the Saccharomyces cerevisiae MAL-activator genes MAL23, MAL43, MAL63, and mal64. Genetics 146: 1287–1298. doi:10.1093/genetics/146.4.1287

Abstract/FREE Full Text

[25] ↵

Gibson DG, Young L, Chuang R-Y, Venter JC, Hutchison CA 3rd, Smith HO. 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6: 343–345. doi:10.1038/nmeth.1318

CrossRef Medline Google Scholar

[26] ↵

Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, et al. 1996. Life with 6000 genes. Science 274: 563–567. doi:10.1126/science.274.5287.546

CrossRef Google Scholar

[27] ↵

Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. 2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36: 2896–2898. doi:10.1093/bioinformatics/btaa025

CrossRef Medline Google Scholar

[28] ↵

Han E, Cotty F, Sottas C, Jiang H, Michels CA. 1995. Characterization of AGT1 encoding a general α-glucoside transporter from Saccharomyces. Mol Microbiol 17: 1093–1107. doi:10.1111/j.1365-2958.1995.mmi_17061093.x

CrossRef Medline Google Scholar

[29] ↵

He Y, Dong J, Yin H, Zhao Y, Chen R, Wan X, Chen P, Hou X, Liu J, Chen L. 2014. Wort composition and its impact on the flavour-active higher alcohol and ester formation of beer – a review. J Inst Brew 120: 157–163. doi:10.1002/jib.145

CrossRef Google Scholar

[30] ↵

Helsen J, Voordeckers K, Vanderwaeren L, Santermans T, Tsontaki M, Verstrepen KJ, Jelier R. 2020. Gene loss predictably drives evolutionary adaptation. Mol Biol Evol 37: 2989–3002. doi:10.1093/molbev/msaa172

CrossRef Google Scholar

[31] ↵

Higgins DA, Young MKM, Tremaine M, Sardi M, Fletcher JM, Agnew M, Liu L, Dickinson Q, Peris D, Wrobel RL, et al. 2018. Natural variation in the multidrug efflux pump SGE1 underlies ionic liquid tolerance in yeast. Genetics 210: 219–234. doi:10.1534/genetics.118.301161

Abstract/FREE Full Text

[32] ↵

Hou J, Friedrich A, de Montigny J, Schacherer J. 2014. Chromosomal rearrangements as a major mechanism in the onset of reproductive isolation in Saccharomyces cerevisiae. Curr Biol 24: 1153–1159. doi:10.1016/j.cub.2014.03.063

CrossRef Medline Google Scholar

[33] ↵

Istace B, Friedrich A, d'Agata L, Faye S, Payen E, Beluche O, Caradec C, Davidas S, Cruaud C, Liti G, et al. 2017. de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer. GigaScience 6: 1–13. doi:10.1093/gigascience/giw018

CrossRef Medline Google Scholar

[34] ↵

Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, et al. 2022. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611: 519–531. doi:10.1038/s41586-022-05325-5

CrossRef Google Scholar

[35] ↵

Koren S, Phillippy AM. 2015. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 23: 110–120. doi:10.1016/j.mib.2014.11.014

CrossRef Medline Google Scholar

[36] ↵

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27: 722–736. doi:10.1101/gr.215087.116

Abstract/FREE Full Text

[37] ↵

Krumsiek J, Arnold R, Rattei T. 2007. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23: 1026–1028. doi:10.1093/bioinformatics/btm039

CrossRef Medline Google Scholar

[38] ↵

Lee TJ, Liu Y-C, Liu W-A, Lin Y-F, Lee H-H, Ke H-M, Huang J-P, Lu M-Y J, Hsieh C-L, Chung K-F, et al. 2022. Extensive sampling of Saccharomyces cerevisiae in Taiwan reveals ecology and evolution of predomesticated lineages. Genome Res 32: 864–877. doi:10.1101/gr.276286.121

Abstract/FREE Full Text

[39] ↵

Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993. doi:10.1093/bioinformatics/btr509

CrossRef Medline Google Scholar

[40] ↵

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. doi:10.1093/bioinformatics/btp324

CrossRef Medline Google Scholar

[41] ↵

Liu H-F, Wu B-H, Fan P-G, Li S-H, Li L-S. 2006. Sugar and acid concentrations in 98 grape cultivars analyzed by principal component analysis. J Sci Food Agric 86: 1526–1536. doi:10.1002/jsfa.2541

CrossRef Google Scholar

[42] ↵

Louis EJ, Naumova ES, Lee A, Naumov G, Haber JE. 1994. The chromosome end in yeast: its mosaic nature and influence on recombinational dynamics. Genetics 136: 789–802. doi:10.1093/genetics/136.3.789

Abstract/FREE Full Text

[43] ↵

Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol 14: e1005944. doi:10.1371/journal.pcbi.1005944

CrossRef Medline Google Scholar

[44] ↵

Marques WL, Raghavendran V, Stambuk BU, Gombert AK. 2016. Sucrose and Saccharomyces cerevisiae: a relationship most sweet. FEMS Yeast Res 16: fov107. doi:10.1093/femsyr/fov107

CrossRef Medline Google Scholar

[45] ↵

McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. 2013. Analysis tool Web Services from the EMBL-EBI. Nucleic Acids Res 41: W597–W600. doi:10.1093/nar/gkt376

CrossRef Medline Google Scholar

[46] ↵

Naumov GI, Naumova ES. 2010a. Comparative genetics of yeasts: a novel β-fructosidase gene SUC8 in Saccharomyces cerevisiae. Russ J Genet 46: 323–330. doi:10.1134/S1022795410030099

CrossRef Google Scholar

[47] ↵

Naumov GI, Naumova ES. 2010b. Polygenic control for fermentation of β-fructosides in the yeast Saccharomyces cerevisiae: new genes SUC9 and SUC10. Microbiology 79: 160–166. doi:10.1134/S0026261710020050

CrossRef Google Scholar

[48] ↵

Naumov G, Turakainen H, Naumova E, Aho S, Korhola M. 1990. A new family of polymorphic genes in Saccharomyces cerevisiae: α-galactosidase genes MEL1-MEL7. Mol Gen Genet 224: 119–128. doi:10.1007/BF00259458

CrossRef Medline Google Scholar

[49] ↵

Naumova ES, Sadykova AZ, Martynenko NN, Naumov GI. 2014. Molecular polymorphism of β-fructosidase SUC genes in the Saccharomyces yeasts. Mol Biol 48: 573–582. doi:10.1134/S0026893314040086

CrossRef Google Scholar

[50] ↵

O'Donnell S, Fischer G. 2020. MUM&Co: accurate detection of all SV types through whole-genome alignment. Bioinformatics 36: 3242–3243. doi:10.1093/bioinformatics/btaa115

CrossRef Medline Google Scholar

[51] ↵

O'Donnell S, Yue J-X, Saada OA, Agier N, Caradec C, Cokelaer T, de Chiara M, Delmas S, Dutreux F, Fournier T, et al. 2022. 142 telomere-to-telomere assemblies reveal the genome structural landscape in Saccharomyces cerevisiae. bioRxiv doi:10.1101/2022.10.04.510633

Abstract/FREE Full Text

[52] ↵

Österberg M, Kim H, Warringer J, Melén K, Blomberg A, von Heijne G. 2006. Phenotypic effects of membrane protein overexpression in Saccharomyces cerevisiae. Proc Natl Acad Sci 103: 11148–11153. doi:10.1073/pnas.0604078103

Abstract/FREE Full Text

[53] ↵

Peter J, De Chiara M, Friedrich A, Yue J-X, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, et al. 2018. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556: 339–344. doi:10.1038/s41586-018-0030-5

CrossRef Medline Google Scholar

[54] ↵

Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, He Q, Ou S, Zhang H, Li X, et al. 2021. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell 184: 3542–3558.e16. doi:10.1016/j.cell.2021.04.046

CrossRef Google Scholar

[55] ↵

R Core Team. 2023. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/.

Google Scholar

[56] ↵

Renou J, Sormani R, Gelhaye E, Didierjean C, Morel-Rouhier M. 2022. Genomic and functional insights into the diversification of the elongation factor eEF1Bγ in fungi. Fungal Biol Rev 42: 74–84. doi:10.1016/j.fbr.2022.07.001

CrossRef Google Scholar

[57] ↵

Saada OA, Tsouris A, Large C, Friedrich A, Dunham MJ, Schacherer J. 2022. Phased polyploid genomes provide deeper insight into the multiple origins of domesticated Saccharomyces cerevisiae beer yeasts. Curr Biol 32: 1350–1361.e3. doi:10.1016/j.cub.2022.01.068

CrossRef Google Scholar

[58] ↵

Sainz-Polo MA, Ramírez-Escudero M, Lafraya A, González B, Marín-Navarro J, Polaina J, Sanz-Aparicio J. 2013. Three-dimensional structure of Saccharomyces invertase: role of a non-catalytic domain in oligomerization and substrate specificity. J Biol Chem 288: 9755–9766. doi:10.1074/jbc.M112.446435

Abstract/FREE Full Text

[59] ↵

Salema-Oom M, Valadão Pinto V, Gonçalves P, Spencer-Martins I. 2005. Maltotriose utilization by industrial Saccharomyces strains: characterization of a new member of the α-glucoside transporter family. Appl Environ Microbiol 71: 5044–5049. doi:10.1128/AEM.71.9.5044-5049.2005

Abstract/FREE Full Text

[60] ↵

Salzberg LI, Martos AAR, Lombardi L, Jermiin LS, Blanco A, Byrne KP, Wolfe KH. 2022. A widespread inversion polymorphism conserved among Saccharomyces species is caused by recurrent homogenization of a sporulation gene family. PLoS Genet 18: e1010525. doi:10.1371/journal.pgen.1010525

CrossRef Google Scholar

[61] ↵

Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, et al. 2011. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 39(Database issue): D38–D51. doi:10.1093/nar/gkq1172

CrossRef Medline Google Scholar

[62] ↵

Shumate A, Salzberg SL. 2021. Liftoff: accurate mapping of gene annotations. Bioinformatics 37: 1639–1643. doi:10.1093/bioinformatics/btaa1016

CrossRef Google Scholar

[63] ↵

Song G, Dickins BJA, Demeter J, Engel S, Gallagher J, Choe K, Dunn B, Snyder M, Cherry JM. 2015. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae. PLoS One 10: e0120671. doi:10.1371/journal.pone.0120671

CrossRef Medline Google Scholar

[64] ↵

Teste MA, François JM, Parrou JL. 2010. Characterization of a new multigene family encoding isomaltases in the yeast Saccharomyces cerevisiae, the IMA family. J Biol Chem 285: 26815–26824. doi:10.1074/jbc.M110.145946

Abstract/FREE Full Text

[65] ↵

Wellinger RJ, Zakian VA. 2012. Everything you ever wanted to know about Saccharomyces cerevisiae telomeres: beginning to end. Genetics 191: 1073–1105. doi:10.1534/genetics.111.137851

Abstract/FREE Full Text

[66] ↵

Wenger JW, Schwartz K, Sherlock G. 2010. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet 6: e1000942. doi:10.1371/journal.pgen.1000942

CrossRef Medline Google Scholar

[67] ↵

Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al. 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37: 1155–1162. doi:10.1038/s41587-019-0217-9

CrossRef Medline Google Scholar

[68] ↵

Wu Z, Li T, Jiang Z, Zheng J, Liu Y, Liu Y, Xie Z. 2022. Graph pangenome reveals functional, evolutionary, and phenotypic significance of human nonreference sequences. bioRxiv doi:10.1101/2022.09.05.506692

Abstract/FREE Full Text

[69] ↵

Yue J-X, Li J, Aigrain L, Hallin J, Persson K, Oliver K, Bergström A, Coupland P, Warringer J, Lagomarsino MC, et al. 2017. Contrasting evolutionary genome dynamics between domesticated and wild yeasts. Nat Genet 49: 913–924. doi:10.1038/ng.3847

CrossRef Medline Google Scholar

[70] ↵

Zhao Y, Strope PK, Kozmin SG, McCusker JH, Dietrich FS, Kokoska RJ, Petes TD. 2014. Structures of naturally evolved CUP1 tandem arrays in yeast indicate that these arrays are generated by unequal nonhomologous recombination. G3 (Bethesda) 4: 2259–2269. doi:10.1534/g3.114.012922

Abstract/FREE Full Text

[71] ↵

Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, Wu Y, Cheng L, Fang Y, Wu K, et al. 2022. Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606: 527–534. doi:10.1038/s41586-022-04808-9

CrossRef Google Scholar

Highly complete long-read genomes reveal pangenomic variation underlying yeast phenotypic diversity

Abstract

Results

Long-read genome sequences

Highly variable maltose loci

SGE1 copy-number variation affects paraquat resistance

SUC gene content affects sucrose and raffinose growth

Discussion

Methods

Genome sequencing and assembly

Genome annotation

Validation of assemblies using genetic maps

Yeast growth assays

Data access

Competing interest statement

Acknowledgments

Footnotes

References

Articles citing this article

This Article

Article Category

Services

Citing Articles

Google Scholar

PubMed/NCBI

ORCID

Share

Preprint Server

Navigate This Article

Current Issue

In This Issue