Functional assays in Drosophila facilitate classification of variants of uncertain significance associated with rare diseases
- Jung-Wan Mok1,2,5,
- Shelley B. Gibson1,2,3,5,
- Haley A. Dostalik1,2,3,5 and
- Shinya Yamamoto1,2,3,4
- 1Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA;
- 2Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, Texas 77030, USA;
- 3Genetics and Genomics Graduate Program, Baylor College of Medicine, Houston, Texas 77030, USA;
- 4Department of Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA
5These authors contributed equally to this work.
Abstract
Individuals living with rare diseases often undergo a frustrating and expensive diagnostic odyssey. Clinical geneticists who analyze exome or genome sequencing data from rare disease patients often encounter a list of variants of uncertain significance (VUS) in known disease-causing genes or rare variants in genes of uncertain significance (GUS) that are difficult to interpret, even with the integration of the latest bioinformatic tools. In this Perspective, we review how studies using the fruit fly Drosophila melanogaster have facilitated rare disease diagnosis by uncovering the clinical relevance of GUS and classifying rare variants into specific allelic categories (loss-of-function or gain-of-function, Muller's morphs). We showcase how fly researchers have been collaboratively studying the loss-of-function of orthologous fly genes, assessing the ability of the human genes to rescue the fly mutant phenotypes, determining the effect of overexpressing human proteins, and testing functional consequences of rare variants of interest by generating analogous fly mutants to contribute to rare disease diagnosis. We argue that data obtained using Drosophila can be leveraged to design effective multiplexed assays for variant effects (MAVEs) to decipher the vast human variome.
The burden of undiagnosed diseases and the importance of functional studies
The United States classifies a disease as rare if it affects fewer than 200,000 people (Orphan Drug Act of 1983, Public Law 97-414, 96 Stat. 2049). Even with this stringent criterion, there are more than 10,000 rare diseases documented, and it is estimated that one in 10 people within the United States is affected by a rare disease (Haendel et al. 2020). On average, people living with rare diseases wait 6 years to receive a diagnosis (Benito-Lozano et al. 2022). This takes a toll not only on the individual but also on their family, friends, and society (Graungaard and Skov 2007; EveryLife Foundation for Rare Diseases 2023). It is estimated that over 70% of rare diseases are caused by genetic variants (Wakap et al. 2020). With the improvements in accuracy, advances in speed, and reduction in the cost of DNA sequencing, it has become possible to rapidly identify most variants in the genome of a single individual (Gorzynski et al. 2022; http://www.genome.gov/sequencingcostsdata). Most clinical geneticists follow the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines to classify genetic variants into five categories: pathogenic, likely pathogenic, benign, likely benign, and variant(s) of uncertain significance (VUS) (Richards et al. 2015). Whereas the first four classifications can be useful in interpreting rare variants that are found in genes that have been previously linked to human diseases, many individuals (including people without rare diseases) also carry many rare variants in gene(s) of uncertain significance (GUS) that cannot be classified into these categories due to the uncertainty at the gene level. Currently (as of August 2024), even though the Human Genome Organization Gene Nomenclature Committee (HGNC) has annotated 19,261 protein coding genes in the human genome (Seal et al.
2023), only 4923 genes have been associated with human phenotypes according to the Online Mendelian Inheritance in Man (OMIM) (Hamosh et al. 2021). In addition, there are 9103 genes that encode non-coding RNAs, the majority of which have yet to be linked to Mendelian disorders or traits. Therefore, most human genes are GUS, especially in the context of rare diseases, and many are part of the “ignorome,” which consists of functionally enigmatic genes on which no one has performed functional studies in any biological context (Pandey et al. 2014). One effective way to resolve VUS or GUS is to identify other individuals with similar genotypes and phenotypes. Matchmaking databases such as GeneMatcher (Hamosh et al. 2022) and PhenomeCentral (Osmond et al. 2022) that are interconnected through the Matchmaker Exchange (Boycott et al. 2022) platform can be used to facilitate this process. However, because current matchmaking efforts are quite labor-intensive and largely depend on the luck of matching cases being submitted to the same database or other nodes of Matchmaker Exchange, complementary approaches are required to fill the gap. Although a number of bioinformatic tools have been developed to predict the likely pathogenicity of genetic variants, there is still a lot of room for improvement (Ghosh et al. 2017).
Experimental studies are particularly useful to determine whether the variant of interest affects the function or expression of the gene of interest. For some genes, well-established functional assays can be utilized (e.g., enzymatic assay, transcriptional reporter assay, cell survival/proliferation assay). However, for most genes, no standardized tests have been established and implemented. Recently, high-throughput approaches collectively referred to as multiplexed assays for variant effects (MAVEs) have been explored (Gasperini et al. 2016; Starita et al. 2017). In principle, MAVEs have the power to reveal the functional consequences of every possible genetic variant in a given gene for a specific molecular function that can be screened in vitro or in cultured cells. According to MaveDB (Esposito et al. 2019) and MaveRegistry (Kuang et al. 2021), MAVE data sets have been published for ∼600 human genes. To design an effective MAVE that can classify a VUS as pathogenic or benign, several data sets need to be collected. First, it is important to have some idea regarding the mechanism by which previously known pathogenic variants cause the disease. For example, if the disease of interest is caused by a loss-of-function (LOF) mechanism, the level of RNA or protein of interest can be used as a functional readout. However, if the disease is caused by a gain-of-function (GOF) mechanism, a readout that reflects the specific activity of the RNA/protein will be required. Second, a set of well-defined pathogenic and benign variants that can be used as positive and negative controls is necessary to calibrate the sensitivity and specificity of the assay. Whereas such information is often readily available for well-established disease-causing genes, it is very difficult to develop an effective MAVE for diseases that are ultra-rare and for GUS that have yet to be clearly linked to a clinical condition.
In addition, although approaches that utilize specific cell types derived from human stem cells such as induced pluripotent stem cells (iPSC) or harness the power of organoids that mimic organ systems are being actively explored to establish MAVEs that reflect in vivo conditions, most MAVEs are performed in vitro or using limited cell lines. This makes the evaluation of the impact of a VUS on non-cell-autonomous or noncanonical functions very challenging, especially for diseases that are caused by defects in cell type–specific functions of genes.
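The calibration step described above, in which known pathogenic and benign variants serve as positive and negative controls, can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not a published MAVE pipeline: the `ControlVariant` type, the `calibrate` function, the score scale, the threshold, and every control score are hypothetical. It also assumes a LOF-type readout in which a lower score means less function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlVariant:
    name: str         # hypothetical variant label
    score: float      # assay score (assumption: lower = less function)
    pathogenic: bool  # known clinical classification of the control


def calibrate(controls, threshold):
    """Return (sensitivity, specificity) when variants scoring below
    `threshold` are called functionally abnormal."""
    tp = sum(1 for c in controls if c.pathogenic and c.score < threshold)
    fn = sum(1 for c in controls if c.pathogenic and c.score >= threshold)
    tn = sum(1 for c in controls if not c.pathogenic and c.score >= threshold)
    fp = sum(1 for c in controls if not c.pathogenic and c.score < threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathogenic and one benign control")
    return tp / (tp + fn), tn / (tn + fp)


# Invented control scores, for illustration only.
controls = [
    ControlVariant("pathogenic_1", 0.10, True),
    ControlVariant("pathogenic_2", 0.25, True),
    ControlVariant("pathogenic_3", 0.60, True),
    ControlVariant("benign_1", 0.95, False),
    ControlVariant("benign_2", 0.80, False),
    ControlVariant("benign_3", 0.45, False),
]
sensitivity, specificity = calibrate(controls, threshold=0.5)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

With too few controls, as is typical for ultra-rare diseases and GUS, both estimates rest on a handful of variants, which is one way to see why such assays are hard to calibrate.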
Over the last decade, in vivo functional studies using genetic model organisms have made significant contributions to resolving VUS and to revealing the function of GUS that have been associated with rare diseases (Yamamoto et al. 2024). In this Perspective, we introduce how the fruit fly, Drosophila melanogaster, can be used to characterize the functional consequences of novel variants that are potential causes of rare diseases and highlight a number of rare disease gene discovery papers in which fly geneticists have played a critical role. We also discuss how this organism can be used to classify pathogenic variants into different allelic categories to begin to reveal the underlying pathogenic mechanism of newly discovered clinical conditions. Our aim is to help clinical researchers understand how experimental data obtained from flies can support the discovery and characterization of novel clinical conditions. In the Supplemental Text, we provide additional practical tips and technical insights to help Drosophila scientists begin to engage in these types of research projects. Such information can be further leveraged to design effective MAVEs to decipher the vast human “variome” (Cotton et al. 2008) and to facilitate diagnosis of, and therapeutic research on, rare and ultra-rare diseases.
Approaches and strategies to study VUS and GUS in Drosophila
Research using fruit flies has played a fundamental role in establishing the field of genetics. In the early 1900s, pioneering studies in Drosophila performed by Thomas H. Morgan and his students demonstrated the principles of how genes are organized on chromosomes and how various genetic traits can be linked together. In the 1930s, Hermann J. Muller, who was the founding president of the American Society of Human Genetics, used Drosophila to discover that genes can be mutated in different ways and coined specific terminologies (amorph, hypomorph, hypermorph, antimorph, neomorph, isomorph) to classify them, which are now known as “Muller's morphs.” The discovery of Hox genes and developmental signaling pathways by Ed Lewis, Christiane Nüsslein-Volhard, Eric Wieschaus, and many other fly researchers in the late twentieth century has increased the understanding of embryonic patterning in both flies and humans. Research led by Jules A. Hoffmann and colleagues identified conserved pathways that govern innate immunity, whereas research pioneered by Seymour Benzer indicated that complex traits such as behavior are also under the tight regulation of genetic factors. The sequencing of the euchromatic portion of the Drosophila melanogaster genome (Adams et al. 2000) accelerated the utilization of fruit flies in human disease studies because 75% of human disease genes that were known at the time were found to be conserved in the fly genome (Fortini et al. 2000; Reiter et al. 2001). More recently, the high degree of conservation of both genes and genetic pathways between flies and humans has been utilized for studies of VUS and GUS (Bellen and Yamamoto 2015). For example, Drosophila is one of the model organisms that has been selected to perform functional studies by the Rare Diseases Models and Mechanisms (RDMM) network in Canada, a matchmaking platform to connect clinicians and scientists with shared interests in specific genes related to rare diseases (Boycott et al. 2020).
Similar initiatives have been launched in Europe (Ellwanger et al. 2024), Japan (Takahashi et al. 2022), Australia, and Singapore, which also integrate Drosophila scientists as key members of the network (Boycott et al. 2020). In the United States, the NIH supports the Model Organisms Screening Center (MOSC) of the Undiagnosed Diseases Network (UDN) and the Center for Precision Medicine Model (CPMM) at Baylor College of Medicine, both of which use Drosophila as a key model organism to study the function of GUS and effects of VUS that are clinically relevant (Baldridge et al. 2021). Nonprofit organizations such as the Simons Foundation have also funded fly researchers who can provide functional information for VUS found in individuals with autism spectrum disorders (Post et al. 2020; Marcogliese et al. 2022). Whereas other model organisms such as yeast (e.g., Saccharomyces cerevisiae), worms (Caenorhabditis elegans), zebrafish (Danio rerio), and mice (Mus musculus) have also been used to study GUS and VUS with unique advantages (Boycott et al. 2020; Yamamoto et al. 2024), the versatility of the technology and the large number of publicly available reagents make Drosophila unique among them.
In the following subsections, we discuss several factors that need to be taken into account when considering the use of Drosophila in VUS resolution and outline different technical strategies that could be implemented, providing specific examples in which each strategy has been applied. We have generated a list of over 130 publications between 2020 and early 2024 that have used Drosophila to assess the functions of VUS and GUS in vivo (Fig. 1; Supplemental Table 1). Although this list is not meant to be exhaustive, we hope it provides further support that Drosophila has been utilized as an effective model system to support novel disease gene discovery, phenotype expansions, and diagnosis of rare diseases in a research setting.
Figure 1. Utilization of Drosophila in rare disease gene discovery and functional classification of VUS. To compile a list of papers published using Drosophila melanogaster as a model organism to study rare disease genes and variants between January 2020 and April 2024, we first performed manual inspections of all abstracts published in the following journals where many new human disease gene discovery and phenotypic expansion papers are published: American Journal of Human Genetics, European Journal of Human Genetics, Genetics in Medicine, Genome Research, Human Molecular Genetics, Nature Genetics. We also utilized the advanced search option on PubMed by performing the search with the keywords [“Drosophila” + “Variant”, “Human Mutation”, “de novo”, “biallelic”, or “missense allele”] in the title or abstract and manually filtered the results to identify relevant papers that were published in other journals. Note that this search strategy may have omitted some relevant papers that were not captured by the selected keywords. (A) The number of papers that studied GUS and VUS utilizing Drosophila models increased continuously between 2020 and 2023 based on our search methods. The majority of the studies explored the function of specific variants in addition to assessing the function of the orthologous fly gene. (B) The DIOPT scores of fly-human ortholog candidate pairs used in each study. A great majority of papers studied ortholog candidate pairs with a DIOPT score of 10 or more, but some papers conducted functional studies of genes with lower DIOPT scores. (C) The percentages of papers that used each Drosophila-based functional approach in human genetic studies. Approximately 70% of papers assessed the loss-of-function of the orthologous genes as part of the functional study of the gene of interest, which are often GUS.
Approximately 30% of papers performed humanization of the orthologous fly gene, and ∼35% of papers performed functional assessment of the VUS based on overexpression and/or fly analogous variant studies. (D) Breakdown of functional studies of rare variants by the number of Drosophila approaches employed. Whereas approximately half of the papers used a single method (Loss-of-function, Overexpression, Humanization, or Fly analogous variant investigation), the other half integrated multiple strategies. (E) Out of 98 papers that assessed the function of variants, 54 papers classified the variants of interest as LOF and 34 papers designated the variants as GOF alleles. Some studies found both LOF and GOF variants that contribute to clinical phenotypes and one study reported variants that have properties of both GOF and LOF (referred to here as “mixed” due to their mixed properties). (F) Of the 43 papers that identified GOF variants, ∼75% provided data to further classify them as specific Muller's morphs. See Supplemental Table 1 for details.
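The keyword search described in this legend can be approximated with a short Python sketch that assembles a PubMed query string. This is a hedged reconstruction, not the authors' actual query, which is not given in the text: the helper name `build_pubmed_query` is hypothetical, and the choice to combine the keywords with OR, restrict them with the `[tiab]` (title/abstract) field tag, and limit dates with `[dp]` (publication date) is an assumption based on the description above.

```python
def build_pubmed_query(required_term, keywords, start, end):
    """Combine one required term with any-of keywords, restricted to the
    title/abstract field and to a publication-date range."""
    any_keyword = " OR ".join(f'"{k}"[tiab]' for k in keywords)
    return f'"{required_term}"[tiab] AND ({any_keyword}) AND {start}:{end}[dp]'


# Keywords and date range as listed in the legend.
KEYWORDS = ["Variant", "Human Mutation", "de novo", "biallelic", "missense allele"]
query = build_pubmed_query("Drosophila", KEYWORDS, "2020/01", "2024/04")
print(query)
```

The resulting string could be pasted into the PubMed search box; as the legend notes, the hits would still need manual filtering.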
Identifying orthologous genes between humans and flies
Although studies using Drosophila have proven to be effective and successful, there are many variables that must be considered when designing, conducting, and interpreting functional studies using this model organism (Bellen et al. 2019). The first thing to check is whether the Drosophila genome possesses an orthologous gene (or genes) to the human gene of interest (Fig. 2). Whereas homologous genes can be identified relatively easily using simple sequence alignment tools, orthologous genes between different species are much more difficult to determine because one needs to take multiple factors into consideration such as phylogeny, whole genome duplication events during evolution, and gene duplication/deletion events that happen before and after speciation (Kuzniar et al. 2008; Linard et al. 2021). Among the variety of methodological approaches, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT) is widely used to identify ortholog candidates across multiple species, including flies and humans (see Supplemental Text 1; Hu et al. 2011). It is notable that 177 (83%) of the 212 human genes that were studied in Drosophila over the past few years (Fig. 1B; Supplemental Table 1) have relatively high DIOPT scores (10 or more out of 19, DIOPT version 9.1). However, the DIOPT score should not be used as the sole criterion to judge whether a case can be successfully studied in Drosophila because some studies (∼16%) successfully modeled specific VUS or supported new disease gene discovery using Drosophila with moderate to low DIOPT scores (1–9 out of 19). A few studies used Drosophila to evaluate the function of genes and variants that lack any predicted orthologs, which we will discuss further below. Often, one fly gene may correspond to multiple human genes due to two rounds of whole genome duplication events that likely happened in the vertebrate lineage (Holland 2003).
In addition, there are scenarios in which a gene family has been amplified in the invertebrate lineage and cases in which multiple fly genes are orthologous to multiple human genes.
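The DIOPT score ranges quoted above can be expressed as a simple binning rule. The Python sketch below is illustrative only: the cutoffs (10 or more out of 19 as "high", 1–9 as "moderate-to-low") come from the text, whereas the function name, the gene labels, the example scores, and the convention that a score of 0 stands for "no predicted ortholog" are assumptions made for this example. It does not query DIOPT itself.

```python
from collections import Counter

HIGH_CUTOFF = 10  # "10 or more out of 19" in the text
MAX_SCORE = 19    # maximum score quoted for DIOPT version 9.1


def diopt_bin(score):
    """Bin a DIOPT score using the ranges quoted in the text.
    Assumption: a score of 0 encodes 'no predicted ortholog'."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}")
    if score == 0:
        return "no predicted ortholog"
    return "high" if score >= HIGH_CUTOFF else "moderate-to-low"


# Hypothetical human gene labels with invented scores.
example_scores = {"HUMAN_GENE_A": 15, "HUMAN_GENE_B": 4, "HUMAN_GENE_C": 0}
summary = Counter(diopt_bin(s) for s in example_scores.values())
print(dict(summary))

# Share of studied genes in the high bin, from the counts given in the text.
high_share = round(100 * 177 / 212)
print(f"{high_share}% of 212 genes scored {HIGH_CUTOFF} or more")
```

As the text stresses, a low bin is a reason for caution rather than exclusion; the rule only helps triage candidates before the one-to-many and many-to-many ortholog relationships are inspected by hand.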
Figure 2. A workflow for using Drosophila models in functional studies of GUS and VUS. An example of a decision-making framework for employing Drosophila models in GUS and VUS interpretation. Four different strategies (L: Loss-of-function, H: Humanization, O: Overexpression, F: Fly analogous variant) can be selected depending on the characteristics of candidate genes and alleles. Multiple strategies can be combined to further support the findings.
Understanding the mutant phenotypes and expression pattern of orthologous fly genes
Before studying the function of a specific human genetic variant in vivo, it is useful to know what kind of phenotypes can be caused by manipulating the model organism gene. Amorphic (null) alleles that completely lack gene expression or function serve as an important reference point to interpret other types of alleles, such as hypomorphs (reduced activity or abundance), hypermorphs (increased activity or abundance), antimorphs (often referred to as “dominant negative” [Herskowitz 1987]), and neomorphs (gain of toxic/altered function or ectopic expression) (Muller 1932). In this Perspective, we group amorphic and hypomorphic alleles as loss-of-function alleles, and hypermorphic, antimorphic and neomorphic alleles as gain-of-function alleles. Although we acknowledge that there are some scientists who refer to antimorphic alleles as a type of LOF mutation (because the key function of the protein is often lost and their phenotype often resembles that of amorphic and strong hypomorphic alleles), the fact that these variants “gained a function to inhibit the endogenous copy of itself or paralogous genes” justifies the classification of these variants as a GOF allele.
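The grouping of Muller's morphs adopted in this Perspective amounts to a small lookup table, sketched below in Python. The enum and function names are hypothetical; the assignment of each morph to LOF or GOF follows the convention stated above, including the choice to treat antimorphs as GOF alleles.

```python
from enum import Enum


class Morph(Enum):
    AMORPH = "amorph"          # complete loss of expression or function
    HYPOMORPH = "hypomorph"    # reduced activity or abundance
    HYPERMORPH = "hypermorph"  # increased activity or abundance
    ANTIMORPH = "antimorph"    # "dominant negative"
    NEOMORPH = "neomorph"      # toxic/altered function or ectopic expression


# Grouping used in this Perspective; some authors instead count
# antimorphs as a type of LOF mutation.
LOSS_OF_FUNCTION = frozenset({Morph.AMORPH, Morph.HYPOMORPH})
GAIN_OF_FUNCTION = frozenset({Morph.HYPERMORPH, Morph.ANTIMORPH, Morph.NEOMORPH})


def allelic_category(morph):
    """Map a Muller's morph to the LOF/GOF grouping used in the text."""
    return "LOF" if morph in LOSS_OF_FUNCTION else "GOF"


for morph in Morph:
    print(f"{morph.value}: {allelic_category(morph)}")
```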
Since the identification of the first mutant allele (amorphic allele of the white gene [Morgan 1910]), fly researchers have generated a great number of mutant strains (see Supplemental Text 2). Although it can be highly encouraging if the symptoms seen in patients have links to previously identified fly mutant phenotypes (i.e., in this case, mutant flies can be considered as a “disease model”), one should not be discouraged from pursuing functional studies of GUS and VUS in this model organism even if the previously reported phenotypes appear unrelated. Indeed, some clinical phenotypes are very difficult or impossible to model in flies as insects lack certain organ systems (e.g., endoskeleton, adaptive immune system), have significantly different anatomical features (e.g., cardiovascular system, nervous system), and do not seem to carry out sophisticated functions (e.g., higher cognition). Even when studying clinical cases that relate to these conditions, various phenotypes in flies can be used as readouts of gene and variant function. For example, bone morphogenetic protein (BMP) signaling is utilized in diverse developmental contexts in flies although this model organism lacks bones. In some cases, nonobvious phenotypic links based on shared molecular pathways may be present between orthologous genes (i.e., phenologs) (McGary et al. 2010). In other cases, flies can be considered as “living test tubes” that can provide qualitative or quantitative readouts of variant function. It is, of course, important not to overinterpret the data obtained in Drosophila because some genes may play species-specific roles. 
In addition, when conducting an experiment in flies (and in other species), it is essential to attempt to validate observations reported in the literature because some information annotated in model organism databases such as FlyBase is based on high-throughput screens, which may reflect nonspecific phenotypes caused by genetic background or other technical issues such as experimental artifacts.
When and where the gene of interest is expressed is also an important piece of information that should be taken into consideration when using model organisms for functional studies. Although gene expression data alone do not provide functional information, one can utilize this information to design probing experiments, especially when cell type– or tissue-specific functional information is needed. FlyBase integrates gene and protein expression information for individual genes from various sources (see Supplemental Text 2). Such data could be compared with gene expression data in humans (e.g., GTEx [The GTEx Consortium 2013], CZ CELLxGENE [CZI Cell Science Program et al. 2025]), other vertebrate model systems (e.g., mouse, rat, zebrafish), as well as phenotypic data from rare disease patients to generate hypotheses regarding molecular mechanisms of pathogenicity. In some cases, the expression level of the fly gene of interest is below the detection level of single-cell RNA sequencing technology. T2A-GAL4 and Kozak-GAL4 lines are genetic tools that have been developed over the last decade that can simultaneously provide both expression and phenotypic data for a given gene of interest with high specificity and sensitivity (see Supplemental Text 3; Lee et al. 2018; Kanca et al. 2022). In addition, these reagents can be used to “humanize” Drosophila genes, which will be discussed in the following sections.
Identifying similarities between patient and fly mutant phenotypes
When the human gene of interest is a GUS, variants that are highly likely to function as LOF alleles (e.g., nonsense, frameshift, variants that disrupt canonical splice sites, deletions) may be flagged as VUS, especially when there is limited to no information about the function of the gene of interest in vertebrate systems. LOF human variants can often be studied by investigating the effect of LOF of the orthologous fly gene (Fig. 3A). If similarities between phenotypes of Drosophila mutants and rare disease patients can be identified, such data can be used as supporting evidence of pathogenicity (Fig. 2). In addition, this can lead to deeper understanding of the human gene's function and potential disease mechanisms in an in vivo model. This approach seems to be the most common strategy used in recent publications (72% of papers) (Fig. 1C), likely due to the large number of publicly accessible fly reagents that are LOF alleles.
Figure 3. Overview of methods used to study GUS and VUS in Drosophila. Multiple complementary techniques can be used to study the function of genes and variants identified in patients as novel disease candidates. (A) Loss-of-function (LOF) studies look at the consequences of complete loss or reduction of the fly ortholog's function. This can be achieved through (1) mutant alleles that limit/eliminate protein function, (2) T2A-GAL4 cassettes inserted into the first intron, causing only a short portion of the protein to be made that is likely LOF, or Kozak-GAL4 cassettes that replace the coding sequence of the gene and either knock down or knock out the gene, and (3) transgenic RNAi lines to knock down the endogenous transcripts. (B) Humanization studies replace the expression of the fly ortholog with a human ortholog. This can be accomplished by either (1) combining a fly ortholog LOF strategy with cDNA expression, or (2) using CRISPR to knock the human cDNA with the variant into the endogenous fly locus. (C) Overexpression assays express the human/fly cDNA ubiquitously or in a tissue of interest using a GAL4 line, without manipulating the endogenous fly ortholog of the human gene of interest. (D) When the variant of interest affects a conserved amino acid, the fly analogous variant approach can be utilized. This can be accomplished by (1) combining a fly ortholog LOF strategy with fly cDNA expression, (2) generating a knock-in allele by directly targeting the endogenous fly ortholog gene based on genome editing, or (3) overexpressing the fly cDNA ubiquitously or tissue-specifically using various GAL4 lines.
Currently, over 62,000 fly lines covering 12,500+ genes that are designed to function as LOF alleles (e.g., mutant alleles including lines from forward genetic screens, T2A-GAL4 and Kozak-GAL4 lines, deletions, RNAi lines, and gRNA lines) are readily available through public fly stock centers, making LOF studies more easily accessible than strategies that require new reagents (see Supplemental Text 4). For example, one study investigated the role of the microRNA processor encoded by DROSHA using previously generated fly stocks with EMS (ethyl methanesulfonate)-induced LOF mutations (Smibert et al. 2011; Pressman et al. 2012; Luhur et al. 2014) of its fly ortholog drosha (Barish et al. 2022). The small brain phenotype that was documented in the fly drosha mutants provided in vivo information to support the hypothesis that the microcephaly phenotype found in patients with rare genetic variants in DROSHA was likely caused by LOF of this gene. In another study, a homozygous mutant allele of fly Sodh2 (ortholog of human SORD) generated in a large-scale gene disruption project (Bellen et al. 2011) showed loss of synaptic terminals in the visual system and age-dependent worsening of locomotor activities accompanied by increased sorbitol levels (Cortese et al. 2020). This recapitulated some of the key phenotypes seen in patients with LOF variants in SORD, showing that this LOF fly reagent can be used as a model to study the disease mechanism and molecular pathogenesis. Indeed, this study identified aldose reductase inhibitors (e.g., Epalrestat, Ranirestat) as a potential therapeutic approach because fly mutant phenotypes were significantly improved by drug treatment. In more recent years, researchers have been using CRISPR-based approaches to proactively generate strong LOF alleles in genes that have not yet been well characterized in flies. 
Deletion alleles generated using CRISPR have been used in studies to support the pathogenicity of LOF variants in multiple GUS (Kim et al. 2020; Wormser et al. 2021; Mattioli et al. 2023). T2A-GAL4 (Lee et al. 2018) and Kozak-GAL4 (Kanca et al. 2022) lines mentioned above have also been used as LOF alleles to provide supporting evidence for novel human disease discovery papers (Fig. 3A; Supplemental Table 1).
It is important to note that there are some drawbacks of studying strong LOF alleles of fly genes in the context of VUS and GUS research. For example, ∼30% of fly genes are estimated to be essential for life, and strong LOF alleles may not allow the study of tissue-specific functions of genes and variants that are clinically relevant. These limitations can be overcome by techniques that allow investigation of tissue- or cell type–specific function of genes such as RNAi (RNA interference) (Fig. 3A; see Supplemental Text 5). Several large collections of transgenic fly lines that allow the conditional expression of long double-stranded RNA or short hairpin sequences based on the UAS/GAL4 system have been made publicly available from Bloomington Drosophila Stock Center, Vienna Drosophila Resource Center, and the National Institute of Genetics in Japan. When crossed to GAL4 lines that are expressed in specific cell populations, one can reveal cell type– or tissue-specific function of essential genes that may have been masked by lethality. For example, neuron-specific knockdown of fly Isha (ortholog of human SCAF4 as well as SCAF8) caused phenotypes affecting locomotor function, learning, and short-term memory, which was accompanied by structural defects in synapse development (Fliedner et al. 2020). This finding helped to establish SCAF4 as a novel neurodevelopmental disease gene in humans.
Studying the functional impact of coding variants based on “humanizing” fly genes
Whereas studying phenotypes of existing or new LOF fly mutants can help us understand the function of GUS and mechanisms underlying diseases linked to human LOF variants, the main limitation of this approach is that it can only provide information at the gene level. The functional impact of missense alleles and late-truncation alleles that escape nonsense-mediated decay is much more difficult to predict than that of early truncations, canonical splicing variants, and deletion alleles. “Humanization” of an orthologous fly gene is a powerful approach to providing functional information for these types of variants (Fig. 3B) and has been used in 29% of recent publications (Fig. 1C). This can be achieved by rescuing the fly mutant phenotype with reference or variant human protein (often introduced as a transgene that carries a human cDNA) and comparing their effectiveness. When a reference human protein can rescue a fly mutant phenotype, one can say that the molecular function of the human and fly genes is conserved. In addition, VUS that affect amino acids that are not conserved between flies and humans can be studied using this approach.
Humanization of a fly gene can be conducted based on a rescue paradigm that combines T2A-GAL4 alleles with UAS-human cDNA lines. Using this strategy, the human protein of interest can be expressed in the same spatiotemporal expression pattern as the fly protein in a LOF mutant background. If the reference and variant UAS-human cDNA transgenes are integrated into the identical genomic locus using a site-specific transgenesis system based on phiC31 (Venken et al. 2006), one can assess whether the reference human protein can rescue the fly mutant phenotype and further assess how the variant human proteins perform in comparison (Fig. 4A; Pineda et al. 2021; Pan et al. 2023). Although the rescue of the fly mutant phenotype may not always be complete, valuable information can be obtained as long as one can observe a functional difference between the reference and the variant of interest. In one study, reference human TIAM1 was shown to be able to partially rescue the semilethality caused by loss of fly sif (ortholog of human TIAM1 as well as TIAM2). In contrast, a missense variant (p.R23C) in this gene found in monozygotic twins with developmental delay, intellectual disability, and seizures was not able to rescue this defect, leading to the conclusion that this VUS is a strong LOF allele (Lu et al. 2022). Kozak-GAL4 lines can also be combined with UAS-human cDNA lines to achieve a similar goal. For example, reference human TOMM70 cDNA was able to rescue the lethality in homozygous Kozak-GAL4 mutants of fly Tom70 (orthologous to human TOMM70), whereas the two rare variants (p.T607I and p.I554F) found in patients with severe neurological symptoms were classified as partial LOF alleles because they rescued less efficiently. This study also performed rescue of eye morphology and synaptic transmission phenotypes caused by eye-specific knockdown of Tom70 (GMR-GAL4, UAS-Tom70-RNAi) using reference or variant UAS-TOMM70 transgenes.
The two variants again showed less potency in their ability to rescue these defects compared to the reference allele, indicating these variants are likely to be hypomorphs (Dutta et al. 2020). Humanization of fly genes can be achieved by combining other genetic methodologies (see Supplemental Text 6), making this a powerful methodology to study GUS and VUS.
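The logic of a rescue-based comparison can be sketched in a few lines of Python. This is a minimal illustration rather than a published pipeline: the `RescueCount` layout, the function name, and both thresholds are assumptions introduced here, and a real analysis would apply an appropriate statistical test to replicate crosses.

```python
from dataclasses import dataclass


@dataclass
class RescueCount:
    """Flies scored in one rescue cross (hypothetical data layout)."""
    rescued: int  # mutants showing the rescued phenotype, e.g., viable adults
    scored: int   # all mutants scored for that transgene

    @property
    def fraction(self) -> float:
        return self.rescued / self.scored if self.scored else 0.0


def classify_by_rescue(reference: RescueCount, variant: RescueCount,
                       min_reference_rescue: float = 0.2,
                       tolerance: float = 0.1) -> str:
    """Compare variant rescue with reference rescue.

    Both thresholds are illustrative assumptions, not values from the literature.
    """
    # The reference must rescue at least partially; otherwise the comparison
    # between reference and variant carries no information.
    if reference.fraction < min_reference_rescue:
        return "uninformative: reference does not rescue"
    relative = variant.fraction / reference.fraction
    if relative >= 1 - tolerance:
        return "comparable to reference"
    if relative <= tolerance:
        return "strong LOF"
    return "partial LOF"
```

With these assumed cutoffs, a variant that rescues about half as well as a partially rescuing reference would be called a partial LOF allele, in the spirit of the TOMM70 example, whereas a variant that fails to rescue at all would be called a strong LOF allele, as in the TIAM1 example.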
Figure 4. Functional classification of VUS based on examining the phenotypic difference between the reference and variant alleles in flies. Both rescue and overexpression-based approaches can utilize various phenotypes such as lethality, fertility, lifespan, morphological defects, or behavioral phenotypes as functional readouts of gene function. (A) Rescue-based approaches to assess patient variants require a scorable phenotype caused by LOF of the fly ortholog that is rescued by expression of the reference allele. If the VUS is a LOF allele, expression of the variant would be unable to rescue to the same level as the reference. (B) Overexpression strategies can be used to classify the variant of interest based on comparing the phenotypes induced by each transgene. LOF alleles are determined by a phenotype that is less severe than the reference phenotype, indicated by “>”, whereas GOF alleles are determined by a phenotype that is more severe than the reference phenotype, indicated by “<”.
For cases in which the human reference does not rescue loss of fly gene function, there may still be useful functional data that can be retrieved when comparing the phenotypes caused by introducing the reference or variant human transgenes into a fly mutant background. For example, two missense variants in MRTFB (p.R104G and p.A91P) that were found in patients with a novel neurodevelopmental disorder were initially assessed using a rescue-based paradigm (Andrews et al. 2023). Whereas both reference and variant human MRTFB cDNA expression failed to rescue a bristle morphology phenotype caused by hypomorphic alleles of fly Mrtf (ortholog of MRTFB as well as MRTFA and MYOCD), expression of the variants induced a more severe phenotype causing lethality, which was not observed with the reference allele. Based on this and other data, these missense variants were classified as GOF (hypermorphic) alleles.
Studying the functional impact of coding variants based on overexpression of human proteins
Based on our and others’ experiences (Marcogliese et al. 2022; Yamamoto et al. 2024), 40%–60% of human genes are likely to fully or partially rescue a fly ortholog mutant phenotype based on rescue experiments using T2A-GAL4/Kozak-GAL4 lines and UAS-human cDNA transgenes. One reason for failure may be that human proteins cannot function in the context of the fly cell. Another reason could be that heterologous expression of reference human proteins in a fly cell could induce a phenotype through GOF mechanisms (hypermorph, antimorph, neomorph), especially for dosage-sensitive genes, because the level of protein expression achieved using the GAL4/UAS system tends to be higher than the endogenous level. In the latter case, one could take advantage of any scorable phenotypes induced by ectopic human protein expression and use them as readouts of protein function (Fig. 3C; Her et al. 2024). This approach has been used in 35% of publications (Fig. 1C) and is also often used in combination with other strategies (Fig. 1D; Supplemental Table 1). In one scenario, if ubiquitous overexpression of the human reference protein causes lethality, but the overexpression of the variant protein produces viable flies, this variant can be classified as a LOF allele (Fig. 4B). In another scenario, if neuronal overexpression of the variant protein causes a behavioral phenotype that is not observed when the reference protein is expressed, this could be considered as some type of a GOF allele (Fig. 4B). Whereas additional experiments are required to further subclassify LOF variants into amorphic or hypomorphic alleles and the GOF variants into hypermorphic, antimorphic, or neomorphic alleles, overexpression experiments can be integrated into a rapid screening pipeline.
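The two overexpression scenarios above reduce to a comparison of phenotype severity between the reference and variant transgenes. The following Python sketch makes that comparison explicit; the function name and the ordinal severity scale (0 = no phenotype, increasing to 3 = lethal) are assumptions made for illustration, and the result is only a first-pass call that still requires the follow-up experiments mentioned above.

```python
def classify_by_overexpression(reference_severity: int,
                               variant_severity: int,
                               baseline_severity: int = 0) -> str:
    """First-pass LOF/GOF call from overexpression phenotypes.

    Severity is an ordinal score on a hypothetical scale (0 = none, 3 = lethal),
    measured with the same driver and at the same transgene insertion site.
    """
    # If neither transgene induces a phenotype, the assay cannot discriminate.
    if reference_severity == baseline_severity and variant_severity == baseline_severity:
        return "uninformative"
    if variant_severity < reference_severity:
        return "LOF"  # further work needed to call amorph vs. hypomorph
    if variant_severity > reference_severity:
        return "GOF"  # further work needed to call hypermorph, antimorph, or neomorph
    return "no difference detected"
```

For example, a reference protein that is lethal when ubiquitously overexpressed (score 3) compared with a variant that yields viable flies (score 0) would return a LOF call, whereas a variant that induces a behavioral phenotype absent with the reference would return a GOF call.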
It is important to reemphasize that phenotypes that are screened in fly models do not necessarily have to correlate well with patient symptoms. For example, morphological defects observed when human proteins are overexpressed in the fly eye or wing are often used as a phenotypic readout because development of these tissues is sensitive to alterations in fundamental cellular and developmental pathways (Bier 2005). An overexpression-based approach has been useful in characterizing the functional difference between two missense variants (p.D413G and p.S1522L) in ROBO1 that have been found to cause different neurodevelopmental phenotypes in patients with different inheritance patterns, using a T2A-GAL4 allele of fly robo1 (ROBO1 ortholog) as a driver and screening for lethality as well as neuroanatomical defects (Huang et al. 2022a). Through this experiment, the variant linked to a recessive disorder was classified as a hypomorphic LOF allele, whereas the variant linked to the dominant disorder was identified as a neomorphic GOF allele. Hypermorphic and antimorphic variants can also be revealed using overexpression approaches, as in the case of MTSS2 (Huang et al. 2022b).
Once a variant is determined to have a functional consequence, additional experiments can be designed to probe into the disease mechanism (Sandberg et al. 2020; Wilson et al. 2020; Ganguly et al. 2021; Marmion et al. 2021; Gignac et al. 2023; Srivastava et al. 2023). Follow-up biochemical or cell biological studies using tagged-transgenes or specific antibodies can be used to further determine the functional consequence of VUS on protein expression or localization. For example, immunohistological studies of DHX9 identified that two disease-associated variants that were classified as hypomorphic alleles based on overexpression assays show different subcellular localization patterns compared to the reference allele (Yamada et al. 2023). Reporter assays and genetic interaction studies can also be integrated to determine the functional impact of specific variants in the pathway of interest. One example is an experiment that was performed to determine how different variants in PTEN affect the PI3K pathway using wing sizes of flies as a readout of tissue growth (Ganguly et al. 2021). Overexpression studies can also provide a platform to test potential personalized drug treatments that are translatable to rare disease patients (Chung et al. 2020; Bakkar et al. 2021). An example of this is the pharmacological rescue of GABA transport deficiency seen with overexpression of the p.A288V variant in GAT1 (Kasture et al. 2023). This study provided valuable in vivo functional data on drug treatments that could improve the individual's symptoms and set up assays that can be used to assess drug-based rescue of additional GAT1 variants in the future.
It is important to note that not all functions of a protein of interest can be revealed by a single assay. For example, functional assessment of 18 different known pathogenic variants in KDM6B linked to neurodevelopmental diseases was performed using ubiquitous and wing-specific overexpression of the human protein. The authors concluded that the phenotypic assay they selected can be used to determine the function of the protein carried out by the C-domain of the protein but not for variants in the N-domain (Rots et al. 2023). Therefore, whereas positive data can be used as strong supporting evidence that the variant has functional consequences supporting pathogenicity, one should not categorize a variant as “benign” based on negative data.
Overexpression experiments can also provide functional information on variants in human genes that are not conserved in Drosophila (Fig. 2). For example, orthologous genes that correspond to SNCA (Parkinson's disease), PRNP (prion diseases), and MECP2 (Rett syndrome) are not found in the Drosophila genome. However, ectopic overexpression of these human proteins can cause scorable phenotypes in flies (Feany and Bender 2000; Cukier et al. 2008; Myers et al. 2022). In addition, some human genes that are not considered to be conserved in flies based on bioinformatic algorithms can be found to rescue LOF mutant phenotypes in related fly genes, as reported for human APOE (rescuing the fly Glaz mutant) and LEP (rescuing the fly upd2 mutant) (Rajan and Perrimon 2012; Liu et al. 2017). Hence, if a phenotypic difference can be seen upon reference and variant protein overexpression and/or through rescue experiments of functionally related genes, one can study VUS in nonconserved genes. These experiments also provide an entry point for studying the underlying mechanism of these disorders, even when the obvious ortholog of the human gene of interest is not found in the fly genome.
Studies of VUS in human genes through analogous variant modeling in the fly ortholog
For cases in which the variant of interest affects an amino acid that is conserved between humans and flies, studies of the analogous (equivalent/homologous) variant in the context of the fly protein can provide useful information (Figs. 2, 3D). This approach is particularly useful when the human protein fails to function in the context of a fly cell or when human cDNA is not publicly or commercially available. Such an analogous fly variant strategy has been utilized in 35% of the papers we analyzed, suggesting its versatility (Fig. 1C; Supplemental Table 1). Ideally, the amino acid of interest should be identical; however, modeling is still feasible when amino acids differ, provided that their biochemical characteristics are similar. For instance, a recent study employed a fly analogous variant strategy to model a VUS in NKX2.5 (Lovato et al. 2023). In this study, the authors studied the impact of p.K158N by introducing a corresponding p.R321N mutation in tin, the fly ortholog of NKX2.5, leveraging the fact that both lysine (K) and arginine (R) are basic residues with similar biochemical properties. This approach underscores the potential for cross-species genetic modeling, even when exact amino acid matches are not present.
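The decision of whether a human variant can be modeled at the corresponding fly residue can be illustrated with a short Python sketch. The residue grouping below is a deliberately simplified assumption (several groupings are in common use), the function name is hypothetical, and the aligned fly residue and position are assumed to come from a separate protein alignment.

```python
import re
from typing import Optional

# Simplified biochemical grouping of the 20 standard amino acids (an assumption).
# C, G, and P are each kept in their own class because of their unusual properties.
RESIDUE_CLASS = {
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("AVLIM", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    **{aa: aa for aa in "CGP"},
}


def analogous_fly_variant(human_variant: str, fly_residue: str,
                          fly_position: int) -> Optional[str]:
    """Return the analogous fly variant, or None if modeling is not advisable.

    human_variant uses one-letter notation such as "p.K158N"; fly_residue and
    fly_position describe the aligned residue in the fly ortholog.
    """
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", human_variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {human_variant}")
    human_ref, _, alt = match.groups()
    # Modeling requires an identical or biochemically similar reference residue.
    if RESIDUE_CLASS[human_ref] != RESIDUE_CLASS[fly_residue]:
        return None
    # If the fly already carries the variant residue, there is nothing to model.
    if fly_residue == alt:
        return None
    return f"p.{fly_residue}{fly_position}{alt}"
```

Under this grouping, `analogous_fly_variant("p.K158N", "R", 321)` yields `"p.R321N"`, matching the NKX2.5 and tin example, because lysine and arginine fall in the same class.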
The most straightforward approach to studying the human variant using this analogous variant strategy is to generate a knock-in allele by directly targeting the fly ortholog gene locus using gene editing technology such as CRISPR-Cas9. This approach is particularly beneficial when: (1) the gene is dosage-sensitive, so maintaining or minimally affecting the endogenous gene expression level is crucial; (2) no available mutant flies or efficient RNAi lines exist; or (3) whole-organism studies are preferred over tissue-specific experiments. In a recent study on missense variants in CYFIP1 found in individuals with a neurodevelopmental condition, the authors used CRISPR-based scarless gene editing to generate two different analogous mutant alleles (p.I471V and p.P760L) in fly Cyfip (ortholog of CYFIP1 as well as CYFIP2), corresponding to two missense variants in CYFIP1 (p.I476V and p.P742L) that were found in a trans-heterozygous state (Mariano et al. 2024). The authors found that flies with biallelic mutations mimicking the individual's genotype (Cyfip p.I471V/Cyfip p.P760L) recapitulated the actin polymerization defect identified in patient-derived cells. Biallelic mutant flies also showed defective neuronal morphology and behavioral defects, suggesting that the biallelic variants in CYFIP1 could contribute to the clinical phenotype in the probands and that the two missense variants are likely pathogenic. Additional strategies to study VUS in the context of the fly gene/protein are discussed in Supplemental Text 7.
Using Drosophila to further classify pathogenic variants based on functional consequences and to discover complex properties of some alleles
As we have seen, different genetic strategies in Drosophila can be used to not only determine that the variant of interest has significant functional consequences but also to provide information regarding its directionality. Understanding whether a variant is a LOF or GOF allele can have direct clinical implications because therapeutic strategies should be tailored to the type of variant an individual carries. For example, if a patient carries a pathogenic LOF variant, strategies to reintroduce or increase the expression of the gene of interest, for example, through gene therapy, could be considered. In contrast, approaches to reduce or abolish the function of the pathogenic allele using antisense oligonucleotides (ASO) or CRISPR-mediated genome editing would be preferred if the pathogenic variant is a GOF allele. Classifying variants into specific Muller's morphs can further help scientists begin to develop and test specific hypotheses regarding the molecular mechanisms of pathogenesis and assist clinicians in further refining therapeutic approaches. As an example, if the patient is heterozygous for an antimorphic allele, therapies to increase the expression of the wild-type copy of this gene could have a beneficial effect. However, such manipulation is likely to be deleterious for individuals who carry hypermorphic alleles and would likely have no beneficial effect for individuals who possess a neomorphic allele. Although some bioinformatic tools are being developed to predict whether a variant may be a LOF or GOF allele (Stein et al. 2023), these tools need further development. Additionally, none of the existing bioinformatics programs have the capacity to assign pathogenic variants into different Muller's morphs; therefore, experimental data are critical for such classification. Of 134 recent papers that have used Drosophila for functional studies of VUS and GUS, 95 papers have annotated the variants of interest as LOF or GOF (Fig. 1E; Supplemental Table 1). Furthermore, 32 of the 43 papers that designated the variants of interest as GOF alleles further provided data to classify these variants into specific Muller's morphs (hypermorph, antimorph, neomorph) (Fig. 1F; Supplemental Table 1), underscoring the value of in vivo functional studies using Drosophila.
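The reasoning that links Muller's morphs to therapeutic direction can be summarized as a small lookup table. The Python sketch below only restates the general logic of the preceding paragraph; the function name and the wording of each entry are illustrative assumptions, and nothing here constitutes clinical guidance.

```python
# Direction of effect for each of Muller's morphs.
DIRECTION = {
    "amorph": "LOF",
    "hypomorph": "LOF",
    "hypermorph": "GOF",
    "antimorph": "GOF",
    "neomorph": "GOF",
}

# General strategy suggested by the direction of effect.
GENERAL_STRATEGY = {
    "LOF": "reintroduce or increase expression of the gene (e.g., gene therapy)",
    "GOF": "reduce or abolish function of the pathogenic allele (e.g., ASO, genome editing)",
}

# Expected effect of raising expression of the wild-type copy.
RAISING_WILD_TYPE = {
    "amorph": "potentially beneficial",
    "hypomorph": "potentially beneficial",
    "antimorph": "potentially beneficial",
    "hypermorph": "likely deleterious",
    "neomorph": "likely no benefit",
}


def therapeutic_considerations(morph: str) -> dict:
    """Summarize the general considerations for one morph (illustrative only)."""
    key = morph.lower()
    if key not in DIRECTION:
        raise ValueError(f"unknown morph: {morph}")
    direction = DIRECTION[key]
    return {
        "direction": direction,
        "general_strategy": GENERAL_STRATEGY[direction],
        "raising_wild_type": RAISING_WILD_TYPE[key],
    }
```

The table shows why a LOF/GOF call alone can be insufficient: antimorphic, hypermorphic, and neomorphic alleles are all GOF alleles, yet the expected effect of raising wild-type expression differs among the three.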
Genes and proteins are often pleiotropic and can have more than one molecular function. Also, a single variant can have different functional consequences depending on the context, because proteins can be engaged in different molecular interactions in different cell types, organ systems, or developmental stages. There have been several studies in Drosophila that revealed such unique properties of a handful of variants, and these intriguing biological insights have been obtained based on studying the functional consequences of variants in different contexts. In one study, several variants in DVL1 associated with Robinow syndrome were found to cause both a LOF of canonical Wnt signaling and a GOF of noncanonical (JNK/PCP [c-Jun N-terminal Kinase/Planar Cell Polarity]) Wnt signaling by assessing multiple parameters that can be measured in the fly wing (Gignac et al. 2023). In another study, two de novo missense variants in EPHB1 and MAP4K1, found in patients with autism spectrum disorders, had different functional consequences in different tissues. The EPHB1 (p.V916M) variant behaved as a LOF allele when the overexpression-based functional assay was performed in the fly eye but behaved as a GOF allele in the fly wing. The MAP4K1 (p.M725T) variant had the opposite functional consequence: GOF when tested in the eye and LOF when assessed in the wing (Marcogliese et al. 2022). A similar tissue-specific LOF and GOF effect was found for a missense variant in TNPO2 (p.W727C) that was reported in a new disease gene discovery paper (Goodman et al. 2021) and a missense variant in AXIN2 (p.E66K) linked to phenotypic expansion (Aceves-Ewing et al. 2024). Whereas additional studies are required to clarify which functional effect is most relevant to each clinical phenotype of interest and the molecular mechanisms behind these peculiar behaviors need to be investigated, it is possible that a subset of pathogenic variants may cause different functional effects in different organ systems or cell types. In these cases, therapeutic approaches should be further customized because an approach to correct GOF or LOF aspects of the disorder may have unexpected side effects that may impact a certain organ system.
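When the same variant is assayed in several tissues, the per-tissue calls can be combined to flag context-dependent alleles. The Python sketch below is one hypothetical way to do so; the function name and the call labels ("LOF", "GOF", or any other string for no detectable effect) are assumptions introduced for illustration.

```python
from typing import Dict


def summarize_across_contexts(calls: Dict[str, str]) -> str:
    """Combine per-tissue calls ("LOF", "GOF", or anything else) into one label."""
    directions = {call for call in calls.values() if call in ("LOF", "GOF")}
    if not directions:
        return "no functional effect detected"
    if len(directions) == 1:
        return f"consistent {directions.pop()}"
    # Opposite calls in different tissues indicate a context-dependent allele.
    return "context-dependent (LOF and GOF)"
```

For the EPHB1 p.V916M example, `summarize_across_contexts({"eye": "LOF", "wing": "GOF"})` returns the context-dependent label, which would prompt the tissue-specific follow-up discussed above.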
Collaborations between fly researchers and MAVE developers can facilitate the deciphering of the human variome
Clinical geneticists who analyze exome or genome sequencing data in diagnostic laboratories or research settings often encounter a number of VUS or rare variants in GUS. Whereas these individuals can provide high-quality sequence information and variant interpretation based on bioinformatic analysis (e.g., known pathogenic and benign variant databases, population genomic data, in silico prediction algorithms, experiment-based evidence based on previous publications), they often do not have the capacity to further resolve a VUS in a laboratory setting. Whereas “functional assay” is one of the key criteria that clinicians use while classifying the likely pathogenicity of a variant based on the ACMG/AMP sequence interpretation guideline (PS3 and BS3 codes) (Richards et al. 2015), clinical diagnostic laboratories do not carry out these types of studies. For genes in which the functionality of proteins with relevance to disease mechanism can be quantified, MAVE assays based on cell models or recombinant proteins could be established by basic scientists to assess the functional consequences of all possible variants in vitro (which need to be interpreted with caution [Gelman et al. 2019]). In addition, variants in noncoding (or coding) regions of the gene that may affect transcription and splicing could be studied at large scale based on high-throughput reporter assays using cell lines or transcriptomic assays based on patient-derived materials (Walker et al. 2023). For the clinical genetics field to obtain a well-defined set of MAVE data that could be used in a clinical setting to classify the VUS as pathogenic or benign, model organism researchers can provide valuable in vivo data to fill the knowledge gap that still exists for many clinically relevant genes. For example, ACMG/AMP recommends that more than 10 clearly defined pathogenic and benign variants should be used to determine the sensitivity and specificity of MAVE-type assays (Brnich et al. 2020). Because many ultrarare diseases lack well-defined pathogenic alleles and disease mechanisms, basic researchers with expertise in homologous genes, including Drosophila biologists, could contribute to establishing these foundations. Especially for genes that encode multifunctional proteins or those that function in non-cell-autonomous manners, data obtained from in vivo model systems would be extremely valuable to determine what in vitro (e.g., specific protein-protein interactions that are relevant to pathogenesis) or cell-based (e.g., co-culture system or organoid based) assay could be used to establish MAVEs that are most relevant to the clinical cases/phenotypes of interest.
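As an illustration of how control variants are used to benchmark an assay, the Python sketch below computes sensitivity and specificity from a set of known pathogenic and benign controls. The function name, the labels, and the reading of "more than 10" as a minimum of 11 controls in total are assumptions made here; the cited recommendation should be consulted for the actual requirements.

```python
from typing import Iterable, Tuple


def assay_performance(controls: Iterable[Tuple[str, str]],
                      min_controls: int = 11) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a functional assay.

    Each control is (truth, call): truth is "pathogenic" or "benign", and
    call is "abnormal" or "normal". min_controls=11 encodes "more than 10"
    controls in total, which is an interpretation made for this sketch.
    """
    controls = list(controls)
    if len(controls) < min_controls:
        raise ValueError("too few control variants to benchmark the assay")
    tp = sum(1 for truth, call in controls if truth == "pathogenic" and call == "abnormal")
    fn = sum(1 for truth, call in controls if truth == "pathogenic" and call == "normal")
    tn = sum(1 for truth, call in controls if truth == "benign" and call == "normal")
    fp = sum(1 for truth, call in controls if truth == "benign" and call == "abnormal")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both pathogenic and benign controls are required")
    sensitivity = tp / (tp + fn)  # fraction of pathogenic controls read as abnormal
    specificity = tn / (tn + fp)  # fraction of benign controls read as normal
    return sensitivity, specificity
```

For genes that lack enough established controls, the function raises an error instead of returning a number, which mirrors the gap that in vivo studies of well-characterized alleles could help fill.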
Basic scientists who can provide valuable insights or are willing to perform functional assays can be identified through literature searches or matchmaking tools such as ModelMatcher (Harnish et al. 2022). Because the names of orthologous genes often differ in different species, and valuable data are scattered across diverse databases on the internet, integrated knowledge bases that combine information from multiple species and sources such as MARRVEL (Wang et al. 2017, 2019), Alliance of Genome Resources (Alliance of Genome Resources Consortium 2020), Monarch Initiative (Putman et al. 2024), and Gene2Function (Hu et al. 2017) could be used to facilitate this process. Some model organism geneticists have unique technical capability to efficiently generate mutant and transgenic animals for GUS that have never been studied in any organism and to further perform cursory functional assays in moderate scales (e.g., UDN MOSC, CPMM) (Baldridge et al. 2021). These scientists could establish a functional study pipeline for many genes to resolve VUS in a mid-throughput manner, potentially in collaboration with a group of clinical researchers, patient organizations, and/or biotech companies that have strong motivations to facilitate research on specific genes towards their ultimate goal of establishing a treatment and cure.
In summary, Drosophila has been a powerful research tool to reveal the function of GUS and VUS that are relevant to rare genetic disorders. Scientists who have expertise in using this model organism to answer diverse biological questions have the potential to address an unmet need: resolve VUS found in patients who are still undiagnosed by providing functional data that permit the classification of these variants into LOF or GOF alleles and specific Muller's morphs. If a VUS can be proven to be pathogenic, this will end the patient's diagnostic odyssey, helping them financially by reducing the constant medical tests and mentally by providing answers. Importantly, in cases in which there are approved treatments available, a definitive molecular diagnosis backed up by solid functional data can allow them to receive authorization and can help insurance cover the cost of drugs. For cases in which there are potential treatments that are in clinical trials, a solid molecular diagnosis could make them eligible to participate in such studies. For diseases that do not have any drugs that are in development, biological information obtained from model organisms including flies can be leveraged to identify potential therapeutic avenues. Hence, Drosophila research can have a true impact on clinical care, and more stakeholders, especially translational scientists who are developing MAVE-type assays, could benefit by becoming aware of its potential. In addition, considering that studies of rare diseases can often reveal pathogenic mechanisms that underlie more common diseases (Yamamoto et al. 2024) and the Drosophila strategies discussed here can be applied to understand the functional consequences of variants associated with a variety of disorders (Bellen and Yamamoto 2015), collaborative initiatives that include fly geneticists can also facilitate research on common diseases.
Competing interest statement
The authors declare no competing interests.
Acknowledgments
We thank Drs. Dustin Baldridge, Hugo J. Bellen, Oguz Kanca, and Michael F. Wangler for valuable discussions. We also thank all individuals with rare diseases and their family members who participated in clinical research cited in this Perspective. J-W.M. has been supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1A6A3A14044510). S.B.G. has been supported by The Cullen Foundation. H.A.D. has been supported by the National Research Service Award (NRSA) training grant T32GM139534 from the National Institutes of Health (NIH, National Institute of General Medical Sciences) and The Cullen Foundation. S.Y. has been supported by research grants from the NIH (Office of Research Infrastructure Programs: R24OD022005, U54OD030165; National Institute on Aging: RF1AG071557, R01AG071557; National Institute of Neurological Disorders and Stroke: U2CNS132415; National Human Genome Research Institute: R01HG011795), the Chan Zuckerberg Initiative (#2023-32824, #2023-332162), and institutional funds from Baylor College of Medicine and Texas Children's Hospital.
Footnotes
[Supplemental material is available for this article.]
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278291.123.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.