Hydra has mammal-like mutation rates facilitating fast adaptation despite its nonaging phenotype

  1. Steve Hoffmann2,8
  1. 1Computational Phenomics group, IUF—Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany;
  2. 2Computational Biology Group, Leibniz Institute on Aging—Fritz Lipmann Institute (FLI), 07745 Jena, Germany;
  3. 3Computational Phenomics group, Faculty of Biology and Biotechnology, Ruhr University Bochum, 44801 Bochum, Germany;
  4. 4Core Facility Next Generation Sequencing, Leibniz Institute on Aging—Fritz Lipmann Institute, 07745 Jena, Germany;
  5. 5Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany;
  6. 6Molecular Genetics Lab, Leibniz Institute on Aging—Fritz Lipmann Institute (FLI), 07745 Jena, Germany;
  7. 7Institute of Biochemistry and Biophysics, Friedrich Schiller University Jena, 07745 Jena, Germany
  • Corresponding author: arne.sahm{at}rub.de
  • Abstract

    Growing evidence suggests that somatic mutations may be a major cause of the aging process. However, it remains to be tested whether the predictions of the theory also apply to species with longer life spans than humans. Hydra is a genus of freshwater polyps with remarkable regeneration abilities and a potentially unlimited life span under laboratory conditions. By genome sequencing of single cells and whole animals, we found that the mutation rates in Hydra’s stem cells are even slightly higher than in humans or mice. A potential explanation for this deviation from the prediction of the theory may lie in the adaptability offered by a higher mutation rate, as we were able to show that the genome of the widely studied Hydra magnipapillata strain 105 has undergone a process of strong positive selection since the strain's cultivation 50 years ago. This most likely represents a rapid adaptation to the drastically altered environmental conditions associated with the transition from the wild to laboratory conditions. Processes under positive selection in captive animals include pathways associated with Hydra’s simple nervous system, its nucleic acid metabolic process, cell migration, and hydrolase activity.

    Hydra is a genus of small freshwater polyps with a predicted laboratory life span of 1400 years for Hydra magnipapillata (Jones et al. 2014). Thus, it is often referred to as “nonsenescent” (Dańnko et al. 2015; Schaible et al. 2015; Klimovich et al. 2018), “nonaging” (Sturm et al. 2017; Holtze et al. 2021), or “immortal” (Müller 1996; Boehm et al. 2012; Domazet-Lošo et al. 2014). This controversial attribution is based on the observation that, in stark contrast to most other animals, all Hydra species investigated under laboratory conditions avoid senescence while reproducing asexually (Martínez 1998; Schaible et al. 2014). This extraordinary longevity seems to be linked to the simplicity of Hydra’s body plan, the homeostatic somatic maintenance of the adult polyp by replacing all cells approximately every 20 days (Campbell 1967), and its high regenerative capacity. The animals can replace any part of their body when dissected into multiple fragments. Each part usually regrows into a new organism within 2–3 days (Fujisawa 2003; Tomczyk et al. 2015), as demonstrated in the most extreme case from a fragment of only 1% of the body mass (Shimizu et al. 1993). This ability to completely rebuild all tissues without leaving scars or evidencing any restrictions is the subject of intensive ongoing research, especially with regard to how wounding, injury signaling, and pattern formation are molecularly linked in this genus. Consequently, Hydra is, in addition to being a powerful model for aging studies (Tomczyk et al. 2015), a premier organism for the study of stem cells and regeneration in general as well as for regenerative medicine (Michael 2019; Vogg et al. 2019b; Holtze et al. 2021).

    Contributing factors to these abilities might be the simplicity of Hydra’s body plan and the fact that most of its cells are continually proliferating stem cells (Holstein and David 1990). Both faster cycling interstitial cells (I cells) and slower dividing ectodermal and endodermal epithelial cells (E cells) have been described as main stem cell lineages in Hydra. While I cells are multipotent and differentiate to neurons, nematocytes, gland cells, and gametes, E cells are unipotent, forming either the ecto- or the endoderm of the body column and the extremities. These lineages cannot replace one another, but combined they can differentiate into all Hydra cell types. They renew all the cells of an individual through continuous replacement within 3 weeks (Bosch 2009; Hemmrich et al. 2012; Domazet-Lošo et al. 2014; Siebert et al. 2019). Closely linked to this constant renewal is Hydra’s ability to reproduce asexually by budding. During this process, a bud containing one-fifth of the ∼50,000–100,000 cells of an adult individual is formed and detached (Dańko et al. 2015; Tomczyk et al. 2015). Hydra reproduces asexually in ∼19°C water under nutrient-rich conditions, which are usually fulfilled when kept in captivity. Under less favorable conditions, however, Hydra regularly switches to sexual reproduction (Martínez and Bridge 2012; Kaliszewicz and Lipińska 2013).

    All Hydra individuals that originate vegetatively by budding from a single sexually originated ancestor are called a clonal colony or “genet.” In the case of sexual reproduction, soon after fertilization Hydra embryonal cells differentiate into three specific stem cell lineages, including I cells and E cells, that will behave and evolve as separate cell lineages within the genet. Mutations in these lineages that occur during vegetative reproduction can be regarded with respect to the genet as “somatic”—in analogy to mutations occurring in the somatic cells of a multicellular organism with dedicated reproductive cells. Similarly, mutations that have occurred in I-cell lineages and later transform into gametes that give rise to new genets resemble germline mutations in higher organisms. For convenience and simplicity, we denote mutations dating back to before the last sexual reproduction event as “germline” with respect to the genet. Of course, it is important to note that this simplified nomenclature is relative, as “somatic” mutations in a genet may become “germline” for subsequent ones. Nevertheless, for a given genet Hydra, “germline” variations are strictly biallelic and are present in all cells of the genet—unless a locus is affected by a second mutation and the resulting allele is transmitted to a subset of cells.

    In Hydra oligactis, lowering the water temperature can induce aging. Gene expression comparisons between aging and nonaging Hydra have highlighted the role of E cells and processes such as apoptosis and autophagy for this nonsenescent phenotype (Sun et al. 2020; Tomczyk et al. 2020). A number of fundamental studies and resources have been published in recent years that examine Hydra from a molecular perspective. These include a low-coverage and therefore quasi-monoallelic H. magnipapillata strain 105 genome reference (Chapman et al. 2010) and transcriptome sequences (Wenger and Galliot 2013), inducible methods for gene knockdown (Watanabe et al. 2014; Vogg et al. 2019a), topological analysis of morphogenesis (Maroudas-Sacks et al. 2021), as well as quantitative gene expression data from whole individuals and single cells (Krishna et al. 2013; Petersen et al. 2015; Siebert et al. 2019).

    Evolutionary forces such as mutation rate and selection have not been explored. Importantly, Hydra’s nonsenescent phenotype is not likely to be the result of natural selection—but so-called exaptation (Gould and Vrba 1982)—toward a longer life span but rather the consequence of features that evolved to increase survival under harsh environmental conditions (Schaible et al. 2014). In the wild, due to a high extrinsic mortality, Hydra’s average life span is counted in weeks rather than in years (Schaible et al. 2015). The nonsenescent phenotype is observable only under benign laboratory conditions, i.e., an absence of extrinsic mortality risks, steady feeding, and optimal temperature. As a prerequisite for Hydra’s regeneration capabilities, a high cell turnover and stem cell omnipotence have been recorded (Schaible et al. 2017). Nonsenescence of Hydra individuals, however, does not exclude the possibility that stem cells and differentiated cells could “wear out” and senesce over time, in the manner of individual bacteria (Stewart et al. 2005; Wang et al. 2010).

    Somatic mutations are frequently considered as one or even the major factor of the aging process (López-Otín et al. 2013; Vijg 2021; Wolf 2021; Yousefzadeh et al. 2021). Essentially, it is assumed that an increasing mutational burden gradually undermines the integrity of individual cells, followed by that of tissues, organs, and finally the entire organism. This process leads to progressive decay and aging-associated diseases. Accordingly, it has been shown that across a range of very different long-lived mammalian species, mutation rates strongly negatively correlate with life span (Cagan et al. 2022). Out of the species examined by Cagan et al., humans were the longest-lived and had the lowest somatic mutation rate. Thus, the question arises of whether species that live longer than humans have an even lower mutation rate.

    Therefore, this paper tests the hypothesis that Hydra, specifically the well-studied H. magnipapillata strain 105, with its predicted maximal life span of over 1000 years (Schaible et al. 2014), has a significantly lower mutation rate than humans. This purely female strain has reproduced exclusively asexually ever since it was established in 1973 from a single individual (Sugiyama and Fujisawa 1977; Schaible et al. 2015). Furthermore, we investigated the relationship between the mutation rate and adaptability of this strain.

    Results

    Hydra’s mutation rate is at least as high as that of humans

    In mammals, somatic mutations can be identified by the genomic sequencing of single cells, on the one hand, and genomic bulk sequencing of the same individual, on the other hand. Single-cell genomic variations that are present neither in the reference genome nor in the bulk are most likely the result of somatic mutations. This approach is based on the reasonable assumption that the bulk consensus represents the genomic state of the zygote. Consequently, deviations from this state would be caused by somatic mutations (Milholland et al. 2017).

    In the common, asexually reproducing H. magnipapillata strain 105, however, counting the mutations in this way over the life span of a single individual is not possible: unlike mammals whose life begins with a single cell, the zygote, each of these Hydra individuals starts already with at least 10,000 cells forming a genetic mosaic. Nonetheless, the same principle as in mammals could be applied if the mutations were counted over the life span of a genet—in our case the strain 105, whose life started also with a single zygote sometime before the strain's cultivation in 1973 (Fig. 1A).

    Figure 1.

    Biological and historical background, experimental setup, and pedigree relationships of analyzed genomic data sets of H. magnipapillata strain 105. (A) The last sexual reproduction, representing the strain's genetic origin, occurred in the wild. Shortly after fertilization, a Hydra zygote differentiates into noninterchangeable endodermal and ectodermal epithelial (E cells, red) and interstitial (I cells, blue) stem cell lineages. (B) The H. magnipapillata strain 105 was derived from a single female polyp that was caught in the wild in 1973 (Chapman et al. 2010; Schaible et al. 2015). The strain has since reproduced exclusively asexually by budding and has been propagated in laboratories worldwide. For generating a reference genome sequence, the strain was recloned from a single polyp in the laboratory of Dr. Hans Bode at UC Irvine in 2004 (Chapman et al. 2010). We obtained the strain from UC Irvine in 2006 and established a pedigree consisting of its mother and eight daughters (buds) generated at three time points: 2006, 2010, and 2017. (C) In 2018, we used single-cell whole-genome sequencing of three E cells and three I cells to identify single-nucleotide variants (SNVs) that had accumulated in each of these cells since their common origin (zygote). (D) In addition, we performed whole-animal sequencing of nine Hydra individuals, comprising the pedigree mentioned above. We identified SNVs that arose in each daughter in comparison to the mother. To further identify heterozygous genome positions of the strain, we also compared the data of all nine polyps individually with a reference genome sequence determined in another laboratory from more distantly related individuals (Chapman et al. 2010). The total numbers of uniquely identified SNV positions are displayed: in C for the comparisons of each E cell versus all I cells combined (red) and each I cell versus E cells combined (blue); in D for the comparisons of buds versus mother and individuals versus reference. In our experimental setting, the last common ancestor (LCA) 1 of the nine-individual pedigree was the mother at its stage of somatic evolution in 2006. The LCA2 of our nine-individual pedigree and the individual used for single-cell sequencing was budded also in 2006. The LCA of the individuals sequenced by us and those from which the reference genome was derived can be dated back only to a time window of 1974–2004. Pedigree lines: continuous—life history of a Hydra individual; dashed—descant relation including multiple Hydra individuals derived from each other by budding.

    A challenge, however, is the fact that bulk sequencing in Hydra is much less likely to represent the genomic state of the zygote compared to mammals. The reason is that each of the many budding processes that occurred from the cultivation of the strain until today represents a bottleneck, making founder effects likely—which would clearly bias the final mutation count. Therefore, we instead exploited the strict separation of the Hydra I- and E-stem cell lineages mentioned above, which occurs very early after the zygote state (Bosch 2009; Hemmrich et al. 2012; Domazet-Lošo et al. 2014; Siebert et al. 2019). Thus, our rationale for counting the somatic mutations in each of the I cells is to search for variants that are present neither in any of the E cells nor in the reference genome of the strain (and vice versa for the E cells). Following this rationale, we identified the somatic single-nucleotide variants (SNVs) in six cells—three I cells and three E cells—taken from a single Hydra individuum (Fig. 1C). For comparability, we used the same technical approach employed by others to determine mutation rates in mice and humans (Milholland et al. 2017).

    We found 37,379 and 3184 unique positions affected by mutations in the stem cell lineages of I cells and E cells, respectively. On average, we found 14,459 SNVs per stem-like I cell and 1083 SNVs per stem-like E cell (Table 1) taking into account only positions that were covered by at least 20 reads. Dividing the number of SNVs by the number of genomic positions reaching this threshold, we observed on average one SNV for every 5770 bases in I cells and one for every 53,492 bases in E cells. To determine the somatic mutation rate on this basis, we needed to estimate the number of cell divisions that had occurred since the zygote state. This was done using two assumptions: firstly, by using published cell division rates—i.e., every 24–30 h and every 3–4 days, for I cells and E cells, respectively (David and Campbell 1972; Campbell and David 1974); secondly, by assuming that the investigated I cells and E cells evolved separately for 44 years—i.e., presuming that the last sexual reproduction took place shortly before strain 105 originated from a single female polyp in 1973 and that reproduction since then had occurred exclusively asexually by budding until our sampling in 2018.

    Table 1.

    SNV calling in single cells (<1973–2018)

    Based on these assumptions, we inferred an average mutation rate per cell division of 1.21 × 10−8 for I cells and 4.06 × 10−9 for E cells (Fig. 2A). In each of the 6 cells studied, the mutation rate in the coding sequence is lower than in the rest of the genome (P = 0.031, paired Wilcoxon test), although the differences are smaller for E cells. On average, the estimated mutation rate per cell division in the coding sequence is 1.02 × 10−8 for I cells and 3.46 × 10−9 for E cells (Fig. 2A; Supplemental Tables S1, S2).

    Figure 2.

    Single-cell whole-genome analysis to determine the mutation rate per cell division in Hydra. (A) Mutation rates per cell division in Hydra I cells and E cells in genome and coding sequences (CDS) compared with literature data for human and mouse dermal fibroblasts (Milholland et al. 2017). Points represent single sample estimates, lines denote medians (Supplemental Table S2) (for CDS of human and mouse only averages were published). (B) Dependence of mutation rate estimates on the number of mitoses since the zygote. Larger dots represent predicted mutation rates shown in A based on our estimated numbers of mitoses—14,292 and 4605 for I cells and E cells, respectively. (C) Mutation spectra in Hydra of the three I cells and three E cells compared to literature data for human and mouse dermal fibroblasts (Milholland et al. 2017).

    Potential error sources are unlikely to affect the order of magnitude of the mutation rate estimates

    Our mutation rate calculations are based on an estimate of 14,292 and 4605 mitoses in I cells and E cells, respectively. We investigated the influence of the number of cell divisions—particularly their underestimation—on the magnitude of the mutation rate estimate (Fig. 2B). We found that the number of divisions of E cells would have to be more than 50% higher than assumed to fall below the mutation rate of humans. This would imply that the last sexual reproduction in respect to the polyp that gave rise to strain 105 would have to have occurred more than 20 years before it was captured. In the case of I cells, the number of mitoses would need to be more than four times higher than assumed. In this scenario, the last sexual reproduction would have to have occurred more than 200 years ago, at the beginning of the nineteenth century.

    Moreover, we compared the SNVs identified in the six single cells together with their respective 5′ and 3′ contexts (Supplemental Fig. S1) with known single-cell artifact signatures. Based on this analysis, the total fraction of potential single-cell artifacts in our data set was estimated to range from 1.3% to 3.5% (Miller et al. 2022; Supplemental Table S2). An example of a potential artifact would be spurious mutations caused, e.g., by elevated temperatures during the cell lysis step. Typical for this artifact is a mutation spectrum that is largely dominated by C-to-T transitions due to the possibility of cytosine deamination under high temperatures (Dong et al. 2017). The fraction of these mutations in our data is slightly higher than in humans and mice (Fig. 2C). Furthermore, we analyzed the evenness of single-cell genome amplification by comparing the absolute log2 ratios of adjacent bins (Miller et al. 2022). The median absolute pairwise difference (MAPD) ranged from 0.78 to 1.28 in the single-cell experiments (Table 1).

    Single-nucleotide variants are detectable over different evolutionary timescales

    As described above, we used single-cell sequencing to identify somatic variants that would have been present since the last occurrence of sexual reproduction before captivity, most likely shortly before 1973. In addition, we examined two kinds of SNVs by whole-animal genome sequencing of a pedigree, consisting of its mother and eight daughters (buds) generated at different time points between 2006 and 2018 (Fig. 1D; Supplemental Figs. S4–S9). These two kinds of variants would have originated either within a shorter timescale (from 2006 to 2018) or over a much longer timescale, before the last sexual reproduction (i.e., well before 1973) (see below and Supplemental Results), respectively.

    For the shorter timescale, within the nine-animal pedigree, we compared each daughter (bud) against the mother using strict filtering. This resulted in 809 SNVs across the eight buds covering 505 unique variant positions (Table 2; Supplemental Results; Supplemental Table S3). In this setting, the LCA 1 denotes the mother at her stage of somatic evolution in 2006, i.e., 12 years before sampling (Fig. 1B).

    Table 2.

    SNV calling in nine-animal pedigree

    For the longer timescale, we examined each of the sequenced individuals of our nine-individual pedigree for deviations from a monoallelic 850 Mb sized reference genome (Chapman et al. 2010). The identified SNV alleles occurred with high allele frequencies and were often coupled to each other. Therefore, they were most likely already present in the two haploid gamete genomes that combined in the zygote that later gave rise to the individual caught in 1973 from the wild (see Supplemental Results). These germline variants provide information about evolutionary pressures on Hydra in the wild. We detected ∼4.5 million SNVs per comparison and 6 million unique SNV positions in total across all samples (Supplemental Table S3).

    Multiple appearance and homozygosity of SNV alleles generated by somatic mutations

    In total, 69.9% of variants identified in single cells were also found in the comparison of the individuals versus the reference. We found a U-shaped bipartite pattern, with most single-cell SNVs being present either in all nine examined individuals or in none (Fig. 3A). The three I cells examined shared more variants than the E cells (Fig. 3B). Compared to E cells, we found I cell SNVs to be detectable more often at the level of Hydra individuals (Fig. 3C) (P < 2.2 × 10−16, Wilcoxon test). In addition, somatic SNVs that were identified in two I cells were found in more individuals than those found only in one I cell (P < 2.2 × 10−16). The same relation applies for SNVs found in all three versus two I cells (Fig. 3C) (P = 2.9 × 10−3).

    Figure 3.

    Indicators of positive selection among Hydra single-cell SNVs. (A) Distribution of hetero- and homozygous single-cell SNVs by their prevalence in the nine individuals. Numbers at the top of the columns represent the observed versus expected ratio of homozygous to heterozygous mutations based on the respective mutation frequency of heterozygous mutations in the genome (see Supplemental Table S7 for details). (B) Venn analysis of single-cell SNVs in I cells and E cells. No overlap was detected between I cells and E cells. (C) Prevalence of single-cell SNVs in 1–9 individuals examined.

    Moreover, 19% of the variants detected in the single-cell experiment were homozygous. Homozygous variants occurred much more frequently in the single-cell experiment than would be expected from their frequency in the genome. Depending on the different classes of shared single-cell SNVs, we observed them 580–22,000 times more frequently than expected (Fig. 3A). We examined the prevalence of SNVs detected at the single-cell level in the comparison of the individuals versus the reference. We found homozygous variants exhibiting higher minor allele frequencies than heterozygous ones (median: 0.43 vs. 0.28, P < 2.2 × 10−16, Wilcoxon test). Furthermore, the percentage of homozygosity was considerably higher among single-cell SNVs found in all nine individuals compared to those in none of the individuals (27% vs. 9%, P < 2.2 × 10—16, Fisher's exact test).

    Opposite selection patterns manifest before and after a recent drastic environmental change

    We considered how strongly the strain was affected by positive and negative selection across the three evolutionary timescales examined. For this purpose, we examined the variants identified in coding sequences and categorized them into nonsynonymous and synonymous substitutions (Supplemental Tables S2, S3). Based on the respective counts, our analysis then compares the rate of nonsynonymous variants (dN) as an indicator of selection, against that of synonymous variants (dS) representing the genetic drift. Therefore, a dN/dS ratio >>1 indicates positive selection, ∼1 indicates neutral evolution or genetic drift, and ≪1 negative selection (Kimura 1980). In line with this, our single-cell comparison covering the intermediate timescale from the zygote onward, i.e., most likely, mainly the period in captivity (1973–2018) (Fig. 1), and revealing somatic SNVs, showed a genome-wide average dN/dS ratio of 0.63. On the longest evolutionary timescale investigated here and mainly reflecting germline variants accumulated in the wild before sexual reproduction (comparison of whole-animal data versus reference), negative selection was even more pronounced, with a dN/dS ratio of 0.49. However, the somatic variants that distinguish the examined buds from their mother and accumulated in the shortest examined timescale (2006–2018) (Fig. 1B; comparison of whole-animal data of buds vs. mother) showed rather a weak sign of positive selection (dN/dS = 1.06).

    Many biological processes under selection are related to Hydra’s regenerative capacity

    We investigated whether pathways and biological processes were adapted (our criterion for positive selection: dN/dS > 1) or strongly conserved (our criterion for strong negative selection: dN/dS < 0.1) during the examined evolutionary timescales. Of course, this analysis was limited to variants associated with coding sequences of ontologically annotated Hydra genes. Of the genes affected by the 505 SNVs identified on the shortest examined timescale (2006–2018) only 15 fulfilled these requirements, i.e., too few for any meaningful follow-up. Therefore, further analysis had to be restricted to the intermediate and longest timescales examined (Fig. 1).

    Regarding somatic SNVs that accumulated most likely mainly in captivity (after the last sexual reproduction, i.e., probably shortly before 1973, until 2018), 28 biological processes showed signs of positive selection (Fig. 4A; Supplemental Table S4), whereas one process was found to be strongly conserved: “positive regulation of multicellular organismal process” (GO:0051240, dN/dS = 0.09). For germline SNVs that originated in the wild (before the last sexual reproduction, i.e., mostly well before 1973), exactly the opposite pattern was observed: 274 processes display strong negative selection (Fig. 4B; Supplemental Table S5), while only one process, “heterotrimeric G-protein complex” (GO:0005834, dN/dS = 1.30), was found to be adapted. In addition, two processes exhibited an enrichment for stop-gain mutations for the time in the wild (one-sided Fisher's exact test using a false discovery rate [FDR] threshold of 0.05) (Supplemental Table S8): the “nucleic acid metabolic process” (GO:0090304, FDR = 4.5 × 10−7) and “hydrolase activity” (GO:0016787, FDR = 7.0 × 10−5). Both processes were also identified as positively selected for the time in captivity. We also analyzed start-loss and stop-loss mutations. However, only a small fraction of genes was affected in comparison to nonsynonymous and stop-gain mutations (Supplemental Fig. S2). Thus, no corresponding enrichments for biological processes were found, neither for the time in captivity nor for the time in the wild.

    Figure 4.

    Summary of processes affected (A) by positive selection during 44 years in captivity after last sexual reproduction and (B) by strong negative selection over a longer timescale before last sexual reproduction in the wild. The analysis of single-cell data/somatic SNVs (A) and the comparison of whole-animal data against the Hydra reference genome/germline SNVs (B) resulted in 28 and 274 biological processes affected by positive and strong negative selection, respectively (Supplemental Tables S4, S5). For this overview, processes were summarized by REVIGO (Supek et al. 2011). The total bar length gives the dN/dS value of the respective process; the blue/red part of the bar represents the fraction of expressed/not expressed genes under laboratory conditions. The values in brackets give the number of unique genes that were summarized under the respective term (bold black) and how many of these genes were found to be expressed (blue). An asterisk marks processes with significantly more expressed genes than expected based on the ratio of total expressed genes to total genes examined (11,783/33,820; one-sided Fisher's exact test; FDR < 0.05). Strong negative selection was defined by dN/dS < 0.1.

    For functional validation of these findings, we determined, for all processes and process categories identified as affected by positive or strong negative selection, how many of the genes involved in them are expressed under laboratory conditions. For this purpose, we sequenced the whole-animal mRNA of three individuals that we had previously examined genomically (Supplemental Table S1). The results showed that slightly more than half of these processes (168/304) (Supplemental Tables S4, S5) and all but one of the process categories formed from them (Fig. 4A,B) showed a significantly higher proportion of expressed genes than would be expected based on the total proportion of expressed genes (11,783/33,820; one-sided Fisher's exact test; FDR < 0.05). In particular, the genes of the process categories affected by strong negative selection are expressed to a large extent (Fig. 4B).

    For example, “animal organ regeneration” (GO:0031100) was found to be the strongest conserved biological process during the time in the wild: while this process was affected by in total 17 mutations during this time frame, only one of them changed an amino acid, resulting in a dN/dS ratio of only 0.018. At the same time 10 out of 11 affected genes were expressed under laboratory conditions—significantly more than expected (FDR = 9.15 × 10−6) (Supplemental Table S5). The only two processes that were found to be even more strongly conserved were the “zona pellucida receptor complex” (GO:0002199), which is important for successful fertilization (Hinsch et al. 1997) and “establishment of neuroblast polarity” (GO:0045200). The former was found to be affected by 10 mutations for the time in the wild and the latter by six, none of which changed an amino acid.

    The “generation of neurons” (GO:0048699) turned out to be the process most strongly affected by positive selection for the time during captivity with, in total, 11 mutations, all of which changed an amino acid. In addition, with 85 out of 156 genes belonging to this process, significantly more genes were expressed than expected (FDR = 1.1 × 10−5).

    The second and third most positively selected processes for the time during captivity were “nucleic acid metabolic process” (GO:0090304, 30 out of 32 mutations nonsynonymous, dN/dS = 4.4) and “growth factor binding” (GO:0019838, 12 out of 13 mutations nonsynonymous, dN/dS = 3.3), respectively.

    Discussion

    Our main finding—that the H. magnipapillata strain 105 does not exhibit a significantly lower somatic mutation rate than much more rapidly aging species—is an argument against the generalization of the somatic mutation theory of aging (Morley 1995). Conversely, it could be stated that a mutation rate as high as that of humans could be sufficient to enable life spans of centuries or even more. However, Hydra may also be seen as an exception among multicellular organisms, since it somewhat resembles a collection of unicellular organisms (Bosch et al. 2010). Thus, insights into Hydra’s biology of aging (Tomczyk et al. 2015) may not be directly transferable to most other animals, including mammals.

    To validate the order of magnitude of the mutation rates determined, we examined our results for possible sources of bias: First, our mutation rates are based on an estimate of the number of cell divisions, which assumes that the last sexual reproduction leading to the strain examined happened not long before cultivation of this strain in 1973 (see also Supplemental Discussion). However, even if this assumption was wrong, our analyses show that the last reproduction would have to have occurred decades or even centuries before the cultivation of the strain to change the order of magnitude of the estimated mutation rate (Fig. 2B). Since Hydra is known to usually switch to sexual reproduction when conditions are not within a narrow window of optimality, e.g., water temperatures ∼19°C, it seems highly unlikely that so much time would have elapsed without sexual reproduction in a naturally fluctuating environment (Martínez and Bridge 2012; Kaliszewicz and Lipińska 2013). Second, we analyzed the mutation spectrum for potential single-cell artifacts. Both the estimated fractions of potential artifacts (1.3%–3.5%) and differences to public human and mouse data (Fig. 3C; Supplemental Fig. S1) are relatively small, rendering any significant influence on the order of magnitude of our results unlikely. Third, the sequencing coverage is relatively evenly distributed across the genomes in our experiments. The MAPD metrics, comparing adjacent sequencing bins, range from 0.78 to 1.28, while only values >2 are seen as potentially problematic (Miller et al. 2022).

    Another factor that could potentially lead to overestimated mutation rates is the older genome assembly used (Hydra 2.0) (Chapman et al. 2010). However, our approach ensures that an SNV is called only if the single cell that is examined deviates not only from the reference genome but also from the respective opposite stem cell lineage. Furthermore, if low reference genome quality led to false-positive SNV callings, one would expect those SNVs to be shared at least among the single cells of the same lineage. The proportion of common SNVs among the single cells examined is, however, clearly limited (Fig. 3B). Therefore, a significant influence of the genome assembly on the order of magnitude of the mutation rates appears unlikely. In contrast, the majority (69.9%) of SNVs identified by single-cell genome sequencing were also found by comparing the whole-animal genomic data to the reference genome—indicating the robustness of our variation-calling approaches and their relative independence from possible biases (e.g., by genome amplification, sequencing chemistry, read coverage, or base/variant calling algorithm). Moreover, the similarity of somatic mutation spectra in Hydra, human, and mouse (Fig. 2C) argues against deleterious effects of the A/T richness of the polyp's genome. Admittedly, the highly varying mapping rates of 15%–80% (Table 1) are a potential problem. Note, however, that we considered for variant calling only genome positions with a read coverage ≥20, both in the single cell examined as well as the combined mapping result of three single cells of the respective opposite stem cell lineage. Given this conservative approach, the number of bases covered by the respective comparisons correlated strongly with the mapping rates (22–106 million bases) (Table 1; see Supplemental Fig. S3 for overlaps in covered bases). Nevertheless, the robustness of our approach is supported by the fact that we found very similar mutation rates within each stem cell lineage, regardless of the mapping rate. For example, for E cell #2, a mapping rate/number of covered bases of only 15%/22 million resulted in a mutation rate per cell division of 4.35 × 10−9. For E cell #1, a much higher mapping rate/number of covered bases of 63%/102 million resulted in a mutation rate of 3.95 × 10−9 (Fig. 2A; Table 1).

    However, the overall lower mapping rates in E cells in comparison to I cells might still have contributed to an underestimation of the number of mutations in the former so that the real difference in the resulting mutation rate between the stem cell lineages could be lower than presented here. This could partly explain the surprising result that I cells with the potential to differentiate to germ cells display a higher mutation rate than E cells that lack this ability. Another explanation would be that I cells are more important for adapting to changing environmental conditions, e.g., due to their ability to differentiate into neurons and gland cells (see below).

    Another interesting aspect is the contrasting pattern found with regard to the extent of positive and negative selection in the recent evolution of the examined Hydra strain with respect to the time in the wild and in captivity. In the wild, negative selection on the germline SNVs was found in all pathways examined except one; 274 processes were even found to be highly conserved (dN/dS < 0.1). This was to be expected, since negative selection dominates in coding sequences—even more so if species are highly adapted to a rather constant environment (Yang 2005). For these reasons, it is not surprising that we found extremely few amino acid changes in processes that can be linked directly to highly evolved traits of Hydra. For example, the regenerative capacity of Hydra is among the most advanced in the animal kingdom (Bosch 2007; Vogg et al. 2019b). Therefore, changes in the corresponding genes are likely to decrease fitness. Similarly, the strong purifying selection of “positive regulation of multicellular organismal processes” (GO:0051240) under laboratory conditions reflects Hydra’s delicate balance of reconciling an extremely active stem cell community with the maintenance of the—albeit simple—architecture of the organism (Bosch et al. 2010). Histone ubiquitination, as another example, is a process well known to be strongly conserved throughout evolution from yeast to mammals (Cao and Yan 2012; Wang et al. 2017).

    In stark contrast, the evolutionarily extremely short time frame of 44 years in the laboratory is characterized by significant adaptation of a number of biological processes, indicating that for somatic SNVs, positive selection became the prevailing evolutionary force after this severe change of environmental conditions. Fitness adjustments to changes in laboratory conditions are a phenomenon already observed in bacteria, worms, and insects (Large et al. 2016; Hoffmann and Ross 2018; Knöppel et al. 2018). Our finding underlines Hydra’s adaptability to environmental changes as providing the basis for its sustainability over an evolutionary period of ∼200 million years (Schwentner and Bosch 2015).

    Most adapted processes (Fig. 4A) support the hypothesis that a significant proportion of selection in Hydra acts on the level of single stem cells rather than only at the level of individuals (Bosch et al. 2010). Following a drastic environmental change, e.g., from the wild to the laboratory, cell clones may increase their prevalence in the cell population of an individual by fitness gains based on positive selection of: (1) “nucleic acid metabolic process” to improve nucleic acids availability for continued cell proliferation (Gross and Rotwein 2016; Zhu and Thompson 2019), (2) “regulation of cellular response to growth factor stimulus” for faster cell cycle progression and higher division rates (Jones and Kazlauskas 2001; Fingar and Blenis 2004; Gross and Rotwein 2016), and (3) faster “cell migration” that is associated with differentiation (Hager and David 1997; Siebert et al. 2019).

    The strongest sign for positive selection, summarized by “generation of neurons” (GO:0048699), however, points toward selection acting on the level of individuals, as neuronal integration is essential for the fitness of a multicellular organism. This process includes the production of neuroblasts and their differentiation into neurons. It has long been known that Hydra can feed on live animals but not on dead ones, suggesting that they receive from their prey certain signals necessary to initiate the feeding response (Beutler 1924), most likely via their ectodermal nerve net. As food supply and respective feeding response change after the transition from the wild to captivity, it seems plausible that relevant neurophysiological processes also undergo adaptation.

    In the comparison of the mother against each of its eight daughters, we found only very small differences. Moreover, the rather similar numbers of 86–113 SNVs identified displayed no correlation with the time of budding (Table 2). This suggests that these variants appeared almost exclusively due to the founder effect, i.e., a shift of allele frequencies as a consequence of random sampling of cells during budding. This would mean that, at least during this late phase of captivity (the final 12 [27%] of the total of 44 years), positive selection did not increase allele frequencies of beneficial alleles to an extent that would be detectable when analyzing an entire animal within the limits of the sequencing resources available to us. In line with the hypothesis that these SNVs were predominantly a result of the founder effect, the rather weak sign of positive selection that was observed (dN/dS = 1.06) corresponds almost perfectly to neutral selection (dN/dS ∼ 1). Conversely, this means that there is also no evidence for the assumption (discussed by Schaible et al. [2015]) that in order to prevent accumulation of damage in the mother, Hydra cells damaged by mutations may be transmitted via budding to the offspring.

    In conclusion, the presented first insights into the evolutionary forces that shape the genome of the nonsenescent H. magnipapillata strain 105 excluded that its exceptional longevity under benign laboratory conditions is based on a low mutation rate. On the contrary, somatic mutation rates higher than in humans or mice provide the genetic variability underlying Hydra’s high adaptability. It remains to be tested whether also other extremely long-lived species deviate from the predictions of the somatic mutation theory of aging. Additional research is needed to decipher the mechanisms behind Hydra’s nonsenescence as an exapted, i.e., not brought about by natural selection, exceptional metazoan phenotype.

    Methods

    To ensure comparability of results, our methodological approach at the single-cell level almost completely replicated that used to determine human and mouse mutation rates (Milholland et al. 2017; see Supplemental Table S6 for a direct methodical comparison). Briefly, individuals were examined ∼24 h after their last feeding. For the isolation of three stem-like E cells and three stem-like I cells (likely a mixture of ecto- and endodermal cells, as we did not differentiate between the two), the first head and foot were dissected. Dissociation of the remaining body column to single-cell suspensions and subsequent classification of cells was then conducted by an established protocol using a maceration solution containing glycerin, glacial acetic acid, and water (David 1973). After isolation of each single cell with a pipette under the microscope at 40× magnification, the remaining glacial acetic acid was evaporated at 60°C for 2 h. The GE Healthcare illustra Single Cell GenomiPhi Kit was used for cell lysis and multiple displacement amplification. Sequencing was performed using Illumina sequencing methodology (Bentley et al. 2008). In detail, 2 µg of amplified DNA was introduced into Illumina's TruSeq DNA PCR-free library preparation. Paired-end sequencing was conducted using Illumina HiSeq 2500 (high-throughput mode, 2 × 101 cycles) and NovaSeq 6000 (Xp workflow, 2 × 251 cycles) devices. Data were obtained using Illumina's bcl2fastq v1.8.4 (HiSeq 2500) and v2.20.0.422 (NovaSeq 6000). Trim Galore! was used for read trimming (https://github.com/FelixKrueger/TrimGalore) and BWA-MEM for mapping (Li and Durbin 2009). Duplicate reads were removed using SAMtools rmdup (Li et al. 2009). GATK was used for indel realignment and base quality recalibration (Van der Auwera et al. 2013), and VarScan 2 (Koboldt et al. 2012), MuTect (Cibulskis et al. 2013), and UnifiedGenotyper (Van der Auwera et al. 2013) for variant calling. For each of the 6 cells examined, MuTect and VarScan 2 were invoked such that the cell currently under consideration served as the somatic/tumor input, and the union set of 3 cells of the respective other cell type served as the normal/bulk input. The same caller settings and filter criteria as in Milholland et al. (2017) were used. Only variants predicted by all three callers were considered for further analysis. As in Milholland et al., only positions with a read coverage ≥20 were taken into account for variant calling. Thus, the number of mutations per base was calculated by dividing the number of variant positions by the total number of bases covered by ≥20 reads.

    To validate our results further, we implemented a pipeline for single-cell artifact minimization, as described recently in the literature (Miller et al. 2022, Extended Fig. 1G). Briefly, it comprises three steps: (1) calculation of MAPD score as an indicator for potential uneven genome coverage; (2) examination of the known potential single-cell artifact signature SBS scE; and (3) examination of the combined known potential single-cell artifact signatures SBS scE and SBS scF (Petljak et al. 2019). The latter signatures were analyzed together with the COSMIC v3 signatures using the R package sigfit (Gori and Baez-Ortega 2018).

    For the estimation of the mutation rate per cell division, the number of mitoses since the time of the zygote was assumed to be the sum of developing mitoses and continuous regeneration mitoses. For developing mitoses, we used log2(105) = 17. For continuous regeneration mitoses, given that I cells and E cells are known to divide every 24–30 h and 3–4 days, respectively (David and Campbell 1972; Campbell and David 1974), we used Formula and Formula, respectively.

    For examining whole Hydra individuals, the Genomic DNA Isolation Kit from Norgen Biotek was used for DNA extraction, the NEBNext Ultra II DNA Kit for library preparation (DNA input 10 ng), and an Illumina HiSeq 2500 device for paired-end sequencing (2 × 101 cycles). Data were extracted using Illumina's bcl2fastq v2.20.0.422. Read trimming was conducted with cutadapt (Martin 2011) and Trimmomatic (Bolger et al. 2014), read mapping with BWA-MEM (Li and Durbin 2009), removal of duplicates, indel realignment, and base quality recalibration with GATK (Van der Auwera et al. 2013), and clipped second mate overlaps using bamUtil (Jun et al. 2015). Regarding the comparison of the buds versus their mother, for all buds, mapped reads were additionally downsampled to the lowest number of mapped reads among buds before variant calling. Variants were accepted if they were predicted by at least three of the following six callers: BCFtools (Li 2011), FreeBayes (https://github.com/freebayes/freebayes), Mutect2 (Van der Auwera et al. 2013), Platypus (Rimmer et al. 2014), VarDict (Lai et al. 2016), and VarScan 2 (Koboldt et al. 2012). For the comparisons of each individual versus reference, “germline” variants were kept that were flagged accordingly. For the comparison of the buds versus their mother, only “somatic” variants were kept that were flagged as somatic by Mutect2, VarScan 2, and VarDict. For the remaining callers, BCFtools FreeBayes, and Platypus, filters were adapted from the Bcbio-nextgen pipeline. Finally, only those somatic variants were kept for downstream analyses that either could be validated in the independent single-cell experiment analysis or had an FDR below 0.05 (based on a one-sided Fisher's exact test). For the multiple test correction, we used the respective number of germline variants (Supplemental Tables S1, S3). The entire pipeline used for this workflow is available at GitHub (https://github.com/Hoffmann-Lab/muvac) (see also Supplemental Code).

    To determine dN/dS ratios, we first categorized the subset of SNVs hitting coding sequences into nonsynonymous and synonymous substitutions using the Hydra 2.0 annotation (Chapman et al. 2010) and a custom script (Supplemental Code). Using another custom script, we counted the number of potentially nonsynonymous and synonymous sites as described by the literature (Nei and Gojobori 1986). The rate of nonsynonymous substitutions dN was then estimated by dividing the total number of nonsynonymous differences observed by the total number of potential nonsynonymous sites and correcting for backmutations using the Jukes–Cantor formula. The same was done for dS and synonymous differences/sites, respectively (Nei and Gojobori 1986). The entire procedure described was applied either across all protein-coding genes to determine genome-wide dN/dS ratios or across all genes belonging to a certain GO category to determine GO-category-specific dN/dS ratios (Supplemental Tables S4, S5). We did not determine dN/dS ratios for single genes, as almost all genes were, if at all, affected by low numbers of variants, rendering dN/dS estimations unreliable.

    For the functional annotation of Gene Ontology analyses the Blast2GO (Conesa et al. 2005) resource from the Hydra 2.0 Genome Project Portal was used (https://research.nhgri.nih.gov/hydra/download/functional_annotation/Blast2GO/blast2go_results.tsv).

    To analyze gene expression data of whole Hydra individuals, the Norgen Biotek Total RNA Purification Kit was used for RNA extraction. The library preparation was performed with the NEBNext Kits Poly(A) mRNA Magnetic Isolation Module, Ultra II directional RNA Multiplex Oligos for Illumina, using 50–100 ng of total RNA. For sequencing, an Illumina HiSeq 2500 device in paired-end mode was used (51 cycles). Read trimming was conducted with cutadapt (Martin 2011) and Trimmomatic (Bolger et al. 2014), read mapping with segemehl (Hoffmann et al. 2014), and quantification with featureCounts (Liao et al. 2014). We assessed all genes as expressed that were found with a median transcript per million value of at least 5.

    To examine whether a GO category X was enriched for expressed genes, we used a one-sided Fisher's exact test with the following input: number of expressed genes belonging to X, number of nonexpressed genes belonging to X, number of expressed genes not belonging to X, and number of nonexpressed genes not belonging to X. To compare the percentage of homozygosity in single-cell SNVs found in all nine individuals versus those in none of the individuals using a one-sided Fisher's exact test (Fig. 3A), the following input data were used: number of homozygous SNVs found in all nine individuals, number of heterozygous SNVs found in all nine individuals, number of homozygous SNVs found in none of the individuals, and number of heterozygous individuals found in none of the individuals. To examine whether a GO category X was enriched for genes with stop-gain/start-loss/stop-loss mutations, we used a one-sided Fisher's exact test with the following input: number of genes belonging to X with stop-gain/start-loss/stop-loss mutations, number of genes belonging to X without stop-gain/start-loss/stop-loss mutations, number of genes not belonging to X with stop-gain/start-loss/stop-loss mutations, and number of genes not belonging to X without stop-gain/start-loss/stop-loss mutations. All read mapping was carried out against the Hydra vulgaris 2.0 genome (Chapman et al. 2010).

    Data access

    The raw data generated in this study have been submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena/browser/home) under accession number PRJEB50344. The variants identified are available as a browser-accessible track hub at https://genome-euro.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://genome.leibniz-fli.de/paras/hydra/hub.txt&genome=hub_175867_Hm105_Dovetail_Assembly_1.0.

    Competing interest statement

    The authors declare no competing interests.

    Acknowledgments

    We thank Ivonne Görlich, Cornelia Luge, and especially Silke Förste for excellent technical assistance during this project. We thank Penelope Krumm and Debra Weih for proofreading and editing the manuscript. This work was supported by the Joachim Herz Foundation (Add-on Fellowships for Interdisciplinary Life Science, to A.S.).

    Author contributions: M.P., S.H., R.S., C.E., and A.S. conceived the project. R.S. provided the animals and isolated the cells. M.G., M.F., and M.B. conducted library preparation and sequencing. A.S., K.R., M.P., J.K., and H.K. analyzed the data. M.P. and A.S. wrote the initial manuscript draft. All authors revised the draft and approved the final manuscript.

    Footnotes

    • 8 Shared senior-authorship.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.279025.124.

    • Freely available online through the Genome Research Open Access option.

    • Received January 23, 2024.
    • Accepted October 18, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    References

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server