Genome-wide relaxation of selection and the evolution of the island syndrome in Orkney voles

  1. Gerald Heckel1,2
  1. 1Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland;
  2. 2Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  • Corresponding author: gerald.heckel{at}unibe.ch
  • Abstract

    Island populations often experience different ecological and demographic conditions than their counterparts on the continent, resulting in divergent evolutionary forces affecting their genomes. Random genetic drift and selection both may leave their imprints on island populations, although the relative impact depends strongly on the specific conditions. Here we address their contributions to the island syndrome in a rodent with an unusually clear history of isolation. Common voles (Microtus arvalis) were introduced by humans on the Orkney archipelago north of Scotland >5000 years ago and rapidly evolved to exceptionally large size. Our analyses show that the genomes of Orkney voles were dominated by genetic drift, with extremely low diversity, variable Tajima's D, and very high divergence from continental conspecifics. Increased dN/dS ratios over a wide range of genes in Orkney voles indicated genome-wide relaxation of purifying selection. We found evidence of hard sweeps on key genes of the lipid metabolism pathway only in continental voles. The marked increase of body size in Orkney—a typical phenomenon of the island syndrome—may thus be associated to the relaxation of positive selection on genes related to this pathway. On the other hand, a hard sweep on immune genes of Orkney voles likely reflects the divergent ecological conditions and possibly the history of human introduction. The long-term isolated Orkney voles show that adaptive changes may still impact the evolutionary trajectories of such populations despite the pervasive consequences of genetic drift at the genome level.

    Islands are often seen as the ideal systems to observe adaptation in nature (Foster 1964; Losos and Ricklefs 2009; Grant and Grant 2014). A population newly arrived on an island often faces a novel environment with abiotic and biotic differences compared with its origin (Whittaker and Fernández-Palacios 2007). The shift of environmental conditions can lead to a drastic shift in the selective regime (Losos and Ricklefs 2009), which can sculpt island organisms drastically different from their continental relatives in both morphology and genetics (Foster 1964; Hofman et al. 2015; Lamichhaney et al. 2015; Renaud et al. 2015; Benítez-López et al. 2021; Chevret et al. 2021).

    Compared with their continental relatives, island organisms often experience a variety of changes of morphological, ecological, and behavioral traits, which are referred to as “island syndrome” (Foster 1964; Adler and Levins 1994; Losos and Ricklefs 2009). One of the classical patterns of change is that species with small body size tend to become bigger on islands, whereas species with large body size tend to become smaller (“island rule,” or Foster's rule) (Foster 1964; Benítez-López et al. 2021). The strength of evolutionary patterns following the island rule differs between taxa, but, for example, an increase of body mass up to 100% has been observed repeatedly in island populations (Lomolino 1985; Angerbjörn 1986; Adler and Levins 1994; Payseur and Jing 2021; Renom et al. 2021). The change of body size can be fast, for example, within 200 years for the house mouse on Gough Island (Rowe-Rowe and Crafford 1992). Studies of the genetics underlying the island syndrome have mainly focused on detecting signatures of directional selection on the genes related to shifted phenotypes (Raia et al. 2010; Choi et al. 2021; Payseur and Jing 2021). However, the traits of island populations can also be changed by relaxation of purifying or positive selection compared to the continental populations owing to the environmental differences. For example, lower predation pressure may lead to change of body size (Adler and Levins 1994) or coloration (Bliard et al. 2020). Further, the efficacy of selection, mainly on mildly deleterious variation, can be strongly reduced under genetic drift (Gravel 2016; Peischl et al. 2018), for example, with effects on immune functions in island birds (Barthe et al. 2022).

    The importance of genetic drift in the history of many island populations poses particular challenge to characterizing the contribution of selection to evolutionary change (Glinka et al. 2003; Li et al. 2012). Island populations usually experience genetic bottlenecks at the founding stage and at least partial genetic isolation thereafter, which may result in strong genetic drift with long-lasting genome-wide consequences on genetic diversity and allele frequencies (Ellegren et al. 2012; The 1000 Genomes Project Consortium 2015; Friis et al. 2018). The genomic background patterns caused by genetic drift make it challenging to identify the typical “genomic island” signatures of selection (Stephan 2016), like locally decreased genetic diversity and extreme Tajima's D, or increased FST (Hahn 2008; Li et al. 2012; Moinet et al. 2022). For example, regions with low diversity resulting from genetic drift could be interpreted as caused by selective sweeps (Harris et al. 2018) or segregating nonsynonymous alleles as a consequence of balancing selection (Johri et al. 2022).

    Linkage disequilibrium (LD) based methods measuring extended haplotype homozygosity (EHH: Sabeti et al. 2002) are suitable for detecting recent positive selection in the face of genetic drift from population bottlenecks, as LD changes more quickly than site frequency patterns (Ferrer-Admetlla et al. 2014; Weigand and Leese 2018). Methods like iHES (Voight et al. 2006), XP-EHH (Sabeti et al. 2007), and REHH (Qanbari et al. 2010) look for long regions of homozygosity caused by selective sweeps and use the statistics of full genomes as the background for significance testing. Nowadays, multiple genetic estimates in sliding windows are usually combined for selection tests, for example with Fisher's combined probability test or hidden Markov model (HMM) clustering (Ayub et al. 2013; Marques et al. 2018). Combining tests for multiple signatures of selection in different categories which are nearly uncorrelated in the way they are affected by variation of recombination rate and genetic drift in neutral genomic regions, for example LD based XP-EHH and derived allele frequency, allows to boost the accuracy of the detection of positive selection (Grossman et al. 2010).

    In this study, we investigated the genomic interplay between selection and genetic drift leading to the island syndrome in a system with an exceptionally long and clear history of population isolation. Common voles (Microtus arvalis) were introduced to the Orkney archipelago in the north of Scotland by Neolithic farmers >5000 years ago, likely as a food resource (Haynes et al. 2003; Martínková et al. 2013; Romaniuk et al. 2016). Common voles are very widespread in continental Europe with multiple distinct evolutionary lineages (Heckel et al. 2005; Beysard and Heckel 2014), but Orkney vole ancestry can be traced back specifically to populations in the Western evolutionary lineage from Belgium or northern France (Martínková et al. 2013; Wang et al. 2023). The species is absent from the rest of the British Isles but is abundant on seven Orkney islands. Archaeological material showed that the body size of the voles increased rapidly in the first centuries after introduction (Cucchi et al. 2014), leading to the current Orkney vole (M. a. orcadensis) phenotype with twice the body mass of its continental relatives (Millais 1904). The Orkney vole experienced a severe bottleneck through the introduction with a 20- to 30-fold decrease in the effective population size (Ne), but the Ne remained relatively large at several thousand even during the bottleneck (Wang et al. 2023). However, genetic drift in complete isolation for approximately 10,000 generations has left strong impacts in the genome of Orkney voles, including low genetic diversity and the accumulation of homozygous deleterious alleles (Martínková et al. 2013; Wang et al. 2023).

    Our analyses utilize the exceptionally clear history of the Orkney vole for assessing the long-term effects of genetic drift and selection on completely isolated natural vertebrate populations relative to populations that remained unaffected by isolation. Importantly, common vole populations on the continent underwent demographic fluctuations similar in timing and magnitude to those of Orkney voles (Ne ∼ 50,000–80,000 in the last 2000 generations) (Wang et al. 2023). We thus have the potential to identify the effects of the long-term history of isolation and of the interactions with the specific environment for these mammal genomes. We hypothesized that generally benign environmental conditions for common voles on the Orkney islands and stronger bottlenecks in the colonization history may have resulted in a relaxation of selection compared with continental populations (see Gravel 2016; Kutschera et al. 2020). In this study, we investigated whether relaxation of purifying selection has led to overall elevated levels of nonsynonymous variation in the Orkney populations (Kryazhimskiy and Plotkin 2008). Further, we explored whether differences in the environment for thousands of generations at relatively large population sizes have also enabled divergent selection and thus an adaptive cause for the rapid and drastic body size change, as well as other evolutionary changes, of Orkney voles.

    Results

    Extreme genomic background of Orkney voles

    We obtained full genome data from the resequencing of 14 Orkney voles and of 11 common voles from Belgium and northern and central France as the larger region of origin for the ancient human introduction to Orkney (Fig. 1A; Martínková et al. 2013; Wang et al. 2023). We refer to the latter samples in the following as continental voles. We used three M. obscurus voles as an outgroup. The average mapping depth was 26×, ranging from 17× to 41× per individual. After filtering, 23.7 million SNPs were kept, including 411,340 SNPs located in coding regions. Among the SNPs in coding regions, 2.0% had loss-of-function (LOF) alleles, 47.4% had nonsynonymous alleles, and 50.9% had synonymous alleles; 0.4% SNPs were multiallelic and had alleles in more than one functional category.

    Figure 1.

    Sample distribution and comparison of dN/dS ratios. (A) Sample locations of the common vole M. arvalis in western Europe and Scotland. Continental samples are marked with green; Orkney samples, with brown. The distribution range of M. arvalis is marked with light green. Note that the species is absent from the British Isles except for Orkney. (B) For each gene, the dN/dS ratio of continental voles was plotted against the ratio of Orkney voles. There were more genes with higher ratio in Orkney than in the continent. The dashed line marks an equal dN/dS in both groups. The density of the points is marked with colors. (C) Distribution of dN/dS ratios of Orkney voles versus continental voles. (D) For the genes with |Δ dN/dS| > 0 between Orkney and continental voles, values were overall biased toward a higher dN/dS ratio in Orkney voles.

    We characterized genomic differences between Orkney and continental voles in 39,784 genomic windows of 50 kb. The genetic diversity of Orkney voles was almost one order of magnitude lower than that for continental voles (median π: Orkney = 2.5 × 10−4 vs. continent = 2.2 × 10−3; Z-test, P < 2.2 × 10−16) (Fig. 2), with nearly complete diversity loss (π < 1 × 10−4) in 37% of the windows of the genome. In continental voles, only 0.7% of the windows had π < 1 × 10−4. The Tajima's D of Orkney voles was overall slightly negative but with a very wide range throughout the genome (median: −0.50, SE = 3.00) (Fig. 2) suggestive of signatures of population contraction (Tajima's D > 0) or expansion (Tajima's D < 0) depending on the genomic region. In contrast, the Tajima's D of continental voles was slightly positive overall with a much narrower range of individual estimates (median: 0.32, SE = 1.02; Kolmogorov–Smirnov (K–S) test, P < 2.2 × 10−16) (Fig. 2). Genetic differentiation between Orkney and continental voles reached a very high level with median FST = 0.47 (Fig. 2). The distribution of differentiation was very homogeneous along the whole genome, with only 1% of all genomic windows showing FST estimates smaller than 0.05, and 43% higher than 0.5 (Fig. 2; Supplemental Fig. S1). The windows with FST < 0.05 had very low genetic diversity both in continental voles (mean π = 2.6 × 10−4) and in Orkney voles (mean π = 2.1 × 10−4), indicating either strong purifying selection or selective sweeps in the ancestral populations. Among the 75 genes overlapping with these windows, two types of Gene Ontology (GO) biological process terms were enriched: olfactory receptor activity (15 genes) and immune response (12 genes). Computer simulations of neutral genomic data mimicking the demographic history of the system (Supplemental Fig. S2B; Wang et al. 2023) yielded a genomic landscape very similar to the empirical data (median π: Orkney = 5.7 × 10−4, continent = 4.6 × 10−3; median Tajima's D: Orkney = −0.67, SE = 1.6, continent = 0.53, SE = 0.23; median FST = 0.53), showing the effects of genetic drift on the genome.

    Figure 2.

    Genome-wide diversity and differentiation for common voles, M. arvalis, from the Orkney archipelago (brown) versus continental individuals (green). Density distributions (top) and Manhattan plots (bottom) for π, Tajima's D, and FST in 50 kb windows along the genome. Different chromosomes are marked with lighter and darker colors.

    Overall increased dN/dS in Orkney

    The distribution of nonsynonymous variation differed between Orkney voles and their continental relatives with increased dN/dS ratios for a larger number of genes in the Orkney population (Fig. 1B–D). The dN/dS ratios of roughly half of all genes (18,112 out of 36,074) were higher than zero in at least one group. For these genes, we further analyzed those 8213 genes that have a Mus musculus homolog and dN/dS > 0 in either group. The average dN/dS ratio of Orkney (0.199, SE = 0.195) was 21% higher than in continental voles (0.165, SE = 0.161; K–S test, P < 2.2 × 10−16) (Fig. 1B). The dN/dS ratio was increased for 4427 genes in Orkney voles by 0.121 (SE = 0.145) on average and decreased for 3719 genes by 0.069 (SE = 0.089) on average compared with continental voles (Fig. 1C). Both π and DXY were higher for the genes with larger dN/dS values in Orkney voles (mean π = 2.1 × 10−4, DXY = 9.0 × 10−4) than for the remainder of the genes (mean π = 1.5 × 10−4, Z-test P < 2.2 × 10−16; DXY = 7.4 × 10−4, Z-test P < 2.2 × 10−16), fitting the expectation of relaxation of selection (Tajima 1989; Dapper and Wade 2020).

    Divergent positive selection affecting the genome

    We used the density-based spatial clustering of applications with a noise (DBSCAN) algorithm (Ester et al. 1996; Hahsler et al. 2019) to identify outlier regions potentially affected by positive selection against the genomic background. The DBSCAN algorithm clusters data points in multidimensional space based on their distance between each other, and data points with long distances from the majority are outliers (Schubert et al. 2017). For each 50 kb window, we included six parameters for clustering: Δπ (πcontinent – πOrkney), ΔTajima's D (Tajima's Dcontinent – Tajima's DOrkney), ΔAF (deviation of major allele frequency, AFcontinent – AFOrkney), FST, recombination rate, and median XP-EHH. DBSCAN identified three clusters of genomic windows and 925 outlier windows (Fig. 3). The major cluster containing 95.7% of all windows represented most regions of the genome and was considered as the genomic background not strongly affected by positive selection. The two further clusters of 11 and nine genomic windows and the 925 outlier windows were then considered in the next step as potentially positively selected regions.

    Figure 3.

    The distributions of genomic parameters used for tests of positive selection with DBSCAN. Each dot refers to one 50 kb window. (A) The median XP-EHH value of the outlier windows, which measures relative haplotype homozygosity, was negatively correlated with recombination rate in Orkney voles (Pearson coefficient r = −0.33) and positively correlated with recombination rate in continental voles (Pearson coefficient r = 0.38). The dashed lines show the threshold of the highest and lowest 1% of XP-EHH. (B) Regions positively selected only in continental voles had lower Δπ and ΔTajima's D, whereas regions positively selected only in Orkney showed large variation. (C) The major allele frequency difference of the regions under a hard selective sweep showed, overall, no obvious deviation from the genomic background (K–S test, P = 0.32, Z-test P = 0.19), but mean FST values were lower than the average over the genome (Z-test P = 0.0001).

    The outliers defined by DBSCAN include any region with extreme parameters, for example, with highest recombination rates, so we used a hard filter of the first percentile on XP-EHH to determine regions that potentially went through hard selective sweeps (see Methods). There were 52 regions ranging from 50 kb to 200 kb in length within the highest first percentile of median XP-EHH and thus were considered as positively selected regions only in Orkney voles. Within the lowest first percentile of median XP-EHH suggesting positive selection only in continental voles, there were 115 regions ranging from 50 kb to 700 kb (Fig. 3). For the simulated neutral data, one false-positive selected window was detected in the Orkney population (0.1%), and nine were detected in the continental population (0.9%). However, none of the false-positive windows had a median XP-EHH value meeting the first percentile threshold that we used for the empirical data.

    HMM clustering primarily identified 1332 outliers, which included 321 (35%) of the DBSCAN outlier windows. A total of 43 primary HMM outlier windows were within the highest first percentile of median XP-EHH, overlapping with 19 windows from the DBSCAN result (26%). There were 214 HMM outlier windows within the lowest first percentile of median XP-EHH and a high overlap with the DBSCAN result (134 windows, 75%).

    In total, 164 genes orthologous to M. musculus were contained in the regions identified as under hard sweep with DBSCAN in continental voles (an example is shown in Fig. 4A), 113 of which were in the regions also detected by HMM clustering. The average dN/dS ratio of the genes detected with DBSCAN in continental voles was 0.068. Only one gene (2900026A02Rik) with unknown function had a dN/dS higher than one. Potential regulatory regions homologous to M. musculus were found in 93 out of 178 positively selected windows, including 16 promoters, 32 enhancers, and 69 open chromatin regions. The genes showed significant enrichment mainly in two types of GO terms (31 terms in total) (Supplemental Table S2): histone monoubiquitination and lipid metabolic process (Table 1). The genes belonging to the term histone H2A-K119 monoubiquitination include BMI1 and several PCGF genes and are regulated by insulin-like growth factors in mammals (Nacerddine et al. 2012). A series of cytochrome P450 genes in the second most significant term, epoxygenase P450, all of which were also detected by HMM, play an important role in lipid metabolism and affect body size in mammals by regulating and oxidizing steroid hormones (e.g., sex hormones, glucocorticoids), fatty acids, and xenobiotics (Medhora et al. 2007; Knockaert et al. 2011).

    Figure 4.

    Examples of genomic landscapes around the regions for which hard selective sweeps were detected in common voles from Orkney or from the European continent. Δπ (πcontinent – πOrkney), ΔTajima's D (Tajima's Dcontinent –Tajima's DOrkney), ΔAF (AFcontinent – AFOrkney), FST, and median XP-EHH are shown. Each point represents one 50 kb window. Positively selected windows were marked with vertical lines. Genes overlapping with positively selected regions (outlier windows determined with DBSCAN with the highest or lowest one percentile of median XP-EHH value) are given on top. (A) Positively selected regions of continental voles had low Δπ, ΔTajima's D, and ΔAF. Genes belonging to the epoxygenase P450 pathway on M. arvalis Chromosome 4 are marked with green. (B) In Orkney voles, Δπ, ΔTajima's D, and ΔAF of positively selected regions did not form clusters outstanding from the genomic background. Genes related to antigen processing and presentation on Chromosome 15 are marked with brown.

    Table 1.

    The 10 most significantly enriched GO biological process terms of genes that experienced hard sweeps in continental voles with the lowest Padj (P-value of enrichment adjusted with g:SCS method for multiple testing correction)

    In Orkney voles, there were 46 genes orthologous to mice identified with DBSCAN as under hard sweeps, and 17 of them belong to the GO term immune response (GO:0006955) (an example is shown in Fig. 4B). Among 40 significant GO terms, 33 were related to immune response (Table 2; Supplemental Table S3), including response to protozoans, MHC class II protein, and T cell activation. The average dN/dS ratio of the detected genes in Orkney voles was 0.051, and none was higher than one. Potential regulatory regions were found in 38 out of 58 positively selected windows, including six promoters, 29 enhancers, and 22 open chromatin regions. With HMM clustering, 29 genes were identified. Nineteen of these overlapped with DBSCAN results. Sixteen of them belonged to the GO term immune response, suggesting a strong signature of positive selection on these immune genes captured by both methods. We further performed GO analyses with the genes having a deviation of dN/dS > 0.3 or >0.5 between Orkney and continental voles or dN/dS > 0.5 or >1 in one of the populations. No significant enrichment was found in GO terms other than top-level categories in these analyses.

    Table 2.

    The 10 most significantly enriched GO biological process terms of genes that experienced hard sweeps only in the Orkney vole population

    Discussion

    In this study, we show with the comparative analysis of island and continental populations that differences in both genetic drift and selection have contributed to drastic differences in vole genomes after thousands of generations in isolation. The genomic patterns of diversity support a dominating role of genetic drift in the history of Orkney voles. The reduced efficacy of selection and benign environmental conditions in Orkney likely lifted some adaptive constraints of continental voles and contributed to the evolution of the island syndrome.

    Extreme demography led to extreme genomic landscape

    The complete isolation of Orkney voles after their introduction about 10,000 generations ago (Wang et al. 2023) had a lasting impact on their genomic landscape. Populations may show reduced genetic diversity for very long after an ancient bottleneck (Nei et al. 1975), and Orkney vole genomes are globally affected by much lower diversity (Fig. 2). The genomic landscape of Tajima's D strongly varied in the Orkney population (SD = 3.00) compared with continental voles (Fig. 2; Supplemental Fig. S1), which renders directly using Tajima's D as a measurement of selection unreliable in this case (Fig. 3B). The prolonged bottleneck and long isolation have left marks suggesting population contraction (leads to positive Tajima's D) and expansion (leads to negative Tajima's D) (Tajima 1989) in many regions of the genome even after thousands of generations of recovery to a currently large population (Ne ∼50,000–80,000 in the last 2000 generations) (Wang et al. 2023). It is possible that divergence between vole populations on different Orkney islands affected these patterns of Tajima's D, but our samples from the continent came from different demes as well (Heckel et al. 2005; Fischer et al. 2014). The overall negative values (−0.5) in Orkney voles and the positive correlation between π (estimation of local Ne) and Tajima's D (Supplemental Fig. S3) suggest that the ancient bottleneck had the major impact on the genomic landscape of Tajima's D (Gattepaille et al. 2013). Regions of exceptionally high differentiation have been used to identify parts of the genome under divergent selection (Beaumont 2005; Ellegren et al. 2012). However, mean FST between the Orkney and the continental population is already very high and thus limits potential signals, so that relying only on FST may produce both false-negative and false-positive results. The FST of candidate regions for hard sweeps was overall lower than that of the genomic background (Z-test, P = 0.0001) (Fig. 3C; Supplemental Fig. S1). This can be explained by the reduced genetic diversity of Orkney voles, likely together with private mutations that emerged during the isolation (Holsinger and Weir 2009; Meirmans and Hedrick 2011).

    The demographic history of the Orkney vole generated a genomic landscape in which the dominant effects of genetic drift may submerge many genomic islands of selection characterized by locally decreased genetic diversity, extreme Tajima's D, or increased FST (Hahn 2008; Li et al. 2012; Moinet et al. 2022). Some genetic parameters of candidate regions for hard sweeps overlapped with the genomic background, especially πOrkney, Tajima's D, and FST (Fig. 4B; Supplemental Fig. S1). In contrast, the candidate regions for hard sweeps in continental voles often showed typical patterns of genomic islands of positive selection, including decreased π, negative Tajima's D, and shifted allele frequencies (Nielsen et al. 2005). The large overlap between hard sweep regions in continental voles detected by the HMM and DBSCAN methods supports the general suitability of DBSCAN. The detection of fewer hard sweep regions in Orkney voles may be attributed to two factors: a true difference in selection compared with continental conspecifics (see below), and potential false-negative results caused by the population history (Grossman et al. 2010; Ferrer-Admetlla et al. 2014). Given the strong influence of the demographic history on the genome, the outlier windows detected may contain both false-positive and false-negative results.

    Our test with neutral simulations showed that DBSCAN likely has a low false-positive rate when detecting hard sweeps in Orkney voles (0.1%). The false-positive rate in continental voles was higher (0.9%), yet unlike the significant valleys in the landscape of empirical data, the genetic diversity of the false-positive windows (median: π = 4.1 × 10−3) did not show a strong decrease compared with the genomic background (median: π = 4.6 × 10−3). This supports the importance of considering multiple genetic features and tests when targeting regions under selection, and thus, our approach to focus on the consistent overlap between HMM and DBSCAN results to limit the potential impact of false positives. However, the adaptive changes identified in this study are likely only part of the picture of the adaptive evolution of Orkney voles. Higher statistical power from, for example, larger data sets or specifically targeted statistical tools, might help to overcome these limitations partially in the future. Taken together, the statistical power to detect the signatures of positive selection in specific regions of the genome is certainly impeded in Orkney voles, yet their genomic evolution is likely affected by adaptive processes.

    Genome-wide relaxation of purifying selection

    The overall increased dN/dS ratios in Orkney voles compared with continental voles suggest a lower effectiveness of selection in the population that experienced long-term isolation (Fig. 1). The increase of dN/dS in more than half of the inferred genes compared with continental voles was likely the consequence of relaxed purifying selection (Gravel 2016; Settepani et al. 2016; Kutschera et al. 2020), consistent with the accumulation of LOF mutations in Orkney (Wang et al. 2023). Both lower selective pressure from the environment (e.g., less competition on food resources, see discussion below about body size) and reduced efficacy of selection owing to strong genetic drift (Gravel 2016) are expected to play a role in the relaxation of selection. It is possible that positive selection on particular genes contributed to the global increase of dN/dS ratios, but it is unlikely to be the major driver of the pattern because of the high number of affected genes. A dN/dS ratio exceeding one is an indication of positive selection (Nielsen and Yang 2003), and positive selection can leave widespread traces over the genome (Carneiro et al. 2012; Enard et al. 2014). However, only 3% of the genes with increased dN/dS ratios in Orkney voles had a dN/dS reaching one. None of the genes detected in our test for hard sweeps had dN/dS greater than one. This indicates that selection may have affected associated noncoding regions, for example, in promoters or silencers (Andolfatto 2005; Haygood et al. 2010), consistent with the overlap of relatively many potential regulatory elements with the detected outlier regions. Furthermore, the distribution of dN/dS ratios was centered close to zero in both populations (Fig. 1B), showing the importance of purifying selection across the genome (Kimura 1977; Kryazhimskiy and Plotkin 2008).

    Relaxed selection and the island syndrome

    Relaxation from selection may have contributed to the island syndrome phenotype of Orkney voles. We detected signatures of hard sweeps related to body size only in continental voles, contrary to studies of giant island mice in which selection acted directly on growth-related genes in island populations (Payseur and Jing 2021; Renom et al. 2021). Genes related to lipid metabolism in multiple windows were detected to be under positive selection only in continental voles, which are smaller than Orkney voles. These genes include cytochrome P450 genes (CYP), which are regulated by growth factors and are related to vessel growth (Michaelis et al. 2003; Medhora et al. 2007) and obesity (Takahashi et al. 1999; Knockaert et al. 2011). Further genes may be related to early sexual maturity and stop of body growth: growth differentiation factor 1 (GDF1), which is essential for neural development (Lee 1990) and connected with obesity in mice (Onishi et al. 2016), and sterol carrier protein 2 (SCP2) and adrenodoxin (ADX), both involved in the synthesis of steroid hormone, especially sex hormone (Mendis-Handagama et al. 1992; Huang et al. 2022). Genes related to histone H2A-K119 monoubiquitination (CPLX1, PCGF3, PCGF5, PCGF6, and BMI1) were also detected under positive selection in continental voles. Changes on these genes that decrease the insulin-like growth factor signaling were found to increase lifespan in mammals (Brown-Borg 2015). However, histone H2A as part of the chromatin is evolutionarily conserved (Molaro et al. 2018) and plays a fundamental role in transcriptional repression during organism development (Tamburri et al. 2020). Thus, the hard sweep signatures on the genes involved in histone monoubiquitination are difficult to relate directly to the biological changes of Orkney voles.

    It is likely that relatively benign conditions in Orkney contributed to the acquisition of the big insular phenotype by releasing these voles from selection for small size on the continent. Based on dated archeological material from Orkney voles, the size of molars increased by >20% in the first centuries after colonization (Cucchi et al. 2014), indicating fast initial evolution of the island syndrome. It is unclear if artificial selection by ancient humans during the introduction phase may have contributed to this initial rapid change in the phenotype, but similarly large molars of common voles in independent, naturally colonized island populations (Guernsey and Yeu islands) (Cucchi et al. 2014) do not support this idea. Rather, the release from competition and low population densities during colonization of Orkney may have enabled relatively quick changes of body size.

    In a broad range of animal species, body size is negatively correlated with population density, both at population levels (e.g., freshwater invertebrates [Schmid et al. 2000] and Damaraland mole-rat [Finn et al. 2018]) and at species levels (e.g., birds [Damuth 1981] and herbivorous mammals [Juanes 1986]). European snow voles (Chionomys nivalis) are under selection for small body size and early maturity, which reduces food requirements and increases survival through winter (Bonnet et al. 2017). Invasive cane toads in Australia are typically larger at the front of the ongoing range expansion (Freeland 1986; Stuart et al. 2019). It is notable, however, that body sizes of common voles have decreased somewhat both on the continent and in Orkney in the past centuries (Cucchi et al. 2014). Ancient common voles (800–22,000 years ago) had 5% bigger molars than modern ones, showing a trend of selection toward smaller sizes with the only known exception formed by Spanish populations (Cucchi et al. 2014). Common voles in central Spain have undergone a substantial range expansion in recent decades (Luque-Larena et al. 2013). It is unknown whether this led to an increase in body size, but this would be a good opportunity to study phenotypic evolution during a range expansion of the species. The molar sizes in mainland Orkney started to decrease ∼2000 years ago (Cucchi et al. 2014), the same time when the vole population size increased and reached its highest level (Wang et al. 2023). The decrease of body size of Orkney voles in the past centuries might thus be a long-term consequence of increased competition (Rowe-Rowe and Crafford 1992; Schmid et al. 2000).

    Human-influenced evolution of immunity?

    Our analyses detected strong sweeps on genes related to antigen processing and defense to protozoans, which point to the selective pressure of pathogens on Orkney voles. We found these genes often in regions of the genome with particularly low genetic diversity (π = 1.3 × 10−4) in Orkney voles. Such low levels of locally reduced diversity after selective sweeps would take long periods to be restored (Nei et al. 1975). It is possible that these signals of selection relate to the introduction history of Orkney voles, when they were kept by Neolithic farmers during the sea journey and afterward probably as a food item (Romaniuk et al. 2016). This history has probably involved voles originating from different local populations (Wang et al. 2023) and thus might have challenged the immune system through the transmission of pathogens from different geographic origins. A straightforward explanation would be that Orkney displays a different pathogen community compared with the continental region of origin, which triggered immune system evolution. A few available studies suggest lower pathogen diversity (amoebas in rodents [Fulton and Joyner 1948] and Cryptosporidium in cattle [Morrison et al. 2008]) or infection rate (paratuberculosis in cattle [Poskanzer et al. 1980] and John Cunningham virus in humans [Beasley et al. 2011]) in Orkney, which is consistent with the isolated situation and generally low biodiversity (Essl et al. 2013). However, comprehensive comparisons of the pathogen communities in Orkney and the European continent are lacking, and thus, the quantitative and qualitative challenges for Orkney vole immunity remain currently unknown.

    On the other hand, the window with the highest median XP-EHH on Chr 15 (Fig. 4B) contains a series of MHC class II genes (e.g., H2-Aa and H2-Eb) that are known to be under balancing selection in many species (Hedrick 1998; Aguilar et al. 2004; Tollenaere et al. 2008; Barthe et al. 2022), possibly also in the common vole (Fischer et al. 2014). However, the genetic diversity of this window (π = 0.0076) in continental voles is 2.5-fold higher than the average value of (π = 0.0022) in their genomes, which is an indication of balancing selection (Charlesworth 2006). The genetic diversity of the same window in Orkney voles is close to zero (π = 4.5 × 10−5). Therefore, we cannot completely exclude the possibility that reduced efficacy of balancing selection on MHC II genes, either owing to genetic drift or to the environment in Orkney, contributed to local reduction of genetic diversity and enhanced the signature of hard sweep (Sabeti et al. 2007).

    Methods

    Calling of genetic variants

    In this study, we included 14 samples of M. arvalis from the Orkney islands and 11 from Belgium and northern and central France (referred to as continental voles). We used two samples from each island of Orkney inhabited by the voles to have an equal representation of genomes and to approximate the sample size in both groups compared. Continental samples covered the whole geographic area of the evolutionary lineage (“Western North”) that is most closely related to Orkney voles even though their ancestors were probably taken from coastal populations >5000 years ago (Martínková et al. 2013; Wang et al. 2023). We combined samples from different demes in the Western North lineage in our analyses to have a better representation of the descendants of the ancestral population at large that gave rise to the Orkney population (Martínková et al. 2013; Wang et al. 2023). Three samples of the sister species M. obscurus were used as an outgroup. All samples used in this study were collected for previous studies (Fink et al. 2004; Braaker and Heckel 2009; Martínková et al. 2013; Lischer et al. 2014; Wang et al. 2023). New genome resequencing was performed for four samples. Sequence data from all samples are available at NCBI BioProject (https://www.ncbi.nlm.nih.gov/bioproject) under accession number PRJNA963218.

    DNA extraction was performed with the phenol–chloroform method. DNA quality and concentration were checked with 1% agarose gels, a Qubit fluorometer (Invitrogen) and a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific). Sequencing libraries were produced with Illumina TruSeq DNA PCR-free library prep kit. The sequencing was done with Illumina HiSeq 2000 or NovaSeq 6000 by the NGS platform of the University of Bern. Raw reads of all individuals (Fig. 1A; Supplemental Table S1) were mapped to the reference genome of M. arvalis (BioProject ID: PRJNA737461; A Gouy, X Wang, A Kapopoulou, et al., in prep.) using Stampy 1.0.32 (Lunter and Goodson 2011). Duplicated reads were marked and filtered using GATK 4.0.10 (Poplin et al. 2017). SNP calling was performed with GATK HaplotypeCaller following the GVCF pipeline (Poplin et al. 2017).

    Individual genotypes with a read depth lower than five or higher than 100 were marked as missing sites. We used a minimum depth of five instead of a higher threshold to preserve more SNPs in samples with average mapping depth lower than 20, which has little impact of the estimation of genetic parameters (Supplemental Fig. S4). For each SNP, the overall filter parameters were as follows: QD < 15.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0, heterozygosity = 100%, and mean-depth over all individuals > 50. SNP filtering was performed with GATK VariantFiltration (Poplin et al. 2017) and VCFtools 0.1.16 (Danecek et al. 2011). Only SNPs without any missing data were kept for further analyses. Note that we did not filter monomorphic positions with low depth in the genome. This would not affect the conclusion of lower genetic diversity of the Orkney voles compared with continental voles but would cause an underestimation of genetic diversity overall (Korunes and Samuk 2021). The functional effects (synonymous or nonsynonymous) of the SNPs were annotated with SnpEff 5.0 (Cingolani et al. 2012) using the gene annotation from A Gouy, X Wang, A Kapopoulou, et al. (in prep). For further analyses of functions of genes, we focused on 13,543 for which an ortholog was identified in M. musculus (GRCm38.p6) by A Gouy, X Wang, A Kapopoulou, et al. (in prep).

    Calculation of dN/dS

    The dN/dS ratio was calculated for each annotated gene based on its longest transcript. The exon sequences were produced for each individual by replacing the corresponding sites of the reference sequence with SNP information of each individual using GATK FastaAlternateReferenceMaker (Poplin et al. 2017). Heterozygous sites were replaced with IUPAC codes. For each gene, pairwise dN/dS ratios were calculated between M. obscurus and the other samples and then averaged for Orkney or continental voles. The dN/dS calculation was performed with KaKs_Calculator 2.0 (Wang et al. 2010) using the “NG” method (Nei and Gojobori 1986). Average values were used in case of heterozygosity.

    Detection of hard sweeps

    Several genetic parameters were estimated along the genome in nonoverlapping 50 kb windows. We chose 50 kb instead of a smaller window size to have adequate numbers of SNPs in the windows given the relatively low genetic diversity of Orkney voles. With VCFtools, we calculated π and Tajima's D in 50 kb windows for both Orkney and continental voles, as well as FST between the groups. The major allele frequency (AF) was calculated per SNP and averaged for each window with BEDTools 2.28 (Quinlan and Hall 2010). For each gene, π was calculated with VCFtools, and DXY was calculated using allele frequencies (Irwin et al. 2018). The cross-population extended haplotype homozygosity test (XP-EHH) (Simonson et al. 2010) was performed for each SNP with R 3.6.3 (R Core Team 2020) using the package “rehh” (Gautier and Vitalis 2012) with default parameters. XP-EHH tests specifically for hard sweeps by comparing the haplotype homozygosity between two populations. An XP-EHH value strongly deviating from zero indicates a hard selective sweep in one group but not in the other (Simonson et al. 2010). We used a nonpolarized data set to avoid losing variants with uncertain ancestral state, as the numbers of informative SNPs in Orkney vole genomes were relatively low given their low genetic diversity. We used the data set without phasing the SNPs (for the comparision of using phased or unphased data, see Szpiech 2024). The median XP-EHH value for the SNPs in each 50 kb window was used to represent the state of the window.

    A density-based clustering algorithm DBSCAN (Schubert et al. 2017) was used to determine outlier regions potentially under positive selection. DBSCAN groups data points from genomic windows into clusters based on their distances to each other, and data points not close to any cluster of genomic windows are identified as outliers. It was used based on the assumption that genomic regions affected by strong positive selection should be rare and statistically distinct from most of the genome (Johri et al. 2022), which was generally affected by genetic drift (Wolf and Ellegren 2017). In biology, this approach has been used to detect clusters in gene expression patterns (Haygood et al. 2010), but not for selection tests with population genetic data. Parameters used for clustering were as follows: Δπ (πcontinent – πOrkney), ΔTajima's D (Tajima's Dcontinent – Tajima's DOrkney), ΔAF (AFcontinent – AFOrkney), FST, recombination rate, and median XP-EHH value. Adding the recombination rate in the clustering analysis with DBSCAN aimed to control for the potential bias of extreme absolute values of XP-EHH and other parameters owing to their genome-wide correlation with the recombination rate (see Fig. 3A; Marques et al. 2018). The clustering was performed with the R package “dbscan” (Hahsler et al. 2019). The radius of the epsilon neighborhood (eps) was set to the “knee” value (the value when a curve changes suddenly, calculated following the method of Satopaa et al. 2011) of the distribution of the kNN distances of all the data points (k = 6). The number of minimum points required for a cluster was set to seven (number of dimensions plus one) as recommended by the package. To assess the accuracy of DBSCAN for outlier detection, we simulated 50 Mb sequences for 14 Orkney and 11 continent individuals using msprime 1.3.1 (Baumdicker et al. 2021) under neutral scenarios using a demographic model (Supplemental Fig. S2A) adjusted from Wang et al. (2023). Analogous to the empirical data, the same estimations of genetic parameters and DBSCAN analyses were then performed for the simulated data to obtain a rough estimate of false-positive rates.

    For comparison, a HMM clustering method (Hofer et al. 2012) was used to identify outlier regions following the approach of Marques et al. (2018). In short, HMM compares combined P-values along the genome and detects local outlier regions containing significant P-values and different from neighbor windows, whereas DBSCAN considers windows as independent data points and uses the raw parameters directly. A Fisher's combined probability test was applied to five of the same six parameters, except recombination rate, with R package “poolr” to obtain one combined P-value for each window. Regions under selection are expected to have low P-values for genetic parameters that differ from the genomic neighborhood, and thus to have a low combined P-value (Hofer et al. 2012). The recombination rate was not used for HMM clustering because it is not expected to be affected by divergent selection, and thus, it would not contribute to identify windows under selection among neighboring genomic regions. The quantile of FST was transformed to a one-tailed P-value, and the quantile of the remaining four parameters were transformed to two-tailed P-values. Each P-value was then corrected for multiple testing with the Benjamini and Hochberg correction implemented in the R function “p.adjust.” The corrected P-values were converted to q-values (1 − p) and Z-transformed and were used as the input to assign the windows to three states using the Viterbi algorithm with R package “HiddenMarkov” following the method of Marques et al. (2016). The regions containing the windows with combined P-value < 0.01 were considered as outliers.

    To focus on the potential regions that experienced hard selective sweeps, a hard filter of median XP-EHH value was applied to the outliers detected by DBSCAN and HMM clustering. Outlier regions with the median XP-EHH value within the highest first percentile among all regions were considered potential positively selected regions only in Orkney voles. Outlier regions with the median XP-EHH value within the lowest first percentile were regarded as potential positively selected regions only in continental voles. For both categories of regions, functional enrichment analyses of GO terms were performed on the g:Profiler web server (Raudvere et al. 2019) with a P-value threshold at 0.01 for the genes falling at least 50% in the regions. We also looked for potential regulatory elements that overlapped >50% with outliers detected by DBSCAN by lifting regulatory regions of M. musculus (release 102 from from Ensembl) to the M. arvalis genome. The chain file was made with LASTZ 1.04 (Harris 2007) and UCSC Utilities (Navarro Gonzalez et al. 2021), and liftover was performed with UCSC liftOver tool (Navarro Gonzalez et al. 2021).

    Data access

    The raw sequencing data generated in this study have been submitted to the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under accession number PRJNA963218.

    Competing interest statement

    The authors declare no competing interests.

    Acknowledgments

    This study was supported by Swiss National Science Foundation grant 31003A_176209 to G.H. X.W. received a scholarship for PhD study from the China Scholarship Council (no. 201706380049). We thank Stephan Peischl and Stefan Strütt for their help with msprime simulations. We thank the Next Generation Sequencing Platform of the University of Bern for performing the high-throughput sequencing experiments. Part of the computations was performed on UBELIX, the HPC cluster at the University of Bern (http://www.id.unibe.ch/hpc).

    Author contributions: G.H. conceptualized the study and designed research; X.W. and G.H. performed research; X.W. analyzed data; and X.W. and G.H. wrote the paper.

    Footnotes

    • Received September 7, 2023.
    • Accepted May 14, 2024.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

    References

    | Table of Contents

    Preprint Server