Divergence patterns of genic copy number variation in natural populations of the house mouse (Mus musculus domesticus) reveal three conserved genes with major population-specific expansions

Copy number variation represents a major source of genetic divergence, yet the evolutionary dynamics of genic copy number variation in natural populations during differentiation and adaptation remain unclear. We applied a read depth approach to genome resequencing data to detect copy number variants (CNVs) ≥1 kb in wild-caught mice belonging to four populations of Mus musculus domesticus. We complemented the bioinformatics analyses with experimental validation using droplet digital PCR. The specific focus of our analysis is CNVs that include complete genes, as these CNVs could be expected to contribute most directly to evolutionary divergence. In total, 1863 transcription units appear to be completely encompassed within CNVs in at least one individual when compared to the reference assembly. Further, 179 of these CNVs show population-specific copy number differences, and 325 are subject to complete deletion in multiple individuals. Among the most copy-number variable genes are three highly conserved genes that encode the splicing factor CWC22, the spindle protein SFI1, and the Holliday junction recognition protein HJURP. These genes exhibit population-specific expansion patterns that suggest involvement in local adaptations. We found that genes that overlap with large segmental duplications are generally more copy-number variable. These genes encode proteins that are relevant for environmental and behavioral interactions, such as vomeronasal and olfactory receptors, as well as major urinary proteins and several proteins of unknown function. The overall analysis shows that genic CNVs contribute more to population differentiation in mice than in humans and may promote and speed up population divergence.


Divergence patterns of genic copy number variation in natural populations of the house mouse (Mus musculus domesticus) reveal three conserved genes with major population-specific expansions
Željka Pezer, Bettina Harr, Meike Teschke, Hiba Babiker and Diethard Tautz

Supplemental Material
Supplemental Text Text S1: Reasoning for using a read-depth approach for CNV detection 3 Text S2: Digital PCR validation and estimate of false discovery rate 5 Text S3: Reduced CNV detection power due to lower read coverage 8 Text S4: Control for false positive singleton CNVs 9 Text S5: Comparisons between populations and with inbred mouse strains 10 Text S6: Assessment of CNV detection in regions of high similarity 13 Text S7: CNVR size vs. frequency 14 Text S8: Size of neural genes and its influence on enrichment at CNVRs 16 Text S9: Assessment of genotyping accuracy 17 Text S10: Validation of deleted genes analysis 17 Text S11: Validation of V ST analysis 19 Text S12: Outlier analysis 19 Figure S1: Definition of CNV classes used throughout the analysis 4 Figure S2: Validation of CNV loci by droplet digital PCR 6 Figure S3: CNV detection power vs. read coverage 8 Figure S4: Detected singleton CNVs at varying cutoffs for normalized RD signal 9 Figure  Supplemental Tables  Table S1. Read mapping and CNV discovery statistics 22 Table S2: GO enrichment analysis of genes in CNVRs overlapping SDs > 10 kb 23 Table S3: GO enrichment analysis of genes in CNVRs not overlapping SDs > 10 kb 24 Table S4: Estimated gene copy number per individual and CNV gene -separate Excel file Table S5: GO term enrichment analysis of CNV genes 25 Table S6. Frequency of deletion alleles -separate Excel file Table S7: Vst values of CNV genes -separate Excel file Table S8. Genes with significant difference in variance of copy number between populations -separate Excel file Table S9: Diploid copy number of genes in amylase cluster by individual 26 Table S10: Diploid copy number of genes in Mup cluster by individual 27 Table S11: Pseudogene-parent gene pairs used for copy number correlation analysis 28 Table S12: Assays used for ddPCR 29

References 30
Text S1: Reasoning for using a read-depth approach for CNV detection Genome-wide studies of structural variations initially employed microarray techniques but these are now rapidly becoming replaced with more powerful next-generation sequencing (NGS) platforms. Some of the advantages of NGS based approaches include much better genome coverage and resolution, more accurate copy number estimate, and capability to detect novel variants (reviewed in Zhao et al. 2013). NGS based structural variation detection methods rely on short read placement and can be classified into several different strategies, based on the use of mapping information. Each approach has certain advantages and limitations and the choice of method should include careful consideration of potential technical artifacts and how they affect data interpretation. For example, tools based on discordantly mapped paired reads (paired-end mapping; PEM) have high sensitivity and can detect balanced rearrangements such as inversions and translocations, but are at the same time strongly biased towards detection of structural variants smaller than 1 kb. Furthermore, because a sequenced library contains a distribution of insert sizes rather than one discrete value, and because PEM strategy uses a fixed cutoff at which insert length is considered to be anomalous, high false positive and/or negative discovery rates can be expected (Medvedev et al. 2009). On the other hand, read depth (RD) methods are based on a concept that the depth of coverage of a genomic region positively correlates with the copy number of that region and usually employ statistical models to deal with mapping biases and variations (Zhao et al. 2013). Unlike PEM based strategy, RD analysis is not applicable for finding copy number neutral rearrangements and novel insertions that are not already present in the reference genome. However, RD based tools detect large events with maximum sensitivity even at low coverage, and the reliability of a call actually increases with the size of the event (Medvedev et al. 2009;Abyzov et al. 2011;Zhao et al. 2013). Moreover, the RD approach can accurately predict copy numbers and thus perform genotyping, a feature that is beyond those of other NGS based methods. Another advantage of the RD based strategy is the ability to detect variation within repetitive regions of the genome by considering read placement at multiple positions, whereas other NGS based methods are restricted to unique genomic regions (Magi et al. 2012). The power to detect structural variants in low-complexity regions such as SDs is of indisputable value, given that these regions show substantial copy number variation (Sebat et al. 2004;Sharp et al 2005;Cooper et al. 2007;Egan et al. 2007;Medvedev et al. 2009).
Having all the above in mind, our choice of methodology was governed by the following objectives: 1) finding large variations which would encompass whole genes; 2) comparison of actual copy numbers in order to infer differentiating patterns; 3) detecting variants in SDs regions which are known to exhibit great deal of polymorphism in population; 4) minimizing possibility of the influence of technical artifacts on data interpretation when comparing between samples of varying sequencing coverage.
To this end, RD approach of CNV detection was chosen as the most appropriate or even the only applicable methodology. Admittedly, a combination of different approaches such as for example RD and PEM would enrich our collection of identified CNVs, however, it could also misguide the data interpretation. For example, lower sequencing coverage is associated with lower sensitivity, i.e. inability to detect much CNVs, especially those of smaller size. Hence, the events readily detected in samples of better coverage could go undetected in samples of lower resolution and misinterpreted as lacking. This could easily lead to erroneous conclusions associated with presence-absence patterns and population differentiation analysis. By focusing on larger events (≥ 1 kb) and by employing RD strategy which is much more robust to coverage differences, we attempt to avoid these issues. Furthermore, for each of the aforementioned objectives, we tackle the possibility of misinterpretation caused by technical issues by performing suitable control analyses (see further in the text). Figure S1. Definition of CNV classes used throughout the analysis. For simplicity, only duplications are depicted as CNVs in this schematic representation as peaks of read depth (green lines); however, both duplications and deletions are considered in the analysis. Arrows represent genes (transcription units). (A) "CNV" is any region ≥ 1kb with read depth differences identified by CNVnator; "genic CNV" is a call that contains at least one full gene which, in that case, is called "CNV Gene". (B) All overlapping calls across individuals merged together define a Copy Number Variable Region "CNVR".

Text S2: Digital PCR validation and estimate of false discovery rate
For ddPCR validation, we selected CNVs at different genomic regions that either represent various predicted copy number ranges across all 27 individuals or showed population differentiation in our dataset. Of the 44 considered CNVs, we were able to design specific primers and probes for 21 different loci (see suppl. Table S12). This was not surprising, given the association of copy umber polymorphism with repetitive regions which hinders amplification of specific targets. Additionally, for two genes, Luzp4 and Gm21671, we were able to design assays which contained non-specific primers. Amplification of unwanted targets was prevented by additional digestion of DNA with MseI prior to ddPCR, which cuts inside non-specific targets while leaving the desired target intact. The resulting copy number was compared with the CNVnator-determined copy number for the same region in all 27 individuals ( Figure S2).
In order to estimate false discovery rate (FDR) of our CNV call set, we used two measures of correlation between the computationally predicted and experimentally determined copy numbers: Pearson's correlation coefficient (r) and Lin's Concordance Correlation Coefficient (CCC) ( Figure S2). The latter is more stringent in that it measures departure from the equality line (45°) between CNVnator predictions and ddPCR results (Lin 1989). The correlation was considered to be very strong if either r or CCC was > 0.7. Those cases where correlation coefficient was calculated to be below 0.7 were considered to be false positives. Based on Pearson's r, we find two false positive CNV loci among 23 tested (Defb8 and Nxpe5), resulting in FDR of 8.6%. Based on CCC, we detect five false positives (Defb8, Gm13152-13154, Gzma, Nxpe5 and Tex24), estimating the FDR to be 21.7%. The results of ddPCR runs (y axis) were plotted against copy numbers predicted with CNVnator (x axis) for the amplicon regions. Fitted regression lines are shown dashed; r is Pearson's correlation coefficient; CCC is Lin's Concordance Correlation Coefficient.

Text S3: Reduced CNV detection power due to lower read coverage
Neighboring CNV calls can appear as one single call in samples with shallower coverage, resulting in a lower proportion of smaller CNVs. Indeed, on average, 79%-81% of calls were 1-10 kb long in the three mainland populations, as opposed to only 69% in HEL ( Figure S3A). This effect is also visible from the calculated average CNV length, which for HEL population is substantially larger than for mainland populations (17 kb compared to 11-12 kb, respectively; Table S1 below). When only genic CNVs or CNV genes are considered, which are mostly much larger than 1 kb, the difference of HEL to FRA and GER is also smaller (Figure 1 in main text; Table S1 below).
To assess the effect of read depth on CNV counts, we plotted the number of mapped reads in each sample against the number of detected CNVs. The correlation was strong for all CNV counts ( This indicates that the read depth dependence is of less relevance when assessing CNVs associated with genes. Therefore, we reasoned that by focusing on CNV genes and larger events our analyses should be less influenced by technical artifacts and differences in sequencing depth.

Text S4: Control for false positive singleton CNVs
CNVnator calls deletions and duplications when the normalized RD signal is calculated to be below 1.5 (per diploid) and above 2.5, respectively. The more stringent cutoff of 1.4 and 2.6, has been used to find reliable de novo CNVs from family trios (Abyzov et al. 2011). Since CNV calling was performed for 27 genomes, to account for multiple comparisons, we tested whether the number of detected singletons would change if the more strict criteria for CNV calling are applied. We varied the normalized RD cutoff, increasingly in its stringency: 1) 1.4 for deletions and 2.6 for duplications; 2) 1.2 and 2.8; and 3) 1.0 and 3.0. For each of the three cutoff sets, we counted the number of detected singletons in each animal and compared their distributions with the original data (Table S1). We detect only borderline significant difference in comparison with the most stringent cutoff applied ( Figure  S4; p = 0.0485 after Bonferroni correction in Dunn's post hoc test), indicating that the original data contains bona fide singletons in our sample set, rather than false positives.

Figure S4. Detected singleton CNVs at varying cutoffs for normalized RD signal applied.
Comparison of original dataset with singletons obtained by applying three different cutoffs to call CNVs is shown (see Text S4 for cutoff values). Number of detected singletons for all 27 animals is shown in each box as distribution. Low-significance difference between the original dataset and dataset obtained by applying cutoff 3 was detected and its p-adj value is shown (Kruskal-Wallis rank sum test p = 0.07).
We detected on average 386 singletons in our sample set (suppl. Table S1), i.e. CNV calls found exclusively in a single individual. Considerably more singletons in all IRA mice compared to others is likely due to higher effective population size, consistent with more mutations at low frequency.
We detected on average 180 (median 178, sd = 45) deletions per sample for which the number of mapped reads corresponded to less than 1/100th of a single copy per genome (copy number -CN = 0.01) and can be considered complete deletions of high confidence. In 84 of these on average (81 median, sd = 26) we find no reads aligned at all. If we take CN of 0.5 as the cutoff below which a region is considered absent, over 1% of the genome appears to be deleted compared to the reference assembly. Given the limitations of the read-depth approach, we were not able to identify regions which are present in our samples and absent from the reference genome. However, if a similar outcome applies for vice-versa comparison, the estimated difference in genomic content between the reference assembly and any of our wild mouse samples might exceed 2% of the genome fraction.

Text S5: Comparisons between populations and with inbred mouse strains
Analysis of CNV presence-absence patterns can capture genetic relationships between individuals and popualtions. We defined overlapping CNVs between two individuals as those that intersect by at least 50% of the sequence length (based on position in the reference genome) and counted them in all combinations of comparison between the 27 individuals of the wild populations and for inbred mouse strains. We did not consider calls on the Y chromosome to be able to compare male to female samples.
In the wild mouse samples, two individuals had on average between 2,025-2,759 overlapping CNVs in their respective population (Supplemental Table S1). In each pairwise comparison, the similarity was calculated as the number of overlapping CNVs divided by the average number of detected calls between the two individuals and the resulting similarity matrix is shown in Fig. S5. Based on the number of overlapping CNVs, FRA and GER populations are the most similar to one another. HEL mice share the least number of CNVs with the three other populations, although this could partly be ascribed to the lower number of detected calls. We also observed different degrees of variance in the number of overlapping CNVs within populations. GER mice are more similar to one another than FRA mice, and the IRA population shows the highest diversity. To estimate the extent to which CNVs from our wild mice samples overlap with those in inbred mice, we downloaded variant calls from ftp://ftp-mouse.sanger.ac.uk/REL-1302-SV/ for 16 mouse strains from the study by Keane et al. (2011;dbVar accession number estd118) and additional strain FVB/NJ from the study by Wong et al. (2012; dbVar accession number estd200). We opted for this release as all calls are relative to the Build 37 (mm9) reference mouse genome C57BL/6J -the same assembly we used for CNV calling. In order to compare it with our set of CNVs, we removed balanced structural variations (SVs) and calls < 1 kb from each of the strains' SV set. Calls on Y chromosomes were not considered to enable comparisons between male and female samples. By using BEDTools (Quinlan and Hall 2010) we intersected calls from each of the 17 inbred strains with calls from each of our 27 wild mice, creating two analysis sets of overlapping calls: those that intersect by at least 1 bp and those that have minimum 50% reciprocal overlap of the reported CNV length with respect to the reference. On average, the inbred strains and wild mice overlapped at 1,679 CNVs (median 1,631) when the minimal intersection of 1 bp was required ( Figure S6A -left panel). By that criterion, the largest number of overlapping calls with wild mice was found in strains SPRET/EiJ, CAST/EiJ and PWK/PhJ which are derived from M. m. spretus, M. m. castaneus and M. m. musculus subspecies. However, when we normalized the number of overlapping CNVs by dividing it by the average number of detected calls between the two compared individuals, the three strains showed the least similarity to our wild mice samples ( Figure S6B -top panel). This is in agreement with their distance to M. m. domesticus, and the highest number of overlapping calls with our samples can be explained by the significantly more SVs detected in these strains compared to other strains (Keane et al. 2011). When we applied 50% reciprocal overlap cutoff, wild mouse samples shared on average 998 (median 999) CNVs with inbred strains ( Figure S6A -right panel). They shared the largest number of CNVs with WSB/EiJ strain (average 1,337; median 1,471) and also showed highest similarity to it ( Figure  S6B -lower panel). This is expected, given that the WSB/EiJ is wild derived strain of M. m. domesticus subspecies. Overall, most comparisons show unsurprisingly higher similarity to M. m. domesticus derived strains, especially to WSB/EiJ and FVB/NJ ( Figure S6B).

Text S6: Assessment of CNV detection in regions of high similarity
As a mean of accomplishing uniform depth of coverage across genome, CNVnator keeps reads which can be aligned to multiple locations. This also enables it to detect CNVs in repetitive regions (except for transposable elements -see explanation in Abyzov et al. 2011). In case of paired-end data, CNVnator additionally exploits the information about ends distance and orientation to improve read placement. Moreover, by calculating the fraction of ambiguously mapped reads with zero mapping quality in the called CNV region, CNVnator discards unlikely calls (Abyzov et al. 2011). In order to assess the reliability of calls in repetitive regions, we tested the correlation of genotyped copy numbers of segmental duplications which are annotated at two genomic locations and share over 90% of sequence identity. If each non-uniquely mapping read is simply randomly placed at one of the two possible locations, then it is expected that the two regions to which these reads are mappable will have similar read depth and thus have the same estimated copy number. We genotyped in total 5,517 large SDs (≥10 kb) that are completely encompassed within CNV calls in one individual and compared their copy numbers at alternative locations ( Figure S7A). We observe only weak correlation (r = 0.29) between the two locations' copy number, and the large majority of highly discordant values strongly deviated from line of equality (y = x). This shows that CNVnator is able to distinguish similar genomic regions such as SDs.
Given that CNV calling in SDs depends on sequence similarity and can be influenced by sequencing quality, potential artifacts could lead to erroneous conclusions. In our samples, on average 16% (minimum 14%; maximum 19%) of all CNVs overlapped SDs with 98% or more sequence identity. Proportion of such CNVs which also overlapped any part of a gene ranged between 4,5% and 7,2% (5,5% on average). Given these substantial fractions, we tested whether our major findings change when CNVs intersecting highly similar SDs (with ≥ 98% sequence identity) are excluded from analysis. With such data, we repeated the analysis of CNV frequency like the one presented in Figure 2A, and compared the distribution between the original data and data depleted of CNVs intersecting highly similar SDs. We find no significant differences between the two in either overlapping SDs set (Wilcoxon rank sum test; p = 0.576) or non-overlapping SDs set (Wilcoxon rank sum test; p = 0.982). We have also tested how the exclusion of CNV genes intersecting highly similar SDs influences observed patterns of genetic relationship shown in Figure 4. We calculated the Euclidean distance matrix from standardized copy numbers of 1,104 CNV genes which did not intersect highly similar SDs, and compared it to the distance matrix calculated from the original CNV gene dataset by using Mantel test in R package "vegan". Mantel statistics based on Pearson's product-moment correlation revealed strong and significant correlation between the two datasets (r = 0.895; significance = 0.001). These analyses show that although highly similar SDs overlap considerable proportion of CNVs in our dataset, overall conclusions are not changed when they are excluded from analysis, suggesting no major artifacts caused by sequence similarity.
Similarly, potential mis-mapping of the reads could distort CNV calling at pseudogenes and their functional paralogs (parent genes). We picked out seven pseudogenes from our CNV genes list that had one-to-one relationship with the corresponding parent gene, such that, for each case there were no additional paralogs annotated in the RefSeqGene list (see suppl. Table S11 for a list pseudogenes and related parent genes). In all individuals where these pseudogenes were found completely inside CNV, we genotyped them and their related parent genes. There was no correlation between the copy numbers of pseudogenes with their parent genes, i.e. the parent gene had two copies regardless of the pseudogene copy number ( Figure S7B). This again shows that, despite high sequence similarity, CNVnator is able to discern different genomic locations.

Figure S7. Assessment of CNV detection in regions of high similarity. (A)
In total 5,517 large SDs were genotyped in one randomly chosen individual (15B). Each dot represents copy numbers of the two annotated genomic locations per segmental duplication, plotted against each other. Weak correlation and the best fit line strongly deviating from identity line (y=x) illustrate CNVnator's power to discern regions of high similarity. (B) No correlation between copy number of pseudogenes and corresponding parent genes was observed. r is Pearson's correlation coefficient.
Text S7: CNVR size vs. frequency CNV size and presence in population depends significantly on their overlap with large SDs: CNVs in regions not overlapping SDs are generally smaller and less frequent (Figure 2, main text). This could indicate relatively stronger selective constraints on such CNVs but could also simply reflect size distribution, i.e. larger CNVs are more likely to overlap between individuals. To test if the latter is the case, we plotted the CNVR size against their presence in individuals ( Figure S8). We found only weak correlation between the two (Pearson's r = 0.24; p < 2.2e -16 ), indicating that the observed CNVR frequency is not a technical artifact reflecting the size distribution.  Overlapping calls from all individuals were merged into CNVRs and analyzed separately based on their intersection with SDs > 10 kb. Only CNVRs that overlapped at least one gene by any number of nucleotides were considered. Number of individuals with CNV call within each CNVR was counted. There were in total 662 unique CNVRs overlapping SDs (blue) and 7,536 CNVRs not overlapping SDs (red). The graph shows frequencies of CNVR presence across all samples.

Text S8: Size of neural genes and its influence on enrichment at CNVRs
In our set of 28,375 CNVRs that did not overlap large SDs we observe substantial enrichment for terms associated with neurological functions, such as synaptic transmission, nervous system development, learning or memory etc. (Table S3). We asked whether this enrichment was simply a consequence of the large size of genes that are related to these terms.
For GO Term IDs related to neurological functions that were significantly overrepresented in our dataset (Table S3;  We retained 2,924 non-reduntant gene coordinates which were relevant for mm9 RefSeq gene set. Compared to the rest of the RefSeq genes, these neurologically associated genes are more than twice as large (median 30,9 kbp compared to 14,4 kbp; on average 87,2 kbp compared to 38,4 kbp; Kolmogorov-Smirnov test: p < 2.2e -16 ), similarly to what was previously reported for human genes (Raychaudhuri et al. 2010). We find 26% of them to be overlapped by CNVRs (756/2,924), as opposed to 18% of genes with other functions (4,245/23,832).
To test if arbitrarily positioned genomic fragments preferentially overlap genes with neurological functions, we randomly placed 28,375 non-overlapping segments of matching size as CNVRs in real data, making sure that, as in real data, they are outside of annotated gaps and SDs > 10 kb. We created 10,000 such sets of permuted CNVRs and in each counted the number of neurologically associated genes and other genes overlapped by simulated CNVRs. On average, much larger fraction in both gene groups was affected in simulated data than in observed data: about 45% (1,314/2,924) of neurologically associated genes and 30% (7,135/23,832) of other genes. The fact that fewer genes are overlapped by CNVRs in real data versus in simulated data, suggests that CNVs are biased away from genes, as shown previously (Conrad et al. 2006;Redon et al. 2006). When comparing the affected fractions of neurologically associated genes to other genes, we find that the ratio between the two is significantly smaller in real data (1.44 or 26% to 18%) than in simulated data (1.5 on average or 45% to 30%), indicating that neurologically associated genes are affected by copy number variation less than it is expected by chance. We calculate that the probability of obtaining the ratio of 1.44 or smaller is 0.0087 (87 in 10,000 simulations; Figure S10). This suggests that, despite their substantially larger size, genes associated with neurological functions are less likely to be overlapped by CNVRs than it is expected by chance, even when the bias against copy number variation in genes is taken into account. Figure S10. Distribution of calculated ratios for fractions of neurologically functioning genes to other genes that are overlapped by CNVRs in simulated data. Calculated ratios are shown for all 10,000 simulations. Red arrow points to the ratio value in real data.

Text S9: Assessment of genotyping accuracy
To assess the extent to which genotyping accuracy is affected by depth of coverage, we performed a sequential down sampling of reads on one Heligoland individual (HG_08). By using SAMtools (Li et al. 2009) and a custom perl script, we randomly extracted reads from the original sequence file (327,866,406 placed reads in total) in separate subsets of 10%-90% of the reads. In each subset, CNVs were called and CNV genes and their CN determined as described in Methods. The down-sampling experiment was conducted three times independently, each time comparing genotyped CNs between the original file and each subsampled set. We found that calls start to become less reliable only below 100 million reads ( Figure S11). Given that our individuals have 3 to 7-times this critical number of reads, we conclude that the gene content and polymorphism analysis is robust and that the genotyping accuracy does not diminish with reduced coverage. Figure S11. Genotyping accuracy of CNV genes is not drastically affected by depth of coverage. 327,866,406 reads from the original sample were sequentially downsampled by 10%. Inferred copy numbers were compared between each subsample and the full sample. Computed Pearson's correlation coefficients for each comparison are plotted against each subset of reads including the original set. Three independent down-sampling experiments are shown as different lines. n is the number of CNV genes detected and genotyped.

Text S10: Validation of deleted genes analysis
In order to test if the amount of lost genes in HEL population is significantly higher than in mainland populations, we need to account for the smaller sample size. To this end, we created all possible combinations (56) of only three individuals in each mainland population. In each combination we counted the number of autosomal genes which appear to be deleted in all three individuals (normalized RD < 0.5 per diploid). In all three populations, number of deleted genes was considerably higher when only three individuals were considered instead of all eight (median 8.5, 27.5 and 34.5 versus 1, 6 and 16; see Figure S12 and main text); nevertheless, it was still much lower than in HEL population ( Figure S12).

Text S11: Validation of V ST analysis
To test how the difference in sample size between Heligoland mice (n=3) and the other three populations (n=8 per population) influences the results of V ST statistics (Table S7), we repeated the analysis on subsets of mainland samples. For all possible combinations (56) made of just three individuals from one population with all eight individuals from the other, mean V ST values were calculated and compared across all population pairs and combination groups. In all resulting comparisons, the mean V ST was on average significantly lower than in any comparison with the HEL sample ( Figure S14). In addition, the values were similar to the original eight-to-eight comparisons, providing strong support for the observed higher differentiation of the HEL population.

Text S12: Outlier analysis
In order to detect genes with particularly large average copy number differences between populations, we performed pairwise comparisons of average copy number per population for all 1,863 CNV genes ( Figure S15A). Outliers were identified as points that are at least 4 standard deviations (equivalent to p = 0.001) away from the best fit line of the resulting distribution. For the comparisons with HEL mice, the distribution was too spread out to detect any outliers (not shown). In the other three comparisons we detected in total 15 outliers (red dots in Figure S15A) corresponding to 13 genes ( Figure S15B). Many of these are polymorphic within populations, i.e. individuals of the population can contain very different numbers of such genes.     Table S3. GO term enrichment analysis of genes in CNVRs not overlapping SDs > 10 kb