
Number of variants identified in this data set compared with commonly used existing data sets as a function of allele frequency. The number of variants (log scale) is plotted for each minor allele frequency bin within the harmonized HGDP + 1kGP data set. We show variants found only in the harmonized HGDP + 1kGP data set (red), variants shared between the harmonized data set and each comparison data set (purple), and variants found only in each comparison data set (blue). Exact numbers and comparisons by QC within and across data sets are given in Supplemental Tables S11 and S12.
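The three-way classification described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline. The bin edges, the variant key format, and the function names `maf_bin` and `classify_variants` are hypothetical, and the sketch assumes that variants absent from the harmonized data set are binned by their frequency in the comparison data set, which the caption does not specify.

```python
from collections import Counter

# Hypothetical MAF bin edges; the bins used in the figure are not stated in the caption.
BIN_EDGES = [0.0, 0.001, 0.01, 0.05, 0.5]


def maf_bin(maf):
    """Return the index of the right-closed bin that contains maf."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"minor allele frequency out of range: {maf}")
    for i in range(len(BIN_EDGES) - 1):
        if maf <= BIN_EDGES[i + 1]:
            return i


def classify_variants(harmonized, comparison):
    """Count variants per (MAF bin, category).

    Both arguments map a variant key (here "chrom:pos:ref:alt") to its MAF.
    """
    counts = Counter()
    for key, maf in harmonized.items():
        # Variants in the harmonized set are binned by their harmonized MAF.
        category = "shared" if key in comparison else "harmonized_only"
        counts[(maf_bin(maf), category)] += 1
    for key, maf in comparison.items():
        if key not in harmonized:
            # Assumption: fall back to the comparison data set's own MAF.
            counts[(maf_bin(maf), "comparison_only")] += 1
    return counts


# Toy input, invented purely for illustration.
harmonized = {"chr1:100:A:G": 0.0005, "chr1:200:C:T": 0.02, "chr2:50:G:A": 0.3}
comparison = {"chr1:200:C:T": 0.03, "chr3:10:T:C": 0.004}
counts = classify_variants(harmonized, comparison)
```

Each (bin, category) count would correspond to one bar segment in a plot of this kind, with the y axis drawn on a log scale.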











