DSM enables linking on massive candidate genotype sets and provides a measure for identifying signals from gene expression data. We included an additional 21,996 HRC individuals in the candidate genotype set to evaluate linking accuracy on massive data sets. (A) Combining all four chromosomes, DSM links 94% of individuals. Each curve represents an average across 10 random permutations of the HRC individuals. (B) DSM's substantial accuracy improvement is observed for each chromosome, as shown here for Chromosome 20. Plots for the other chromosomes are provided in Supplemental Figure S3. (C) On a random subset of 500 individuals, the empirical P-values of true matches correlate strongly with our model-based estimates, suggesting that our P-values provide a calibrated measure for assessing our method's accuracy on larger candidate sets. The dashed gray line marks the minimum empirical P-value (1/22,287), which is equivalently the maximum negative log P-value.
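The floor on the empirical P-value in panel (C) can be illustrated with a minimal sketch. It assumes, since the caption does not say, that an empirical P-value is the fraction of candidates scoring at least as well as the true match, that higher scores indicate better matches, and that the negative log is taken in base 10. The function name `empirical_p_value` and the toy scores are hypothetical; only the candidate count of 22,287 comes from the caption.

```python
import math


def empirical_p_value(true_score, candidate_scores):
    """Fraction of candidates scoring at least as well as the true match.

    Assumptions (not stated in the caption): higher scores mean better
    matches, and candidate_scores includes the true match's own score,
    so the result can never be smaller than 1 / len(candidate_scores).
    """
    n_at_least = sum(1 for s in candidate_scores if s >= true_score)
    return n_at_least / len(candidate_scores)


# Candidate set size implied by the caption's minimum P-value of 1/22,287.
N_CANDIDATES = 22_287

# Smallest empirical P-value that can be observed with this many candidates.
p_min = 1 / N_CANDIDATES

# Corresponding ceiling on the negative log P-value (base 10 is an assumption).
neg_log10_p_max = -math.log10(p_min)

# Toy example: the true match outscores every other candidate.
toy_scores = [5.0, 1.0, 2.0, 3.0]
toy_p = empirical_p_value(5.0, toy_scores)
```

Under these assumptions, a true match that ranks first among all candidates sits exactly on the dashed line, and no empirical point can lie beyond it. Model-based estimates are not bounded in this way, which is why they can remain informative for larger candidate sets.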
