Supplemental File S1. This file contains the results of running PLIGHT on two sets of simulated contamination cases, for multiple mutation rates:
(A) The contaminating SNPs are drawn from individuals taken from the general 1000 Genomes population, and not necessarily from the same population as the primary individual;
(B) The contaminating SNPs are drawn from individuals taken from the same population (CDX) as the primary individual.

The query sets are generated as follows. We
(a) select five individuals;
(b) draw 40 SNPs from a single chromosome for each of them (the reference set of SNPs);
(c) designate one individual as the primary “target” of the attack;
(d) with a varying probability (ranging from 0 to 0.7), randomly replace each SNP of the target individual with a SNP from one of the other four individuals (each chosen with equal probability = 0.25);
(e) using this “contaminated” SNP set, iteratively remove one SNP and run the inference procedure to determine whether we can uniquely and correctly identify the target individual;
(f) record the minimum number of SNPs allowing unique and correct identification.

We run 3 iterations over the choice of individuals and associated reference SNP set and, for each such reference set, 10 iterations in which the contaminated set is randomly generated anew, giving 30 query sets per setting. We then apply PLIGHT_InRef to the query sets.
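The contamination and SNP-removal steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the results: the names `contaminate`, `min_snps_for_id` and the callback `identifies_target` are hypothetical, the callback stands in for a PLIGHT_InRef run, a replaced SNP is assumed to come from the same position in the donor individual, and SNPs are assumed to be removed from the end of the set until identification first fails.

```python
import random


def contaminate(target, others, p, rng):
    """Return a copy of `target` in which each SNP is replaced, with
    probability p, by the SNP at the same position (an assumption) from
    one of `others`, chosen uniformly at random."""
    contaminated = []
    for position, genotype in enumerate(target):
        if rng.random() < p:
            donor = rng.choice(others)  # equal probability per donor
            contaminated.append(donor[position])
        else:
            contaminated.append(genotype)
    return contaminated


def min_snps_for_id(query, identifies_target):
    """Shrink the query one SNP at a time and return the smallest size at
    which `identifies_target` (a hypothetical stand-in for the inference
    run) still succeeds. Returns None if even the full set fails."""
    best = None
    current = list(query)
    while current and identifies_target(current):
        best = len(current)
        current = current[:-1]  # assumed removal order: drop the last SNP
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy genotypes (0/1/2) for five individuals, 40 SNPs each.
    individuals = [[rng.randint(0, 2) for _ in range(40)] for _ in range(5)]
    target, others = individuals[0], individuals[1:]
    query = contaminate(target, others, 0.2, rng)
    # Toy criterion for illustration only: succeed with at least 8 SNPs.
    print(min_snps_for_id(query, lambda q: len(q) >= 8))
```

A return value of `None` corresponds to the failure case in which even the full 40-SNP set does not identify the target.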
The metrics considered are (i) the minimum number of SNPs required to correctly identify the target individual; (ii) the total number of such successful target identifications out of 30 (when even 40 SNPs are not enough to identify the individual, the identification process is a failure); (iii) the per-SNP difference between the logarithm of the joint probability under the HMM model and the logarithm of the joint probability using a simple product of the genotype frequencies of the SNPs; (iv) the average and standard deviation of the minimum number of SNPs from (i); and (v) the average and standard deviation of the per-SNP difference in logarithms from (iii). For (i) and (iii), we concatenated the results from 3 runs of 10 iterations each to yield lists of length 30.

==============================
A. General background population:
==============================
==============================
Mutation rate = 0.0
==============================
Minimum number of SNPs for correct ID = 6,6,6,10,6,7,8,6,5,8,8,7,7,8,7,7,6,7,6,8,4,7,7,8,7,8,10,9,6,5
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.7137894359217126,0.25896261705081375,-0.09555243035469656,1.0493923197656647,0.41906517212131106,0.8182768633045543,0.7684583932551956,0.5749413815040851,0.21808054377983516,0.668711409506,0.39984466793069284,0.42342900067164685,0.0807480235431383,0.5620403567467229,0.3021692335979349,0.024573456949163765,-0.05199899306994323,0.14103647125722954,0.6043555162723093,0.04351947189836802,0.4437906617183178,0.2911628401879082,0.47020860097299383,0.12174056487987817,0.12002937511859965,0.29328743897470844,0.3535765717066385,0.24967212443782413,0.32896856042687705,0.1897574740045343
Average +/- standard deviation of minimum number of SNPs = 7.0 +/- 1.3416407864998738
Average +/- standard deviation of difference in probabilities = 0.35953457080266726 +/- 0.2685045857690805
==============================
Mutation rate = 0.1
==============================
Minimum number of SNPs for correct ID = 6,6,9,11,11,15,6,6,8,7,5,11,8,12,7,10,7,6,8,6,10,7,7,10,5,5,9,8,10,10
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.9743292434810377,0.2804027651661077,0.23858021367049195,0.6105357018065326,0.398561998321386,0.4295316310712392,0.27531323631786336,0.488236128348645,0.4367525615497325,0.532866383359254,0.3568355736130169,0.10634372619289904,0.45056536225246013,0.3300362992446441,0.19609693733835293,0.2973560819845714,0.41546835807671706,0.41827654136719056,0.2984773092071017,0.19817968397708277,0.1897285092961166,0.18314211939879282,0.08827343425740189,0.2176341317772618,0.6535466075027205,0.1170450093930647,0.10233978432777174,0.22124011224888118,0.014481066916695973,0.47437628446733393
Average +/- standard deviation of minimum number of SNPs = 8.2 +/- 2.372059583287627
Average +/- standard deviation of difference in probabilities = 0.33315175986441226 +/- 0.196951532099614
==============================
Mutation rate = 0.2
==============================
Minimum number of SNPs for correct ID = 16,19,13,12,17,23,23,18,5,17,5,23,8,14,6,17,18,15,14,8,5,10,18,20,16,21,4,15,7,11
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.5644005835983696,0.5843883326812508,0.5953667122213561,0.5495066115942909,0.5535175816980746,0.4275178715967049,0.29127653969747447,0.5897861404496241,0.7154333109906748,0.5757049227670018,0.38270606067420465,0.29303125605393926,0.4634700855911471,0.39573380319449836,0.5322003836066952,0.6200929485861079,0.5152827390229177,0.35431794421038976,0.3883603155355471,0.20382654870771488,0.5990181608455197,0.3036921649626693,0.30794998499368176,0.24901767742355005,0.3516615304835313,0.2031729077990099,0.4435650611119788,0.16107589755710616,0.5329860604175839,0.3724454449559371
Average +/- standard deviation of minimum number of SNPs = 13.933333333333334 +/- 5.761558431149991
Average +/- standard deviation of difference in probabilities = 0.4373501861009517 +/- 0.143363185853405
==============================
Mutation rate = 0.3
==============================
Minimum number of SNPs for correct ID = 19,14,37,18,10,23,7,12,28,21,26,6,3,31,9,12,7,16,17,20
Total number of successful IDs = 20/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.5421333344680539,0.6282729332701403,0.3877654673697554,0.6329612271363918,0.3499927968902442,0.36991990993538554,0.6442879799371892,0.513751855880495,0.28223483414256834,0.2592732332399915,0.4126078751054794,0.44158331781815513,1.1478258005644704,0.2405469264321591,0.4317811016102632,0.517016160130536,0.25093667327914176,0.483876623239199,0.307432291322154,0.34829659632852544
Average +/- standard deviation of minimum number of SNPs = 16.8 +/- 8.812491134747313
Average +/- standard deviation of difference in probabilities = 0.459624846905015 +/- 0.20088301637990302
==============================
Mutation rate = 0.4
==============================
Minimum number of SNPs for correct ID = 9,10,25,11
Total number of successful IDs = 4/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.9523369513729004,0.43858738346954224,0.39148477085485056,0.7508636590789964
Average +/- standard deviation of minimum number of SNPs = 13.75 +/- 6.53356717268599
Average +/- standard deviation of difference in probabilities = 0.6333181911940724 +/- 0.2302137034563413
==============================
Mutation rate = 0.5
==============================
Minimum number of SNPs for correct ID =
Total number of successful IDs = 0/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) =
==============================
Mutation rate = 0.6
==============================
Minimum number of SNPs for correct ID = 30,36,25,24,14,32
Total number of successful IDs = 6/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = -0.14178569050268947,-0.2354202095964,-0.24807795727890591,-0.3566899094494853,-0.07763245974958581,-0.14533781705121374
Average +/- standard deviation of minimum number of SNPs = 26.833333333333332 +/- 7.033649282003064
Average +/- standard deviation of difference in probabilities = -0.20082400727138006 +/- 0.09079685200239174
==============================
Mutation rate = 0.7
==============================
Minimum number of SNPs for correct ID = 24,17,26,38,14,29,31,25,32,16,29,9,32,13,20,19,15,36,10,25,25,31
Total number of successful IDs = 22/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = -0.6656029675792355,-0.45559354980317235,-0.5418896216704471,-0.34872569908805034,-0.768416162587333,-0.7053435743224402,-0.4605556894897708,-0.34456960042812257,-0.5725082944038824,-0.22626484004553693,-0.12621215842997974,-0.300631423900522,-0.20499822689742642,0.11099645054586436,0.0011361556083683056,-0.3340059998594855,-0.5058436042786897,-0.11234225028658192,-0.46366510072531303,-0.024878779232639658,-0.5460954086588766,-0.15559206610321166
Average +/- standard deviation of minimum number of SNPs = 23.454545454545453 +/- 8.23919277117978
Average +/- standard deviation of difference in probabilities = -0.3523455641652947 +/- 0.23462975888146748

==============================
B. CDX background population:
==============================
==============================
Mutation rate = 0.0
==============================
Minimum number of SNPs for correct ID = 6,7,9,8,7,7,7,7,5,5,7,11,8,6,10,5,7,7,7,7,8,7,7,8,10,8,8,7,6,6
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = -0.08964423847204846,0.2342222999781267,0.6302454019182426,0.15744429077404432,0.10206458531705895,0.159398973905047,0.18614227085790727,0.16402928390785426,-0.07394955181653576,0.05843997342344416,0.362462734946392,0.4904701233543554,0.18253173065529382,0.1961388300457617,0.8326088789293367,0.12801867339917053,0.10958223435581793,0.16584317688849964,0.39587596134375297,0.3874987664571525,0.28010396272972504,0.24221583545828068,0.11150116513863827,0.1259275814592431,0.5306053451498162,0.08443510460434678,0.33247884283223583,0.2514544732907914,0.03898315191281002,-0.08844482624384516
Average +/- standard deviation of minimum number of SNPs = 7.266666666666667 +/- 1.3888444437333105
Average +/- standard deviation of difference in probabilities = 0.22295616788335723 +/- 0.20304483878341525
==============================
Mutation rate = 0.1
==============================
Minimum number of SNPs for correct ID = 7,8,8,5,7,9,7,12,5,6,9,8,12,7,5,5,7,10,6,12,6,6,6,7,10,7,5,9,7,8
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.2587474202576968,0.4542475943291062,0.1534078453468395,0.6165945438242867,0.2598376958306717,0.17400357902707458,0.19505962816094982,0.16495416146367234,0.3872791437914291,0.2421230785747186,0.3744144113711503,0.3606382324103574,0.3114484648150757,0.16213095442653355,0.32825190718666325,0.31844915353305814,0.5572963739363639,0.5445549304905614,0.2729028593829132,0.20343579165283718,0.1669713856106753,0.35000225448150485,0.26393044453604153,0.21333965125944396,0.17707591636243017,0.20891428163745068,0.3120493068378895,0.2310837398495984,0.32961634313205185,0.020757992334971664
Average +/- standard deviation of minimum number of SNPs = 7.533333333333333 +/- 2.045048220023729
Average +/- standard deviation of difference in probabilities = 0.2871173028618006 +/- 0.12931408665514946
==============================
Mutation rate = 0.2
==============================
Minimum number of SNPs for correct ID = 8,32,7,14,11,7,7,7,7,6,6,16,6,11,13,14,7,11,7,11,16,16,4,20,16,6,13,7,10,9
Total number of successful IDs = 30/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.26146657956217223,0.36390478661597636,0.42314395342269534,0.3172014505955705,0.3301963735959044,0.25905863472106944,0.28987328777841714,0.2522766235854486,0.37615342805734336,0.3075213948904992,0.45310290439979406,0.19975587297890696,0.26789601483276054,0.2807849758270329,0.5189748475369753,0.3649528915625882,0.2828307071353096,0.08098936945828651,0.43665993604558473,0.15665976889907848,0.1555051571614663,0.21039128270861474,0.8829452117419463,0.19058732992492916,0.14246793786152778,0.5604072634200697,0.3181895242917134,0.3215800186721343,0.16964905119085696,0.3181221902316178
Average +/- standard deviation of minimum number of SNPs = 10.833333333333334 +/- 5.592157206501104
Average +/- standard deviation of difference in probabilities = 0.31644162562354305 +/- 0.15162821400873486
==============================
Mutation rate = 0.3
==============================
Minimum number of SNPs for correct ID = 11,12,17,22,34,20,21,8,7,15,11,11,6,22,12,14,7,19,11,4,14,9,34,17,11,17
Total number of successful IDs = 26/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.3126135013248632,0.38953318814293975,0.44149682684963476,0.2729146623546349,0.2528836751193629,0.2825943985866367,0.2885877427709512,0.367044393036152,0.43151430143690866,0.41489726760246853,0.3309696166339823,0.3172239493278006,0.7409405952547462,0.4359010297968842,0.36649382602677677,0.4124750480882731,0.49642506799079517,0.20180084287057684,0.5515767589151349,0.7211415030247683,0.2474309233953698,0.3215183933235155,0.27362917857175134,0.3049858530726046,0.42664776163598167,0.46986179817426765
Average +/- standard deviation of minimum number of SNPs = 14.846153846153847 +/- 7.39902440394526
Average +/- standard deviation of difference in probabilities = 0.3874270039741455 +/- 0.12961291666919833
==============================
Mutation rate = 0.4
==============================
Minimum number of SNPs for correct ID = 11,9,14,20,10,8,9
Total number of successful IDs = 7/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = 0.42663227079286775,0.5003444542852171,0.3090461552201141,0.4748742584824866,0.26526403166389834,0.4950827837224605,0.38520403097561506
Average +/- standard deviation of minimum number of SNPs = 11.571428571428571 +/- 3.8861344310672696
Average +/- standard deviation of difference in probabilities = 0.40806399787752273 +/- 0.08586973939619855
==============================
Mutation rate = 0.5
==============================
Minimum number of SNPs for correct ID =
Total number of successful IDs = 0/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) =
==============================
Mutation rate = 0.6
==============================
Minimum number of SNPs for correct ID =
Total number of successful IDs = 0/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) =
==============================
Mutation rate = 0.7
==============================
Minimum number of SNPs for correct ID = 36
Total number of successful IDs = 1/30
Per-SNP log(Joint Probability under HMM) - log(Product of Genotype Frequencies) = -0.07411730503046678
Average +/- standard deviation of minimum number of SNPs = 36.0 +/- 0.0
Average +/- standard deviation of difference in probabilities = -0.07411730503046678 +/- 0.0
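
The summary metrics (iii) to (v) can be recomputed from the listed values with a short sketch like the one below. The function names are hypothetical, and two points are assumptions rather than statements from this file: the logarithm is taken to be the natural logarithm, and "per-SNP" is taken to mean division by the number of SNPs in the query. The reported standard deviations appear to be population (divide-by-N) values, since the single-observation case is reported as 0.0 and the four-value example below matches the reported figure.

```python
import math
import statistics


def per_snp_log_difference(log_joint_hmm, genotype_freqs):
    """Metric (iii), under the assumptions stated above: the HMM log joint
    probability minus the log of the product of genotype frequencies,
    divided by the number of SNPs."""
    log_product = sum(math.log(f) for f in genotype_freqs)
    return (log_joint_hmm - log_product) / len(genotype_freqs)


def summarize(values):
    """Metrics (iv) and (v): mean and population standard deviation.
    Returns None for an empty list (no successful identifications)."""
    if not values:
        return None
    return statistics.fmean(values), statistics.pstdev(values)


if __name__ == "__main__":
    # Section A, mutation rate = 0.4: four successful identifications.
    min_snps = [9, 10, 25, 11]
    mean, sd = summarize(min_snps)
    print(f"{mean} +/- {sd}")
```

For the four values above this gives a mean of 13.75 and a standard deviation of about 6.5336, in agreement with the corresponding line in section A. Settings with 0/30 successful identifications have no summary lines, which the sketch represents by returning `None`.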