This code is provided as supplemental code to the paper "Minimal Positional Substring Cover Is a Haplotype Threading Alternative to Li & Stephens Model" by Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang.

The code is provided so that interested readers can see how the benchmarks were performed, and it may also help with implementing the algorithms in the paper. It is designed to run on one specific input file (cleaned UKB Chr 21 array data) and is not general purpose: it will not run correctly on other VCF files.


Making this code general purpose is relatively simple. It would require a more robust input-reading function and dynamic instead of static arrays. Some optional parameters would also need to be added, and some dynamically allocated arrays freed. Alternatively, count the number of haplotypes and the number of sites in the input VCF and set the corresponding values (M and N, respectively).
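As a rough illustration of the second option, the sketch below counts both panel dimensions in a single pass over a VCF. It is not part of the supplemental code: the names PanelDims and countPanelDims are hypothetical, and the sketch assumes every sample is diploid (two haplotypes per sample column) and that every non-header line is one site, as in a cleaned array panel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical container for the two panel dimensions.
struct PanelDims {
    std::size_t haplotypes = 0;  // 2 per sample column (diploid assumption)
    std::size_t sites = 0;       // 1 per data line (one site per record assumption)
};

// Hypothetical helper: scans a VCF stream once and counts haplotypes and sites.
PanelDims countPanelDims(std::istream& vcf) {
    PanelDims dims;
    std::string line;
    while (std::getline(vcf, line)) {
        if (line.empty() || line.rfind("##", 0) == 0) continue;  // meta-information lines
        if (line.rfind("#CHROM", 0) == 0) {
            // Tab-separated header: 9 fixed columns (CHROM..FORMAT), then one per sample.
            const std::size_t columns =
                static_cast<std::size_t>(std::count(line.begin(), line.end(), '\t')) + 1;
            assert(columns > 9 && "expected FORMAT plus at least one sample column");
            dims.haplotypes = columns > 9 ? 2 * (columns - 9) : 0;
            continue;
        }
        ++dims.sites;  // any other non-empty line is one variant record
    }
    return dims;
}
```

The counts could, for example, be read from a std::ifstream opened on the panel VCF and used in place of the hard-coded values; multi-allelic records or haploid samples would need extra handling.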
    
   
MPSC Size Distribution 
  
 
For subsets of the panel of sizes 10, 100, 1000, 10000, 100000, and 860022, MPSC.cpp outputs, for every haplotype in the subset, the size of its MPSC when the remaining haplotypes of the subset serve as the reference panel.
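To make the measured quantity concrete, here is a naive sketch of the MPSC size of one query haplotype against a reference panel, plus the leave-one-out loop over a subset. It is a hypothetical illustration, not the code in MPSC.cpp and not the paper's algorithm: it does not use the PBWT, and instead greedily extends each substring as far right as any panel haplotype matches. This greedy choice yields a minimum-size cover because every substring of a match is itself a match. Its running time is proportional to the number of panel haplotypes times the number of sites. Haplotypes are assumed to be equal-length strings over {'0','1'}.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: size of a minimal positional substring cover of `query`
// by `panel`, or -1 if some site of `query` matches no panel haplotype.
long mpscSize(const std::string& query, const std::vector<std::string>& panel) {
    const std::size_t n = query.size();
    long segments = 0;
    std::size_t start = 0;
    while (start < n) {
        // Furthest (exclusive) end such that some panel haplotype matches query[start, end).
        std::size_t best = start;
        for (const std::string& hap : panel) {
            assert(hap.size() == n);
            std::size_t end = start;
            while (end < n && hap[end] == query[end]) ++end;
            best = std::max(best, end);
        }
        if (best == start) return -1;  // site `start` is matched by no haplotype
        ++segments;
        start = best;  // the next substring begins right after this one ends
    }
    return segments;
}

// Hypothetical leave-one-out loop: MPSC size of each haplotype in `subset`
// using the other haplotypes of the subset as the reference panel.
std::vector<long> leaveOneOutMpscSizes(const std::vector<std::string>& subset) {
    std::vector<long> sizes;
    sizes.reserve(subset.size());
    for (std::size_t i = 0; i < subset.size(); ++i) {
        std::vector<std::string> rest;
        for (std::size_t j = 0; j < subset.size(); ++j)
            if (j != i) rest.push_back(subset[j]);
        sizes.push_back(mpscSize(subset[i], rest));
    }
    return sizes;
}
```

For example, the query "0101" against the panel {"0111", "1101"} needs two substrings: sites 0 to 1 from "0111" and sites 2 to 3 from "1101".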


Solution Space Size and Length Maximal MPSC Length


setMaxSolutionSpaceBenchmark.cpp outputs, for every haplotype in the panel, the size of the set maximal only MPSC solution space and the length of the Length Maximal MPSC, using the rest of the panel as the reference panel.


Length Maximal MPSC Run Time


These programs output the run time of finding out-of-panel Length Maximal MPSCs for 1000 random haplotypes while varying the number of sites (lengthMaximalMPSCbenchmarkByN.cpp) or the number of haplotypes (lengthMaximalMPSCbenchmarkByM.cpp) in the panel.
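One possible way to set up this kind of measurement is sketched here. It is not the code in lengthMaximalMPSCbenchmarkByN.cpp or lengthMaximalMPSCbenchmarkByM.cpp. The hypothetical helpers sampleQueries and timeSeconds draw distinct random query indices and time an arbitrary piece of work with std::chrono::steady_clock; the Length Maximal MPSC search being timed is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: `count` distinct random haplotype indices in [0, panelSize).
std::vector<std::size_t> sampleQueries(std::size_t panelSize, std::size_t count, unsigned seed) {
    assert(count <= panelSize);
    std::vector<std::size_t> idx(panelSize);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., panelSize - 1
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(count);  // keep the first `count` shuffled indices
    return idx;
}

// Hypothetical helper: wall-clock seconds spent running `work()`.
template <typename Work>
double timeSeconds(Work&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

For an out-of-panel measurement, each sampled query would be removed from the reference panel before its search is timed. To vary a panel dimension, the same queries could be timed against panels truncated to fewer sites or fewer haplotypes.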


Imputation Benchmark


These programs impute thinned haplotypes of the panel with a naive imputation method based on the MPSC formulation of haplotype threading. For each imputation method, they output the sites imputed correctly, the sites imputed incorrectly, and the sites not imputed. The code is in the imputation directory. One program uses only the original panel, while the other also uses a "P-Smoothed" panel.
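The MPSC-based imputation itself is in the imputation directory and is not reproduced here, but the three-way scoring can be sketched briefly. The hypothetical helper tallyImputation assumes that the true and imputed haplotypes are strings over the same sites, that '.' marks a site the method left unimputed, and that a mask marks the sites removed by thinning. Only masked sites are scored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tally of imputation outcomes over the thinned (masked) sites.
struct ImputationTally {
    std::size_t correct = 0;
    std::size_t incorrect = 0;
    std::size_t notImputed = 0;
};

// Assumption: '.' in `imputed` marks a site that was left unimputed.
ImputationTally tallyImputation(const std::string& truth, const std::string& imputed,
                                const std::vector<bool>& masked) {
    assert(truth.size() == imputed.size() && truth.size() == masked.size());
    ImputationTally t;
    for (std::size_t j = 0; j < truth.size(); ++j) {
        if (!masked[j]) continue;  // retained sites were observed, not imputed
        if (imputed[j] == '.') ++t.notImputed;
        else if (imputed[j] == truth[j]) ++t.correct;
        else ++t.incorrect;
    }
    return t;
}
```

Summing these tallies over all thinned haplotypes gives one set of totals per imputation method.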
