This code is provided as supplemental code to the paper "Minimal Positional Substring Cover Is a Haplotype Threading Alternative to Li & Stephens Model" by Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang.

This program takes two VCF files as input: one stores a reference panel, and the other stores a set of query haplotypes to thread through the reference panel. Both files must be in the format specified in aboutInput.txt. They must contain the same number of sites, and for every i, the i-th site in each file must refer to the same site in the genome.
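The site-consistency requirement above can be checked before running the program. The following is a minimal sketch, not part of MPSC.cpp. It assumes standard tab-separated VCF data lines whose first two columns are CHROM and POS, and it assumes header lines start with '#'. The exact format required by this program is the one in aboutInput.txt. The helper names readSites and sameSites are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from MPSC.cpp): collect (CHROM, POS) for every
// data line of a VCF stream. Lines starting with '#' are treated as header
// lines and skipped; this assumes a standard tab-separated VCF layout.
std::vector<std::pair<std::string, std::string>> readSites(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> sites;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string chrom, pos;
        // Keep the line only if both CHROM and POS could be read.
        if (std::getline(fields, chrom, '\t') && std::getline(fields, pos, '\t'))
            sites.emplace_back(chrom, pos);
    }
    return sites;
}

// Returns true when both streams list the same sites in the same order,
// which implies they also have the same number of sites.
bool sameSites(std::istream& panel, std::istream& query) {
    return readSites(panel) == readSites(query);
}
```

One possible use is to open the reference panel and the query file with std::ifstream, pass both streams to sameSites, and run the program only if it returns true.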

Given these VCF files, the program outputs the following for each query haplotype: the size of the Minimal Positional Substring Cover (MPSC) of the query haplotype by the reference panel, the size of the set maximal match only MPSC solution space, the length of a Length Maximal MPSC, and a Length Maximal MPSC of the query by the reference panel. It also outputs the time taken to obtain and output this information (given a PBWT of the reference panel). The algorithms implemented here are presented in DOI:.

Compile with -std=c++11 or higher. The following command may be used:
g++ -O3 -std=c++11 -o MPSC.exe MPSC.cpp


Sample Usage:

After compilation, you can run the program on the sample VCF files using the following command:
./MPSC.exe -r example.vcf -q query_example.vcf

Options:
-r, input VCF file for the reference panel, default is panel.vcf
-q, input VCF file for the panel of query haplotypes to search against the reference panel, default is query.vcf
-o, output file for statistics, default is output.txt
-h, help, prints this screen
