Brief description of the workflow used to analyze the whole-genome sequencing
(WGS) data in Wall JD, Tang LF et al. (2014; Genome Research).

Given the input VCF files NS9.vcf, NS10.vcf, NS11.vcf and NS12.vcf, run in1a
to create the file AllIND_1KG.geno.  As an example, the first 100 lines of
AllIND_1_1KG.geno are given in the file AllIND_1_1KG.geno.head.
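The contents of in1a are not described here.  Purely as an illustration of
the kind of information a VCF-to-.geno step has to extract, the hypothetical
Python sketch below (it is NOT in1a) pulls the chromosome, position, the two
called alleles and the GQ value out of one single-sample VCF data line.  It
assumes one sample column, writes alleles as bases taken from REF/ALT,
reports a missing allele as N, and uses -1 for a missing GQ to mirror the
.geno convention; all of these choices are assumptions.

```python
# Hypothetical sketch, NOT the in1a script: extract the per-site fields
# needed for a .geno line from one single-sample VCF data line.

def vcf_record_to_fields(line):
    """Return (chrom, pos, allele_a, allele_b, gq) for one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = cols[0], int(cols[1]), cols[3], cols[4]

    # Pair the FORMAT keys (column 9) with the sample's values (column 10).
    sample = dict(zip(cols[8].split(":"), cols[9].split(":")))

    # Allele index 0 is REF; 1, 2, ... are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    gt = sample.get("GT", "./.").replace("|", "/")  # treat phased like unphased
    # A missing allele call ('.') is written as 'N' (an assumption).
    calls = [alleles[int(i)] if i != "." else "N" for i in gt.split("/")]
    if len(calls) == 1:  # haploid call: report it twice
        calls = calls * 2

    # A missing GQ becomes -1, matching the "-1 = not available" convention.
    gq_raw = sample.get("GQ", ".")
    gq = int(float(gq_raw)) if gq_raw not in (".", "") else -1
    return chrom, pos, calls[0], calls[1], gq
```

For a made-up record such as "22 100 . A G 50 PASS . GT:GQ 0/1:99"
(tab-separated), the function returns ("22", 100, "A", "G", 99).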

AllIND_1KG.geno is a shortened summary of the polymorphism data, where each 
line consists of the following fields:

chr pos IND1a IND1b IND2a IND2b IND3a IND3b IND4a IND4b GQIND1 GQIND2 GQIND3 GQIND4 1KGfreq

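chr = chromosome number
pos = position on the chromosome
IND1a & IND1b = the two alleles of the first individual's genotype
                (IND2a/IND2b through IND4a/IND4b likewise for individuals 2-4)
GQIND1 = genotype quality score of the first individual (-1 = not available;
         GQIND2 through GQIND4 likewise)
1KGfreq = allele frequency in the CEU sample from the 1000 Genomes Phase 1 data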
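A minimal Python sketch of reading one line of AllIND_1KG.geno according to
the field list above.  It assumes the fields are whitespace-separated, as in
the header line shown, and keeps 1KGfreq as text because this description
does not say how a missing frequency is encoded.  The names GenoSite and
parse_geno_line are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

N_IND = 4  # four individuals, per the column layout above


@dataclass
class GenoSite:
    chrom: str
    pos: int
    genotypes: List[Tuple[str, str]]  # (INDka, INDkb) for k = 1..4
    gq: List[Optional[int]]           # None where the GQ value is -1
    freq_1kg: str                     # 1KGfreq, left unparsed


def parse_geno_line(line: str) -> GenoSite:
    """Split one .geno line (assumed whitespace-separated) into its fields."""
    f = line.split()
    expected = 2 + 2 * N_IND + N_IND + 1  # chr, pos, 8 alleles, 4 GQs, freq
    if len(f) != expected:
        raise ValueError(f"expected {expected} fields, got {len(f)}")

    alleles = f[2:2 + 2 * N_IND]
    genotypes = [(alleles[2 * k], alleles[2 * k + 1]) for k in range(N_IND)]

    gq = []
    for g in f[2 + 2 * N_IND:2 + 3 * N_IND]:
        value = int(float(g))
        gq.append(None if value == -1 else value)  # -1 = not available

    return GenoSite(f[0], int(f[1]), genotypes, gq, f[-1])
```

For a made-up line "1 100 A G A A G G A G 30 -1 45 99 0.25", the result has
genotypes [("A", "G"), ("A", "A"), ("G", "G"), ("A", "G")] and
gq [30, None, 45, 99].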

The 1000 Genomes frequencies were downloaded from the web.  An example of the
frequency input file used is CEUfreq22.

Simple Unix command-line tools were used to analyze the resulting file.
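The specific commands are not listed here.  As one hypothetical example of a
simple per-site summary of the .geno file, the Python sketch below counts
heterozygous calls per individual, keeping only calls whose GQ is available
(not -1) and at least a chosen threshold.  The function name, the default
threshold of 20, and the rule that any two differing allele fields count as
a heterozygote are all assumptions, not the analysis used in the paper.

```python
from typing import Iterable, List

N_IND = 4  # four individuals, per the .geno column layout


def count_het_sites(lines: Iterable[str], min_gq: float = 20) -> List[int]:
    """Count heterozygous calls per individual in .geno-formatted lines.

    A call is counted only if its GQ is available (not -1) and >= min_gq.
    The default of 20 is an arbitrary illustrative threshold.
    """
    counts = [0] * N_IND
    for line in lines:
        f = line.split()
        if not f:
            continue  # skip blank lines
        for k in range(N_IND):
            a, b = f[2 + 2 * k], f[3 + 2 * k]  # INDka, INDkb
            gq = float(f[2 + 2 * N_IND + k])   # GQINDk
            if gq != -1 and gq >= min_gq and a != b:
                counts[k] += 1
    return counts
```

Usage would look like `with open("AllIND_1KG.geno") as fh:
print(count_het_sites(fh))`; streaming the file line by line keeps memory use
constant, much as an awk one-liner would.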

Please address any questions to Jeff Wall <wallj@humgen.ucsf.edu>
