prfreq.c - Program that uses Poisson random fields to calculate expected allele frequency distributions under various possible distributions of selective effects and demographic models.  Maximum likelihood estimates for selection and demographic parameters are calculated using exhaustive grid search; hill-climbing gradient search methods are currently not supported, although they are outlined in the README for inclusion in a future release.  Demography assumes an ancestral population size shared with the outgroup, along with 0 to 2 instantaneous size changes in the ingroup.  Each demographic epoch is characterized by its length (TAU, scaled by Ncurr) and effective pop size (OMEGA, Nepoch/Ncurr).
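
As a rough illustration of what "exhaustive grid search" means here, the sketch below evaluates a log-likelihood at every point of a one-dimensional grid and keeps the best one. It is not taken from prfreq.c: the names grid_value, grid_search_1d, loglik_fn and toy_loglik are hypothetical, the grid is assumed to be evenly spaced with both endpoints included, and the toy likelihood only stands in for the real one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: log-likelihood of the data at one parameter value. */
typedef double (*loglik_fn)(double param, void *ctx);

/* i-th of num evenly spaced values from min to max.
   ASSUMPTION: both endpoints are part of the grid. */
double grid_value(double min, double max, int num, int i)
{
    assert(num >= 1 && i >= 0 && i < num);
    if (num == 1)
        return min;
    return min + (max - min) * (double)i / (double)(num - 1);
}

/* Evaluate every grid point; return the value with the highest
   log-likelihood and, if best_ll is not NULL, store that log-likelihood. */
double grid_search_1d(double min, double max, int num,
                      loglik_fn loglik, void *ctx, double *best_ll)
{
    double best_param = grid_value(min, max, num, 0);
    double best = loglik(best_param, ctx);

    for (int i = 1; i < num; i++) {
        double p = grid_value(min, max, num, i);
        double ll = loglik(p, ctx);
        if (ll > best) {
            best = ll;
            best_param = p;
        }
    }
    if (best_ll != NULL)
        *best_ll = best;
    return best_param;
}

/* Toy stand-in for a real likelihood: peaks at the value pointed to by ctx. */
double toy_loglik(double param, void *ctx)
{
    double target = *(const double *)ctx;
    return -(param - target) * (param - target);
}
```

With a target of -3 and a grid of 11 points from -10 to 0, grid_search_1d(-10.0, 0.0, 11, toy_loglik, &target, &ll) returns -3. With several parameters the same idea applies to every combination of grid points, so the cost grows as the product of the per-parameter step counts.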

Run modes: 
 1: Can take as input an allele frequency distribution, a demographic model, a distribution of selective effects, and the possible range that each parameter in the demographic model and selective distribution can take, and return the maximum likelihood estimate of each selective distribution parameter (read_in_snps = 1, run_type = 1 iff exhaustive grid search, run_type = 2 iff gradient search)

 2: Can take as input a demographic model and a distribution of selective effects (with parameters), and return the expected allele frequency spectrum (read_in_snps = 0, run_type = 3, num_iters = 0; also works if read_in_snps = 0 and run_type = 1 or 2)

 3: Can take as input a demographic model and a distribution of selective effects (with parameters), and simulate allele frequency spectra to infer the parameters of the selective distribution and their 95% confidence intervals (read_in_snps = 0, run_type = 3, num_iters >= 1)

 4: Can take as input an expected allele frequency spectrum, a demographic model, and a distribution of selective effects, and simulate num_iters allele frequency spectra (based on the expected allele frequency spectrum and THETA) to infer the parameters of the selective distribution and their 95% confidence interval (read_in_snps = 1, run_type = 3, num_iters >= 1)

 5: Robustness to violations of demographic or selective assumptions can be gauged by generating expected allele frequency spectra under certain demographic and selective regimes (mode 2) and then running prfreq in mode 4 with alternative demographic and/or selective regimes (output file becomes snp file, but delete header line)

 6: Can take as input a demographic model and parameters mu and sigma and return the mkprf table with num_iters genes where gamma for each gene is chosen from normal(mu, sigma)  (distrib = 4, read_in_snps = 0, run_type = 4)

 7: Can take as input a demographic model and parameters mu and sigma and return the mkprfint table with num_iters genes where mean gamma for each gene is chosen from normal(mu, sigma) and log(stdev) for each gene is chosen from N(0,1)  (distrib = 4, read_in_snps = 0, run_type = 5)
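
The settings combinations listed for each mode can be summarized in a small lookup. The function below is a hypothetical helper, not part of prfreq.c; it only restates the list above. Mode 5 is a workflow (mode 2 followed by mode 4) rather than a settings combination, so it is never returned, and 0 is returned for combinations the list does not describe.

```c
#include <assert.h>

/* Hypothetical helper: map settings values to the mode numbers used in
   the list above. Returns 0 for combinations the list does not cover. */
int prfreq_mode(int read_in_snps, int run_type, int num_iters, int distrib)
{
    assert(read_in_snps == 0 || read_in_snps == 1);
    assert(num_iters >= 0);

    /* Grid or gradient search: mode 1 with SNP data, mode 2 without. */
    if (run_type == 1 || run_type == 2)
        return read_in_snps ? 1 : 2;

    /* run_type 3: expected spectrum (mode 2) or simulations (modes 3, 4). */
    if (run_type == 3) {
        if (read_in_snps == 0)
            return num_iters == 0 ? 2 : 3;
        return num_iters >= 1 ? 4 : 0;
    }

    /* mkprf (mode 6) and mkprfint (mode 7) tables need the normal
       distribution (distrib = 4) and no SNP file. */
    if ((run_type == 4 || run_type == 5) && distrib == 4 && read_in_snps == 0)
        return run_type == 4 ? 6 : 7;

    return 0;
}
```

For example, prfreq_mode(1, 3, 100, 6) gives 4, and prfreq_mode(0, 5, 50, 4) gives 7.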


Modes are controlled in the settings file:
fixed = 0, 1, 2, 3 (0 = polymorphisms only; 1 = include all fixed diffs, 2 = include fixed diffs arising in ingroup only; 3 = folded sfs polymorphisms only)
poisson = 0 or 1 (0 = multinomial likelihoods; 1 = poisson likelihoods)
samplesize = XX  (XX = number of chromosomes in sample, i.e. lines in snpfile)
Ne = 2000        (current effective pop size, cannot be greater than 10000)
THETA = XX       (scaled mutation rate)
tdiv = XX        (scaled divergence time, only used if fixed = 1)
num_epochs = 1, 2, 3, 4, 5 (1 = stationary pop, 2 = contraction/expansion,
                        3 = bottleneck, 4 = complex bottleneck, 5 = scale by Nanc)
TAU = min max num   (scaled time since non-stationary pop dynamics)
OMEGA = min max num (ratio of ancestral Ne to current Ne)
// NOTE: do not include "TAU =" or "OMEGA =" lines if num_epochs = 1
TAU_B = min max num (scaled time of bottleneck)
OMEGA_B = min max num (ratio of bottleneck Ne to current Ne)
// NOTE: do not include "TAU_B" or "OMEGA_B" lines unless num_epochs = 3
distrib = XX     (type of distribution of selective effects; see below)
P min max num    (in dimension 1, parameter may take values between min and max;
                  in grid search, search num number of steps; in gradient
                  search, start search at value num)
// NOTE: include one "P" line for each parameter in distribution (see below)
I type min max   (integrate from min to max using midpnt iff type=1, 
                  midinf iff type=2, midexp iff type=3 or midexpN iff type=4)
// NOTE: include one "I" line for each part of the integration (up to 6)
read_in_snps = 0 or 1 (see above under "Run modes")
// NOTE: if read_in_snps = 1, the snp filename must be included in the
//  command line; otherwise, it must not be included
run_type = 1, 2, 3, 4 or 5 (see above under "Run modes")
input_grids = 0 or 1 (use precomputed allele freq spectra)
// NOTE: if input_grids = 1, the input grid filename must be included in the
//  command line; otherwise, it must not be included
output_grids = 0 or 1 (output computed allele freq spectra)
// NOTE: if output_grids = 1, the output grid filename must be included in the
//  command line; otherwise, it must not be included
num_iters = XX   (number of iterations for power analysis, line is ignored if
                  run_type != 3)
   
Distribution types:
distrib =
 0 : neutral distribution (# dimensions = 0)
 1 : single point mass (#dim = 1: gamma)
 2 : 2-pt (#dim = 2: pneutral, gamma)
 3 : 3-pt (#dim = 4: pneutral, pneg, gamma_neg, gamma_pos)
 4 : normal (#dim = 2: mean, stdev)
 5 : weighted norm (#dim = 3: mean, stdev, weight)
 6 : negative gamma (#dim = 2: alpha, beta)
 7 : reflected gamma (#dim = 3: alpha, beta, ppos)
 8 : neg gamma shifted (#dim = 3: alpha, beta, max_gamma)
 9 : neg lognormal (#dim = 2: scale, shape)
10 : neg lognormal shifted (#dim = 3: scale, shape, loc)
11 : 2-normal (#dim = 5: mu1, mu2, sd1, sd2, pneg)
12 : negative exponential (#dim = 1: lambda)
13 : 3-pt + pos selection (#dim = 4: p-500, p-50, p-5, maxgam)
14 : weighted norm2 (#dim = 4: mean, stdev, weight, gam)
15 : 4-pt (#dim = 4: p-500, p-50, p-5, p0) (not implemented yet)
16 : lethal/neutral (#dim = 1: pneu)
17 : lethal/const/neutral (#dim = 3: pneu, gam, pgam)
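
Since the settings file needs one "P" line per distribution parameter, the #dim column above can be turned into a simple consistency check. The table below is transcribed from this list; the names PRFREQ_NUM_DISTRIBS, distrib_dims and expected_p_lines are hypothetical and do not come from prfreq.c.

```c
#include <assert.h>

#define PRFREQ_NUM_DISTRIBS 18

/* Number of parameters (#dim) for distrib = 0..17, copied from the
   "Distribution types" list. */
static const int distrib_dims[PRFREQ_NUM_DISTRIBS] = {
    0, /*  0: neutral                */
    1, /*  1: single point mass      */
    2, /*  2: 2-pt                   */
    4, /*  3: 3-pt                   */
    2, /*  4: normal                 */
    3, /*  5: weighted norm          */
    2, /*  6: negative gamma         */
    3, /*  7: reflected gamma        */
    3, /*  8: neg gamma shifted      */
    2, /*  9: neg lognormal          */
    3, /* 10: neg lognormal shifted  */
    5, /* 11: 2-normal               */
    1, /* 12: negative exponential   */
    4, /* 13: 3-pt + pos selection   */
    4, /* 14: weighted norm2         */
    4, /* 15: 4-pt (not implemented) */
    1, /* 16: lethal/neutral         */
    3  /* 17: lethal/const/neutral   */
};

/* How many "P min max num" lines a settings file should contain. */
int expected_p_lines(int distrib)
{
    assert(distrib >= 0 && distrib < PRFREQ_NUM_DISTRIBS);
    return distrib_dims[distrib];
}
```

For instance, a settings file with distrib = 6 (negative gamma) would be expected to carry two "P" lines, one for alpha and one for beta.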


Compilation: gcc prfreq_v0610.c -lm -O3 -o exec_fname


To run: exec_fname settings_filename output_filename [input_snps_filename input_grid_filename output_grid_filename]
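
The optional filenames follow the notes in the settings section: each of read_in_snps, input_grids and output_grids set to 1 adds exactly one filename to the command line. A minimal sketch of the resulting argument count, assuming the enabled files are given in the order shown in the usage line (expected_argc is a hypothetical name, not a function in prfreq.c):

```c
#include <assert.h>

/* Hypothetical helper: number of argv entries, including the program
   name, implied by the three 0/1 settings flags. */
int expected_argc(int read_in_snps, int input_grids, int output_grids)
{
    assert(read_in_snps == 0 || read_in_snps == 1);
    assert(input_grids == 0 || input_grids == 1);
    assert(output_grids == 0 || output_grids == 1);

    /* exec_fname + settings_filename + output_filename are always present. */
    return 3 + read_in_snps + input_grids + output_grids;
}
```

So a run with read_in_snps = 1 and both grid options off takes four entries: exec_fname settings_filename output_filename input_snps_filename.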

