To reproduce the various models analyzed in the manuscript by rerunning 
rCLAMPS in its entirity, rCLAMPS can be rerun setting the globals in 
gibbsAlign_GLM.py in the following ways:

1.  For the full homeomodomain model (i.e., in Figure 2)

DOMAIN_TYPE = 'homeodomain'
OUTPUT_DIRECTORY = '../my_results/allHomeodomainProts/'
ORIGINAL_INPUT_FORMAT = True
RUN_GIBBS = True
EXCLUDE_TEST = False
MWID = 6
RAND_SEED = 382738375
MAXITER = 15
N_CHAINS = 100
INIT_ORACLE = False
SAMPLE = 100
OBS_GRPS = 'grpIDcore'

Note that as currently set up, this will run 100 gibbs sampling chains in parallel 
(using up to 1 - number of CPUs on your machine), which may take some time depending
on the number of CPUs/cores on your machine.  If you are not interested rerunning 
all of rCLAMPS, but just wish to make predicitons using the optimal model found
previously by rCLAMPS, you can simply refer to ./code/examplePredictions.py and the 
corresponding portion of README.rd.

NOTE: 
For the holdout validation testing, the program holdoutRetrainGLM.py was then run with:
CORE_POS = 'useStructInfo'
OBS_GRPS = 'grpIDcore'
APOS_CUT = 'cutAApos_1.0_0.05'
EDGE_CUT = 'edgeCut_1.0_0.05'
MAX_EDGES_PER_BASE = None
TRAIN_SET = 'cisbp'#'b08'
MWID = 6
RESCALE_PWMS = True
EXCLUDE_TEST = False
MAKE_LOGOS = False
ADD_RESIDUES = False
mainOutDir = '../results/cisbp-chuAll/structFixed1_grpHoldout_multinomial_ORACLEFalseChain100Iter15scaled50/' (in main())

All predicitons produced by this model and analysis are located in:
'./results/cisbp-chuAll/structFixed1_grpHoldout_multinomial_ORACLEFalseChain100Iter15scaled50/pwms_pred_holdOneOut.txt'

___________________________________________________________

2.  For the homeodomain model with the random test set held out
(i.e., in Figure 3, 4, Supplemental Figures comparing to rf_extant and rf_joint)

Same as above, except set:
EXCLUDE_TEST = False

All predicitons produced by this model and analysis are located in:
'../results/cisbp-chu/structFixed1_grpHoldout_multinomial_ORACLEFalseChain100Iter15scaled50/transfer_test/fracID_fullDBD/' directory files 'pccTable_test_predOnly.txt' and 'pccTable_test_transfer.txt'
that were produced by the program hd1_transfer_predictions.py.  

Note that hd1_transfer_predictions.py has not been updated since changing the API for gibbsAlign_GLM.py to allow for tandem multidomain interfaces (like C2H2-ZFs).  The logic of the code remains the same, but the API needs to be updated in some places.

___________________________________________________________

3.  For the C2H2-ZF model tested in holdout validation (Supplemental Figure)

DOMAIN_TYPE = 'zf-C2H2'
OUTPUT_DIRECTORY = '../my_results/zf-C2H2_250_50_seedFFSdiverse6/'
ORIGINAL_INPUT_FORMAT = False
RUN_GIBBS = True
EXCLUDE_TEST = False
MWID = 4
RAND_SEED = 382738375
MAXITER = 50
N_CHAINS = 250
INIT_ORACLE = False
SAMPLE = 100
OBS_GRPS = 'grpIDcore'

NOTE: 
For the holdout validation testting, the program holdoutRetrainGLM_zfC2H2.py was then run with:
MWID = 4
RIGHT_OLAP = 1
MAKE_LOGOS = True
OBS_GRPS = 'grpIDcore'
ANCHOR_B1H = False
PROT_SEQ_FILE = '../precomputedInputs/zf-C2H2/prot_seq_fewZFs_hmmerOut_clusteredOnly_removeTooShort.txt'  PROT_SEQ_FILE_FFS = '../flyFactorSurvey/enuameh/enuameh_perFinger_processedProtInfo.txt'
PWM_INPUT_TABLE = '../precomputedInputs/zf-C2H2/pwmTab_fewZFs_clusteredOnly_removeTooShort.txt'
PWM_INPUT_FILE_FFS = '../flyFactorSurvey/enuameh/flyfactor_dataset_A.txt'
SEED_FILE = '../flyFactorSurvey/enuameh/enuameh_startPosInfo.txt'
mainOutDir = '../my_results/zf-C2H2_250_50_seedFFSdiverse6/' (in main())

All predicitons produced by this model and analysis are located in:
'../my_results/zf-C2H2_250_50_seedFFSdiverse6/pwms_pred_holdOneOut.txt'
