DESCRIPTION:
  These scripts run two alignment-based analyses and one k-mer-based analysis of sequences, given phenotypic data.
  They are intended to identify the regions, patterns, or individual positions in the sequences that are most strongly associated with the phenotype.


DEPENDENCIES: 
  python3.5
  Rscript
  plink19
    -- may also be available under the command name "plink"; either way, version 1.9 is assumed
  mafft
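
  Before a long run it can help to confirm that the tools listed above are on your PATH.
  The following is a minimal sketch, not part of the scripts themselves. The helper name
  missing_tools is hypothetical, and accepting "python3" in place of "python3.5" is an
  assumption about how the interpreter is installed. It only checks that the commands
  exist; it does not verify that plink is version 1.9.

```python
import shutil

# Executable names taken from the DEPENDENCIES list. Each inner tuple lists
# acceptable alternatives; treating "python3" as a stand-in for "python3.5"
# is an assumption, not something the scripts are known to accept.
REQUIRED_TOOLS = [
    ("python3.5", "python3"),
    ("Rscript",),
    ("plink19", "plink"),
    ("mafft",),
]


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the groups of alternatives for which no executable was found."""
    missing = []
    for alternatives in required:
        # A group is satisfied as soon as one of its names resolves on PATH.
        if not any(which(name) for name in alternatives):
            missing.append(alternatives)
    return missing


if __name__ == "__main__":
    for alternatives in missing_tools():
        print("Not found on PATH: " + " or ".join(alternatives))
```

  The "which" parameter exists only so the lookup can be swapped out when testing.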


OPTIONAL DEPENDENCIES:
  cons (useful for kmer methods)
    -- consensus sequence calculator included in EMBOSS; only required if you use /kmer_methods/consensus.py


BASIC USAGE (two main scripts that will call the others as needed):
  ./runAll.sh seqFile phenFile alnSeqFile
    -- Runs the basic analyses: the two alignment-based methods and the one k-mer-based method.
       Variables can be set either in runAll.sh itself or given as command-line arguments.
       Comments in runAll.sh indicate what to comment out if you only want to run one type of analysis.
  ./randomPhenotypes/runRandomPhenotypeTests.sh
    -- Variables for this script are set only in the file itself; once they are set, it is ready to run.


NECESSARY INPUT FILES (see /test for examples):
  seqFile
    -- fasta file (unaligned) with all nucleotide sequences used in the analysis.
  alnSeqFile
    -- aligned version of seqFile that includes a consensus sequence named "> consensus"
  phenFile
    -- tab-delimited file with a header row.
       Column 1 is the sequence name (matching the headers in the fasta file); columns 2+ are phenotypes, named in the header row.
       Each row has 1 where the phenotype is present and 0 where it is absent.
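
  The input formats above can be sanity-checked before running anything. The sketch below
  is illustrative only and is not part of the scripts. The function names read_fasta and
  check_inputs are hypothetical. It assumes that every name in phenFile must appear as a
  header in seqFile, that blank lines in phenFile can be ignored, and that all sequences
  in alnSeqFile should have the same length.

```python
import csv


def read_fasta(path):
    """Return an ordered list of (name, sequence) pairs from a fasta file."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                # Strip so a header written as "> consensus" yields "consensus".
                name, chunks = line[1:].strip(), []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_inputs(seq_file, phen_file, aln_seq_file):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seq_names = {name for name, _ in read_fasta(seq_file)}

    with open(phen_file, newline="") as handle:
        # Skip blank lines (assumption: they carry no meaning in phenFile).
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return ["phenFile: file is empty"]
    header = rows[0]
    if len(header) < 2:
        problems.append("phenFile: header needs a name column and a phenotype column")
    for row in rows[1:]:
        name, values = row[0], row[1:]
        if name not in seq_names:
            problems.append("{}: not a header in seqFile".format(name))
        if len(row) != len(header):
            problems.append("{}: expected {} columns, found {}".format(
                name, len(header), len(row)))
        if any(value not in ("0", "1") for value in values):
            problems.append("{}: phenotype values must be 0 or 1".format(name))

    aligned = read_fasta(aln_seq_file)
    if "consensus" not in {name for name, _ in aligned}:
        problems.append('alnSeqFile: no sequence named "consensus"')
    if len({len(seq) for _, seq in aligned}) > 1:
        problems.append("alnSeqFile: sequences differ in length")
    return problems
```

  Calling check_inputs(seqFile, phenFile, alnSeqFile) with the three paths you intend to
  pass to runAll.sh returns a list of messages, one per problem detected.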
    

DIRECTORY STRUCTURE:
  The /randomPhenotypes directory contains scripts for running random phenotype tests (for statistical purposes).
  The /test directory contains a fake dataset that you can use to test the scripts. It shows what they output, how they work, and how the input is expected to be formatted.
  The /kmer_methods and /aln_methods directories contain the analysis scripts. The only ones you are likely to run directly are "runKmerMethods.sh" and "runAlnMethods.sh", which call the other scripts as needed.

