This text file explains the output of assocToSummarySeq_ADedit1.py, which separates significant nucleotides from the PLINK analysis into case-enriched and case-depleted sets.
The output names are unintuitive, so please read this file carefully to avoid confusion.

cases_enriched output means:
The nucleotide shown is the starting nucleotide at that position. A mutation that changes it to any other nucleotide is associated with the cases.
Example: The 'A' nucleotide at position 5 is significant => Mutation of 'A' to another nucleotide (C, G, or T) is significantly associated with cases.

cases_depleted output means:
The nucleotide shown is the starting nucleotide at that position. A mutation that changes it to any other nucleotide is associated with the controls.
Example: The 'T' nucleotide at position 5 is significant => Mutation of 'T' to another nucleotide (A, C, or G) is significantly associated with controls.
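
The two definitions above can be sketched as a small Python helper. This is only an illustration and is not part of assocToSummarySeq_ADedit1.py: the function name interpret and its arguments are hypothetical, and the output names are assumed to be exactly "cases_enriched" and "cases_depleted".

```python
NUCLEOTIDES = "ACGT"

# Hypothetical mapping: output name -> group the mutation is associated with.
ASSOCIATED_GROUP = {
    "cases_enriched": "cases",
    "cases_depleted": "controls",
}

def interpret(output_name, nucleotide, position):
    """Spell out what a significant nucleotide in a given output means."""
    nucleotide = nucleotide.upper()
    if nucleotide not in NUCLEOTIDES:
        raise ValueError("unknown nucleotide: " + nucleotide)
    if output_name not in ASSOCIATED_GROUP:
        raise ValueError("unknown output name: " + output_name)
    # The nucleotide shown is the starting one; the mutation target is unknown,
    # so all three remaining nucleotides are listed.
    others = [n for n in NUCLEOTIDES if n != nucleotide]
    alternatives = ", ".join(others[:-1]) + ", or " + others[-1]
    return ("Mutation of '%s' at position %d to another nucleotide (%s) "
            "is associated with %s."
            % (nucleotide, position, alternatives, ASSOCIATED_GROUP[output_name]))

print(interpret("cases_enriched", "A", 5))
print(interpret("cases_depleted", "T", 5))
```

The helper only restates the naming rule; it does not test significance.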

What this means for combining with kmer enrichment analysis:
The kmer enrichment analysis produces kmers that are case-enriched or case-depleted, similar to the TWAS (nucleotide association) analysis done using PLINK. However, case-enriched and case-depleted mean different things in the two analyses.
For kmers, case-enriched means that the kmer is found significantly more often in cases than in controls. On the other hand, case-depleted means that the kmer is found significantly less often in cases than in controls. Put another way, case-depleted kmers are found more often in controls.
For TWAS nucleotides, the label describes the mutation away from the nucleotide shown, not the nucleotide itself. A case-enriched nucleotide is mutated more often in cases, so the nucleotide itself is retained more often in controls. A case-depleted nucleotide is mutated more often in controls, so the nucleotide itself is retained more often in cases.
Therefore, case-enriched nucleotides from TWAS correspond to case-depleted kmers, while case-depleted nucleotides from TWAS correspond to case-enriched kmers.
Example: "atgactcagc" is a case-enriched 10mer
	 'ATGA-T--GC' is the sequence of case-depleted major allele nucleotides that are significant in TWAS (at the same positions relative to consensus)
	 The kmer matches the TWAS sequence at every position where a nucleotide is shown, so the results from the kmer and TWAS analyses agree and can be compared directly.
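
The comparison in the example above can be sketched in Python. This is a minimal illustration, not part of assocToSummarySeq_ADedit1.py or the kmer analysis: the names TWAS_TO_KMER_LABEL and kmer_matches_twas are hypothetical, and the sketch assumes that '-' marks positions without a significant nucleotide and that both sequences cover the same positions relative to the consensus.

```python
# Hypothetical lookup: TWAS output name -> kmer label with the matching meaning.
TWAS_TO_KMER_LABEL = {
    "cases_enriched": "case-depleted",
    "cases_depleted": "case-enriched",
}

def kmer_matches_twas(kmer, twas_seq, gap="-"):
    """Return True if the kmer agrees with every nucleotide shown in twas_seq.

    Positions holding the gap character are skipped; letter case is ignored.
    """
    if len(kmer) != len(twas_seq):
        raise ValueError("kmer and TWAS sequence must have the same length")
    return all(t == gap or k.upper() == t.upper()
               for k, t in zip(kmer, twas_seq))

# The case-enriched 10mer is compared against case-depleted TWAS nucleotides.
print(TWAS_TO_KMER_LABEL["cases_depleted"])
print(kmer_matches_twas("atgactcagc", "ATGA-T--GC"))
```

A mismatch at any shown position makes the function return False, which would suggest the kmer and the TWAS result do not describe the same sequence.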

Reason for the naming/description scheme:
Cases and controls do not always mean the same thing across the phenotypes being tested, so significant nucleotides and kmers are described relative to cases, as case-enriched or case-depleted.
More importantly, nucleotide associations are tested by comparing the nucleotide of interest against everything that is not that nucleotide (e.g. 'A' vs 'not-A', major allele vs. not-major allele). The end result is that the nucleotide being changed is known, but what the nucleotide turns into is unknown. This is the main reason the format is kept this way.
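
The loss of information in the 'A' vs 'not-A' comparison can be illustrated with a short Python sketch. It is not how PLINK or assocToSummarySeq_ADedit1.py encode the data; the function name collapse_to_binary and the input list are hypothetical and serve only to show why the resulting nucleotide cannot be recovered.

```python
def collapse_to_binary(observed, nucleotide_of_interest):
    """Encode each observed nucleotide as 1 (the nucleotide) or 0 (anything else)."""
    target = nucleotide_of_interest.upper()
    return [1 if n.upper() == target else 0 for n in observed]

# Nucleotides seen at one position in five hypothetical samples.
observed = ["A", "C", "G", "T", "a"]
encoded = collapse_to_binary(observed, "A")
print(encoded)
```

C, G, and T all collapse to 0, so an association found on this encoding says that 'A' was changed but not what it was changed to.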

