These scripts were written by John Edwards (jedwards@dom.wustl.edu). They were used for data processing in Edwards et al. (2010), "Chromatin and Sequence Features that Define the Fine and Gross Structure of Genomic Methylation Patterns," Genome Research.

Further information and the latest versions can be found at http://epigenomics.wustl.edu.


GENERAL INSTRUCTIONS
Processing instructions for each individual script can be found below.  The scripts assume that the data were first mapped to a reference genome, and they start from the .mates file produced by the AB mate-pair mapping tool.  See http://solidsoftwaretools.com for more information about the SOLiD System Color Space Mapping Tool.

Other requirements are:

(1) An installation of the nibFrag tool available as part of the blat suite from Jim Kent at UCSC

(2) .nib-formatted files for each chromosome

(3) The cmap file used by the AB mapping software

(4) A CpG position file.  This is a tab-delimited file in which the first column is the chromosome (chr1, etc.) and the second column is the coordinate of the CpG on that chromosome.
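
The sketch below shows one possible way to build a CpG position file in the two-column layout described in (4).  It is not part of the original scripts.  The function names are hypothetical, and reporting the 1-based coordinate of the C is an assumption; check which coordinate convention the overlap script expects before using it.

```python
import io


def cpg_positions(chrom, sequence, offset=1):
    """Yield (chrom, position) for every CG dinucleotide in sequence.

    offset=1 reports the 1-based coordinate of the C. This convention is
    an assumption, not something stated in the README.
    """
    seq = sequence.upper()
    start = seq.find("CG")
    while start != -1:
        yield chrom, start + offset
        # Continue searching one base further along the sequence.
        start = seq.find("CG", start + 1)


def write_cpg_file(records, handle):
    """Write records as tab-delimited lines: chromosome, coordinate."""
    for chrom, pos in records:
        handle.write(f"{chrom}\t{pos}\n")


if __name__ == "__main__":
    out = io.StringIO()
    write_cpg_file(cpg_positions("chr1", "ACGTTCGCGA"), out)
    print(out.getvalue(), end="")
```

For a real genome, the sequence would be read chromosome by chromosome (for example from FASTA) rather than passed as a short string.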

While the filtering and CpG overlap steps (Steps 3-5) can be performed with the scripts, it is recommended that each chromosome be processed separately and that the jobs be split across multiple CPUs or a compute cluster.
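
As an illustration of per-chromosome processing, the following sketch builds one overlapCpGsites.pl command per chromosome and runs them in parallel on a single machine.  The argument order follows the USAGE line for Step 5.  The directory layout and file names (chr1.cpg.txt, chr1.filtered.bed, and so on) are hypothetical and should be adapted to your own files.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_jobs(chromosomes, cpg_dir, mcrbc_dir, re_dir, out_dir):
    """Return one overlapCpGsites.pl command (argument list) per chromosome.

    The file naming scheme is a hypothetical example.
    """
    jobs = []
    for chrom in chromosomes:
        jobs.append([
            "perl", "overlapCpGsites.pl",
            f"{cpg_dir}/{chrom}.cpg.txt",        # <CpGfile>
            f"{mcrbc_dir}/{chrom}.filtered.bed",  # <McrBCfile>
            f"{re_dir}/{chrom}.filtered.bed",     # <REfile>
            f"{out_dir}/{chrom}.overlap.txt",     # <outFile>
        ])
    return jobs


def run_jobs(jobs, max_workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    def run_one(cmd):
        return subprocess.run(cmd, check=False).returncode

    # Threads are sufficient here because each job is an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))


# Example (not executed here):
#   codes = run_jobs(build_jobs(["chr1", "chr2"], "cpg", "mcrbc", "re", "out"))
```

On a compute cluster, the same command lists could instead be written to a job file and submitted through the local scheduler.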



STEP 1: Parse Mates Files (run for both McrBC and RE .mates files)

USAGE: parseMates.pl <cmapFile> <inFile> <outDir> <libDescriptor[M,R]>

Converts the .mates file from the AB mate-pair mapping software into bed files (one per chromosome).

<cmapFile> = cmap file used for tag mapping
<inFile> = .mates file from tag mapping
<outDir> = output directory for the bed files
<libDescriptor> = M for McrBC, R for RE



STEP 2: Filter McrBC Fragments (keep fragments with at least one valid McrBC recognition site)

USAGE:  filterMcrBC.pl <bedFile> <outFile>

Bed files of McrBC data from Step 1 can be concatenated, or each chromosome file can be processed separately.

<bedFile> = bedFile or concatenated bedFile from Step 1
<outFile> = filtered output file



STEP 3: Filter RE Fragments

USAGE: filterRE.pl <bedFile> <outFile>

Bed files of RE data from Step 1 can be concatenated, or each chromosome file can be processed separately.

<bedFile> = bedFile or concatenated bedFile from Step 1
<outFile> = filtered output file



STEP 4: Normalize McrBC and RE fragments
This step was never fully automated.  See the description in the paper.  To find the correct ratio of McrBC to RE fragments, different ratios were selected for several chromosomes, and the scripts from Steps 5 and 6 were used to compute the overall coverage values.



STEP 5:  Overlap fragments with CpG sites

USAGE: overlapCpGsites.pl <CpGfile> <McrBCfile> <REfile> <outFile>

Overlaps McrBC and RE fragments with genomic CpG positions.  Output is a text file which contains the number of McrBC and RE fragments overlapping each CpG site.

<CpGfile> = CpG position file.  First column is chromosome, second is coordinate
<McrBCfile> = Filtered McrBC fragment bed file from step 2.
<REfile> = Filtered RE fragment bed file from step 3.
<outFile> = output file

output file format:
col1-	chromosome
col2-	CpG position
col3-	Number of McrBC fragments where the CpG is well interior (and thus was protected from digestion and unmethylated)
col4-	Number of McrBC fragments overlapping where this CpG was just inside the fragment (probable McrBC recognition site). Distance inside is set as parameter in script.  Default is 50 bp.
col5-	Similar to col4, but CpGs just outside the end of the McrBC fragment
col6-	Number of RE fragments where CpG is interior (and thus was protected from digestion and methylated)
col7-	Number of RE fragments where this CpG sits at the end of the fragment and was part of the RE cleavage site (and thus unmethylated)
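
For downstream analysis, the seven columns above can be read into named fields.  The sketch below is a minimal example and is not part of the original scripts.  The field names are hypothetical labels for col1-col7, and it assumes whitespace-separated columns with integer counts.

```python
from typing import NamedTuple


class CpGCounts(NamedTuple):
    """One line of the Step 5 output file (field names are hypothetical)."""
    chrom: str              # col1
    position: int           # col2
    mcrbc_interior: int     # col3
    mcrbc_inside_end: int   # col4
    mcrbc_outside_end: int  # col5
    re_interior: int        # col6
    re_end: int             # col7


def parse_overlap_line(line):
    """Parse one output line into a CpGCounts record.

    Assumes seven whitespace-separated columns with integer counts.
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    return CpGCounts(fields[0], *(int(value) for value in fields[1:]))


if __name__ == "__main__":
    record = parse_overlap_line("chr1\t1000\t3\t1\t0\t5\t2\n")
    print(record.chrom, record.position, record.re_interior)
```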



STEP 6: Calculate methylation scores

Use data from step 5 to calculate a methylation score at every CpG site.  Note: a score is computed at every CpG site.  Scores at non-McrBC/non-RE sites are effectively estimated scores.  It is recommended that scores only be used at McrBC and RE sites and caution be used when using scores at other sites.

USAGE: calculateMethylationScores.pl <cpgFile> <outFile>

<cpgFile> = output file from step 5
<outFile> = output file

outFile format
col1-	chromosome
col2-	CpG position
col3-	Methylation score. 0 = fully unmethylated, 1 = fully methylated
col4-	Unmethylation score (1 - col3)
col5-	Coverage: number of McrBC and RE fragments covering the CpG site
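
The sketch below shows one way to read the score file and keep only CpG sites with a minimum coverage (col5).  It is an illustration and not part of the original scripts.  The min_coverage threshold is a hypothetical parameter, and filtering on coverage is an assumption about how one might apply the caution noted above; it does not by itself identify McrBC or RE sites.

```python
def read_scores(lines, min_coverage=1):
    """Yield (chrom, position, methylation, coverage) for well-covered sites.

    Assumes five whitespace-separated columns as in the outFile format.
    Lines with a different number of columns are skipped.
    min_coverage is a hypothetical threshold chosen by the user.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        chrom = fields[0]
        position = int(fields[1])
        methylation = float(fields[2])  # col3: 0 = unmethylated, 1 = methylated
        coverage = float(fields[4])     # col5
        if coverage >= min_coverage:
            yield chrom, position, methylation, coverage


if __name__ == "__main__":
    example = [
        "chr1\t100\t0.75\t0.25\t8\n",
        "chr1\t200\t0.5\t0.5\t2\n",
    ]
    for site in read_scores(example, min_coverage=5):
        print(site)
```

With a real file, pass an open file handle in place of the example list, for example read_scores(open(path), min_coverage=5).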

