

This R package helps identify very short sequences (e.g. di- or tri-nucleotides) that occur periodically in a set of genomic loci (typically regulatory elements). It is not aimed at identifying motifs separated by a conserved distance; for that type of analysis, please see the MEME website.
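To begin, a genome sequence and a set of genomic loci must be defined (referred to below as ce_seq and ce_proms, respectively). Let’s focus on the C. elegans genome for now, and more specifically on the regions around its transcription start sites (TSSs).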
Let’s look at the TT 10-bp periodicity strength around ubiquitous promoters:
MOTIF <- 'TT'
ubiq_TT <- getPeriodicity(
    alignToTSS(ce_proms[ce_proms$which.tissues == 'Ubiq.'], 30, 500),
    genome = ce_seq,
    motif = MOTIF,
    cores = 4
)
list_plots <- plotPeriodicityResults(ubiq_TT)
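To build intuition for what a "10-bp periodicity strength" measures, here is a minimal base-R sketch on a toy sequence. It is not the package's implementation: it assumes that periodicity can be summarised by taking the pairwise distances between motif occurrences and reading the power spectrum of their histogram at a chosen frequency. All names in it (seq_length, max_dist, power_at, ratio) are hypothetical and exist only for this illustration.

```r
# Toy illustration in base R; NOT the periodicDNA implementation.
set.seed(1)
seq_length <- 300
x <- sample(c("A", "C", "G", "T"), seq_length, replace = TRUE)

# Plant a TT dinucleotide every 10 bp on top of the random background
starts <- seq(1, seq_length - 1, by = 10)
x[starts] <- "T"
x[starts + 1] <- "T"
s <- paste(x, collapse = "")

# Start positions of all TT occurrences, overlapping ones included
pos <- as.integer(gregexpr("(?=TT)", s, perl = TRUE)[[1]])

# Histogram of pairwise distances between occurrences, up to max_dist bp
max_dist <- 100
d <- abs(outer(pos, pos, "-"))
d <- d[upper.tri(d)]
counts <- tabulate(d[d <= max_dist], nbins = max_dist)

# Power spectrum of the mean-centred histogram;
# element k + 1 of spec holds the power at frequency k / max_dist
spec <- Mod(fft(counts - mean(counts)))^2
power_at <- function(freq) spec[round(freq * max_dist) + 1]

# Power at 1/10 relative to the median power over positive frequencies
background <- median(spec[2:(max_dist / 2 + 1)])
ratio <- power_at(1 / 10) / background
ratio
```

A ratio well above 1 indicates that TT occurrences are spaced at multiples of 10 bp far more often than the background would suggest. On real promoters the signal is much weaker than in this planted example, which is one reason to pool many loci as in the getPeriodicity() call above.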
The other major use of this package is to generate specific tracks over a set of loci, e.g. the strength of WW 10-bp periodicity over promoters.
Important note: We recommend running this command across at least a dozen processors (use the PROCS argument). It can take several hours, and possibly up to a day, to run: producing a periodicity track over 15,000 GRanges of 150 bp (with default parameters) typically takes about one day with PROCS = 12. We highly recommend running this command in a new screen session.
generatePeriodicityTrack(
    ce_seq,
    granges = ce_proms,
    MOTIF = 'TT',
    FREQ = 1/10,
    PROCS = 12,
    GENOME.WINDOW.SIZE = 100,
    GENOME.WINDOW.SLIDING = 2, # can be 1 for single-base resolution
    BIN.WINDOW.SIZE = 60, # Set BIN.WINDOW.SIZE == GENOME.WINDOW.SIZE for no sliding window
    BIN.WINDOW.SLIDING = 5,
    bw.file = 'TT-10-bp-periodicity_over-proms_gwin100_bwin60_bslide5.bw'
)

Please read the Introduction vignette for a full presentation of the package functions.