---
title: "Reference guide for DamID-seq Analysis"
author: "James M. Ashmore"
email: "s1437643@sms.ed.ac.uk"
date: "June 25, 2017"
output:
  html_document:
    toc: true
    toc_depth: 2
---



# Document summary

This R Markdown document is a reference guide for calling peaks from DamID-seq data. To use it for your own analysis, start with a RangedSummarizedExperiment object. The assay slot of this object should contain your read count matrix: each row represents one GATC fragment in the genome, and each column holds the read counts for one sample. The colData slot should contain a data frame with your sample information, and one of its columns should identify which samples are Dam-POI and which are Dam-only.
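
For orientation, here is a minimal sketch of what such an object might look like. The fragment coordinates, counts, sample names, and the `condition` column name are all invented for illustration; in a real analysis the counts would come from reads overlapping GATC fragments.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy GATC fragments (coordinates are made up)
fragments <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(1, 301, 1), end = c(300, 750, 420))
)

# Toy count matrix: one row per fragment, one column per sample
counts <- matrix(
    c(12L, 15L, 3L, 4L,
      40L, 38L, 9L, 11L,
       0L,  2L, 1L, 0L),
    nrow = 3, byrow = TRUE,
    dimnames = list(NULL, c("poi_1", "poi_2", "dam_1", "dam_2"))
)

# Sample information; "condition" is a hypothetical column name
samples <- DataFrame(
    condition = factor(c("POI", "POI", "Dam", "Dam"), levels = c("Dam", "POI")),
    row.names = colnames(counts)
)

toy_rse <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = fragments,
    colData = samples
)
```

With an object like this, the value entered for `covariate` later in the guide would be `"condition"`.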

# Setup environment

Load required packages:


```r
pkgs <- c("csaw", "DESeq2", "edgeR", "genefilter", "limma", "qsmooth", "rtracklayer", "SummarizedExperiment")
libs <- lapply(pkgs, library, character.only = TRUE)
```


Load RangedSummarizedExperiment object (User-input required):


```r
rse <- readRDS("ENTER YOUR RSE NAME HERE")
```


Enter column name listing Dam-POI and Dam-only samples (User-input required):


```r
covariate <- "ENTER YOUR COLUMN NAME HERE"
```

Calculate median GATC fragment size:


```r
fsize <- median(width(rse))
```

# Data pre-processing

Filter low-abundance fragments by average count (the default cutoff here is 10 reads):


```r
cutoff <- 10
# Average log2-CPM of each fragment across all samples
abundances <- aveLogCPM(assay(rse))
# Keep fragments more abundant than `cutoff` reads at the mean library size
keep <- abundances > aveLogCPM(cutoff, lib.size = mean(colSums(assay(rse))))
rse <- rse[keep, ]
```
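
The idea behind this filter can be sketched in base R. This is a simplified stand-in, not what `aveLogCPM` computes: it averages per-sample log2-CPM values with a fixed prior count (the value 2 is an assumption), whereas edgeR estimates the average under a negative binomial model. The toy counts are invented.

```r
# Toy count matrix: 3 fragments (rows) x 4 samples (columns); values are invented
counts <- matrix(
    c( 0,  1,  0,  2,
      50, 60, 40, 55,
      20, 18, 25, 22),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("frag", 1:3), paste0("sample", 1:4))
)
lib.size <- colSums(counts)
prior <- 2     # assumed prior count, avoids log2(0)
cutoff <- 10   # same count cutoff as in the guide

# Per-sample log2-CPM, then the average per fragment
logcpm <- log2(t(t(counts + prior) / lib.size) * 1e6)
abundance <- rowMeans(logcpm)

# Express the count cutoff on the same scale, at the mean library size
threshold <- log2((cutoff + prior) / mean(lib.size) * 1e6)

# Fragments to retain
keep <- abundance > threshold
```

Converting the cutoff to the log-CPM scale, rather than comparing raw counts, keeps the filter comparable across samples with different sequencing depths.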

Calculate scaling factors:


```r
# Store counts, library sizes and sample information in a DGEList
dge <- asDGEList(rse, lib.sizes = colSums(assay(rse)), samples = colData(rse))
# Estimate scaling normalization factors and attach them to the DGEList
normfacs <- normOffsets(rse, lib.sizes = dge$samples$lib.size, type = "scaling")
dge$samples$norm.factors <- normfacs
```

Perform smooth quantile normalization on logCPM values calculated with scaling factors:


```r
qsd <- qsmoothData(qsmooth(object = cpm(dge, log = TRUE, prior.count = 1), groupFactor = rse[[covariate]]))
```


# Differential binding analysis

Test for differential binding between Dam-POI and Dam-only samples. The last coefficient of the design is tested, so the Dam-only label should be the first (reference) factor level if a positive logFC is to indicate enrichment in Dam-POI:


```r
# Build the design from the sample annotation column named by `covariate`;
# using the column name itself in the formula gives a one-level variable and fails
group <- factor(rse[[covariate]])
design <- model.matrix(~ group)
fit <- lmFit(qsd, design)
fit <- eBayes(fit, trend = TRUE, robust = TRUE)
# sort.by = "none" keeps rows in the same order as rowRanges(rse)
all <- topTable(fit, coef = ncol(design), number = Inf, sort.by = "none")
```
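
As a small base R illustration of how the design matrix depends on factor level order, consider the sketch below. The sample sheet, the `condition` column name, and the labels `"Dam"` and `"POI"` are assumptions made for this example.

```r
# Hypothetical sample sheet; column name and labels are assumptions
samples <- data.frame(
    sample = c("dam_1", "dam_2", "poi_1", "poi_2"),
    condition = c("Dam", "Dam", "POI", "POI")
)

# Make the Dam-only samples the reference level explicitly
group <- relevel(factor(samples$condition), ref = "Dam")
design <- model.matrix(~ group)

# The last column is the Dam-POI versus Dam-only contrast
colnames(design)
# → "(Intercept)" "groupPOI"
```

Setting the reference level with `relevel` avoids relying on alphabetical ordering of the labels, which would otherwise decide the sign of the reported logFC.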

Write test statistics (calculated using limma) for each fragment to file:


```r
# The first six columns follow BED conventions (0-based start, unstranded);
# the remaining columns are the limma statistics for each fragment
limmaResults <- data.frame(
    chrom = seqnames(rse),
    chromStart = start(rse) - 1,
    chromEnd = end(rse),
    name = ".",
    score = 0,
    strand = ".",
    logFC = all$logFC,
    AveExpr = all$AveExpr,
    t = all$t,
    P.Value = all$P.Value,
    adj.P.Val = all$adj.P.Val,
    B = all$B
)
write.csv(limmaResults, file = "limmaResults.csv", quote = FALSE, row.names = FALSE)
```
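
The `start - 1` in the table above converts from 1-based, closed coordinates (as used by GRanges objects) to the 0-based, half-open coordinates used by BED-style files. A base R sketch with two invented fragments shows that the fragment width is unchanged by the conversion:

```r
# Two hypothetical fragments in 1-based, closed coordinates
start1 <- c(1, 301)
end1 <- c(300, 750)

# BED-style 0-based, half-open coordinates
chromStart <- start1 - 1
chromEnd <- end1

# Width is the same in both conventions
width_closed <- end1 - start1 + 1
width_bed <- chromEnd - chromStart
stopifnot(all(width_closed == width_bed))
```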

Merge fragments into putative peak regions:


```r
# Merge fragments lying within one median fragment size of each other,
# limiting merged regions to roughly 10 kb
merged <- mergeWindows(rowRanges(rse), tol = fsize, max.width = 10000)
results <- data.frame(logFC = all$logFC, AveExpr = all$AveExpr, PValue = all$P.Value)
# Combined p-value and FDR for each merged region
tabcom <- combineTests(merged$id, results)
# Most significant fragment within each merged region
tabbest <- getBestTest(merged$id, results)
```
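
`combineTests` combines the fragment-level p-values within each region using Simes' method and then applies the Benjamini-Hochberg correction across regions. The base R sketch below reproduces that calculation on invented p-values. The helper `simes` is hypothetical, written only for this illustration, and it ignores the optional weights that csaw supports.

```r
# Hypothetical helper: Simes' combined p-value for one region
simes <- function(p) {
    p <- sort(p)
    min(length(p) * p / seq_along(p))
}

# Invented fragment-level p-values and the region each fragment belongs to
region_id <- c(1, 1, 1, 2, 2)
pvalue <- c(0.01, 0.04, 0.50, 0.20, 0.60)

# One combined p-value per region, then BH adjustment across regions
combined <- tapply(pvalue, region_id, simes)
fdr <- p.adjust(combined, method = "BH")
```

Because the multiple-testing correction is applied to regions rather than to individual fragments, the resulting FDR refers to the list of putative peaks.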

Write putative peak regions (including peak statistics) to file:


```r
# Region coordinates use a 0-based start; pValue and qValue are stored as -log10
mergedResults <- data.frame(
    chrom = seqnames(merged$region),
    chromStart = start(merged$region) - 1,
    chromEnd = end(merged$region),
    name = paste0("DamID_Peak_", seq_along(merged$region)),
    score = 0,
    strand = ".",
    signalValue = tabbest$AveExpr,
    logFC = tabbest$logFC,
    pValue = -log10(tabcom$PValue),
    qValue = -log10(tabcom$FDR)
)
write.csv(mergedResults, file = "mergedResults.csv", quote = FALSE, row.names = FALSE)
```

Set the peak cutoffs. A peak is kept when its q-value is below `qValue` and its logFC is above `logFC`:


```r
qValue <- 0.1
logFC <- 0.5
```
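
Since the q-values in the results table are stored as `-log10(FDR)`, the cutoff has to be transformed to the same scale before comparison. The base R sketch below, using an invented peak table and hypothetical names `toy`, `q_cut` and `lfc_cut`, checks that filtering on the transformed scale is equivalent to filtering on the raw FDR:

```r
# Hypothetical peak table; values are invented
toy <- data.frame(
    name = paste0("peak", 1:4),
    logFC = c(1.2, 0.3, 0.8, -1.0),
    FDR = c(0.01, 0.05, 0.20, 0.001)
)
toy$qValue <- -log10(toy$FDR)

q_cut <- 0.1
lfc_cut <- 0.5

# Filter on the -log10 scale and on the raw scale
pass_log <- toy$qValue > -log10(q_cut) & toy$logFC > lfc_cut
pass_raw <- toy$FDR < q_cut & toy$logFC > lfc_cut
stopifnot(identical(pass_log, pass_raw))

toy$name[pass_log]
# → "peak1"
```

Note that the logFC cutoff is one-sided, so strongly depleted regions such as `peak4` are excluded even when their FDR is small.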

Write peaks passing qValue and logFC cutoffs to file:


```r
# qValue is stored as -log10(FDR), so the cutoff is transformed to the same scale
cutoff <- mergedResults$qValue > -log10(qValue) & mergedResults$logFC > logFC
peaks <- mergedResults[cutoff, ]
write.table(peaks, "signifPeaks.broadPeak", quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
```

Write the most significant fragment within each peak region to file (useful for motif analysis):


```r
# Coordinates of the best fragment in each merged region
ranges <- rowRanges(rse)[tabbest$best, ]
frags <- data.frame(
    chrom = seqnames(ranges),
    chromStart = start(ranges) - 1,
    chromEnd = end(ranges),
    name = paste0("DamID_Peak_", seq_along(merged$region)),
    score = 0,
    strand = ".",
    signalValue = tabbest$AveExpr,
    logFC = tabbest$logFC,
    pValue = -log10(tabcom$PValue),
    qValue = -log10(tabcom$FDR)
)
# Keep fragments from regions that pass the peak cutoffs
cutoff <- mergedResults$qValue > -log10(qValue) & mergedResults$logFC > logFC
frags <- frags[cutoff, ]
write.table(frags, "signifFragments.broadPeak", quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
```
