Robust chromatin state annotation

Figure 5.

SAGAconf identifies confident state annotations. SAGAconf integrates sources of reproducibility information such as granularity, alignment, and posterior to derive an r-value for each genomic position. The r-value estimates the probability of reproducing the annotation at a specific genomic position. (A) Density histogram of r-values for each chromatin state in the running example (S1, ChromHMM, GM12878). A threshold of α = 0.9 is applied: positions with r ≥ α are labeled reproducible, and all other positions are labeled irreproducible. The horizontal axis represents the r-value; the vertical axis represents density; the red dotted line represents the threshold; and the green-shaded area shows confident annotations. (B) Bar plot of expression level (TPM) as a function of r-value for transcribed chromatin states in the running example. Each bar signifies the mean TPM for each r-value bin, with error bars indicating the standard deviation. (C) Histogram comparing the frequencies of log(TPM) values for confident (r ≥ 0.9) and nonconfident (r < 0.9) annotations of the transcribed chromatin state in the running example. The x-axis represents ranges of log(TPM), with each pair of bars corresponding to confident (green) and nonconfident (red) annotations, respectively. (D) Fraction of chromatin states called confident by SAGAconf for ChromHMM annotations according to S1 (different data, different models) in five cell types. Each dot represents a chromatin state, with color denoting cell type and size proportional to genome coverage. (E) Fraction of genome called confident across two SAGA models, five cell types, and three settings. Color denotes SAGA model, and shape denotes cell type. (F) Fraction of the genome called confident by SAGAconf during postclustering; that is, the fraction of confident positions in the genome as a function of the base annotation's entropy as similar chromatin states in the base annotation are merged. The horizontal axis represents the base annotation entropy; the vertical axis represents the fraction of genome identified as confident; subpanels correspond to different SAGA models and settings; and colors represent cell types.
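
The thresholding in panel A and the quantities plotted in panels E and F can be illustrated with a minimal Python sketch. It is not the SAGAconf implementation: the function names are hypothetical, the r-values are assumed to be already computed, and the entropy is assumed here to be the Shannon entropy (in bits) of the state labels across positions, which the caption does not specify.

```python
import math
from collections import Counter

ALPHA = 0.9  # threshold stated in the caption


def label_confident(r_values, alpha=ALPHA):
    """Label each position reproducible (True) if r >= alpha, else False."""
    return [r >= alpha for r in r_values]


def fraction_confident(r_values, alpha=ALPHA):
    """Fraction of positions whose r-value meets the threshold."""
    labels = label_confident(r_values, alpha)
    return sum(labels) / len(labels) if labels else 0.0


def annotation_entropy(states):
    """Shannon entropy in bits of per-position state labels.

    Assumption: the caption does not define the entropy measure;
    this is one plausible reading, not the authors' definition.
    """
    counts = Counter(states)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy r-values for four positions (made-up illustration, not real data).
    r = [0.95, 0.50, 0.90, 0.89]
    print(label_confident(r))
    print(fraction_confident(r))

    # Merging two states into one lowers the annotation entropy.
    before = ["Tx", "Tx", "Enh", "Enh"]
    after = ["Merged", "Merged", "Merged", "Merged"]
    print(annotation_entropy(before), annotation_entropy(after))
```

Under these assumptions, merging similar states lowers the entropy of the base annotation, which corresponds to moving along the horizontal axis of panel F.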

This Article

  1. Genome Res. 34: 469-483
