
Comparison of RefSeqFEs to other gene regulatory data sets. (A) Overview showing data derivation, feature type representation, and current sizes of each data set on the human GRCh38.p13 and mouse GRCm39 reference assemblies. Additional information for each data set is provided in Supplemental Table S5A. (B) Bar graph showing human AR 109.20201120 RefSeqFE feature intersections with the indicated data sets; the y-axis represents the percent of input RefSeqFE features showing overlap. All features in comparative data sets were intersected with either all RefSeqFE features (medium blue bars), RefSeqFE regulatory features (gray bars), or RefSeqFE enhancer features (light blue bars). Enhancer features from each data set were additionally intersected with RefSeqFE enhancer features (dark blue bars). Full statistics including input and overlapping feature counts, overlap percentages with respect to each data set, Fisher P-values, and Jaccard statistics are provided in Supplemental Table S5B, with raw intersection output, feature lengths, and degrees of overlap with respect to each data set in Supplemental Table S5D. Data sets showing overlap with each RefSeqFE feature are also indicated in Supplemental Table S3B, column G. (C) Box plot showing feature length distributions for the indicated human data sets. Some outliers and dbSUPER feature lengths (maximum 498,572) are not displayed because the y-axis was scaled to better visualize shorter feature distributions; see Supplemental Figure S6A for a 50-kb y-axis scale with dbSUPER data included. n = 9862; 926,535; 622,457; 63,285; and 1989 sample points. Additional statistics including minimums, maximums, averages, and standard deviations from the mean are provided in Supplemental Table S5A. (D) Bar graph showing mouse AR 109 RefSeqFE feature intersections with the indicated data sets, as described for human in B. Supporting details are provided in Supplemental Tables S3C and S5C,E.
(E) Box plot showing feature length distributions for the indicated mouse data sets, as described for human in C. n = 2271; 343,747; 364,670; 49,802; and 1291 sample points. Supporting details are provided in Supplemental Table S5A and Supplemental Figure S6.
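
For readers who want to reproduce statistics of the kind plotted in B and D, the percent of input features showing overlap and a Jaccard statistic can be computed roughly as sketched below. This is a minimal illustration and not the pipeline used to generate the figure: the function names and toy coordinates are hypothetical, features are assumed to be half-open (chromosome, start, end) tuples, any shared base is assumed to count as overlap, and the Jaccard statistic is assumed to be the number of bases in the intersection divided by the number of bases in the union.

```python
from collections import defaultdict


def by_chrom(features):
    """Group (chrom, start, end) features into {chrom: [(start, end), ...]}."""
    grouped = defaultdict(list)
    for chrom, start, end in features:
        grouped[chrom].append((start, end))
    return grouped


def covered_bases(intervals):
    """Count bases covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # extend the current merged block
        else:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_overlapping(query, target):
    """Percent of query features sharing at least one base with a target feature."""
    if not query:
        return 0.0
    target_by_chrom = by_chrom(target)
    hits = sum(
        1
        for chrom, start, end in query
        if any(s < end and start < e for s, e in target_by_chrom.get(chrom, []))
    )
    return 100.0 * hits / len(query)


def jaccard(a, b):
    """Bases in the intersection of a and b divided by bases in their union."""
    a_by, b_by = by_chrom(a), by_chrom(b)
    inter = union = 0
    for chrom in set(a_by) | set(b_by):
        ivs_a, ivs_b = a_by.get(chrom, []), b_by.get(chrom, [])
        len_union = covered_bases(ivs_a + ivs_b)
        # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
        inter += covered_bases(ivs_a) + covered_bases(ivs_b) - len_union
        union += len_union
    return inter / union if union else 0.0


# Toy coordinates, invented purely for illustration.
refseqfe = [("chr1", 100, 200), ("chr1", 300, 400),
            ("chr2", 50, 150), ("chr2", 500, 600)]
other = [("chr1", 150, 250), ("chr2", 0, 60), ("chr3", 10, 20)]

print(percent_overlapping(refseqfe, other))
print(jaccard(refseqfe, other))
```

The pairwise scan in `percent_overlapping` is quadratic per chromosome, which is adequate for a toy example; data sets of the sizes listed in C and E would call for a sorted sweep or an interval tree instead.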











