Markup | Genome Research

Table 2.

Comparison of conservation of cis-regulatory elements (CREs) to two types of control sites

	Group 1 vs. group 2	CRE vs. nearby	CRE vs. random intergenic	Nearby vs. random intergenic
Per site analysis	Group 1 mean per site % identity	51.3%	51.3%	47.8%
	Group 2 mean per site % identity	47.8%	42.9%	42.9%
	Difference of means (group 1 – group 2)	3.6%	8.4%	4.9%
	Difference of means resampling p-value	0.05	0.003	1E-5
	Distribution comparison KS p-value	0.026	0.0016	2E-6
Per base analysis	Group 1 mean per base % identity	47.8%	47.8%	46.3%
	Group 2 mean per base % identity	46.3%	42.4%	42.4%
	Difference of means (group 1 – group 2)	1.5%	5.4%	3.9%
	Difference of means resampling p-value	0.24	0.05	5.8E-4

[i] For each CRE, 20 random intergenic control sites (RICs) were generated by randomly choosing sites of the same length as the CRE, on the same chromosome and strand, and rejecting any that overlapped a known gene. Ten nearby control sites were then generated for each CRE by adding positive and negative (i.e., 3′ and 5′) offsets of 50, 100, 150, 200, and 250 bp to the coordinates of each true CRE. Percentage identities for all CRE and control sites were computed relative to the reference alignment, on both a per site and per base basis. Unaligned bases, mismatches, and D. melanogaster insertions contributed zeros to the % identity results; D. pseudoobscura insertions were ignored. The distributions of % identity values were clearly not normal, so we avoided tests that assume normality, such as the t-test. We compared the per site and per base mean % identities of each group using a resampling test, in which the p-value of the observed difference was estimated as the fraction of trials (out of one million) in which an equal-sized sample of control sites had a mean as large as or larger than the observed CRE mean. Similarly, the p-value of the difference between the two control sets was estimated using a randomization test (again over one million trials) in which the two sets were mixed and then repartitioned into corresponding mock control sets. We compared the distributions using the Kolmogorov-Smirnov test, which assesses whether two samples could have come from the same continuous distribution.
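
The control-site construction and the two resampling procedures described in this note can be sketched as follows. This is a minimal illustration, not the authors' code. The function names, the default of 10,000 trials (the note reports one million), sampling without replacement, and the one-sided form of the randomization test are all assumptions introduced here.

```python
import random


def nearby_control_sites(start, end, offsets=(50, 100, 150, 200, 250)):
    """Return the 10 nearby control intervals for one CRE: the CRE
    coordinates shifted by each offset in both directions."""
    shifts = [-o for o in offsets] + list(offsets)
    return [(start + s, end + s) for s in shifts]


def _mean(values):
    return sum(values) / len(values)


def resampling_p_value(cre_values, control_values, trials=10_000, seed=0):
    """Estimate the p-value as the fraction of trials in which an
    equal-sized random sample of control values has a mean at least as
    large as the observed CRE mean.

    Assumption: samples are drawn without replacement, so the control
    pool must be at least as large as the CRE set.
    """
    rng = random.Random(seed)
    observed = _mean(cre_values)
    n = len(cre_values)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(list(control_values), n)
        if _mean(sample) >= observed:
            hits += 1
    return hits / trials


def randomization_p_value(set_a, set_b, trials=10_000, seed=0):
    """Estimate the p-value for the difference of means between two
    control sets by pooling them, shuffling, and repartitioning into
    mock sets of the original sizes.

    Assumption: the test is one-sided (mock difference >= observed).
    """
    rng = random.Random(seed)
    observed = _mean(set_a) - _mean(set_b)
    pooled = list(set_a) + list(set_b)
    n_a = len(set_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        mock_a, mock_b = pooled[:n_a], pooled[n_a:]
        if _mean(mock_a) - _mean(mock_b) >= observed:
            hits += 1
    return hits / trials
```

With per site % identity values for the CREs and for a control set in hand, `resampling_p_value(cre, control)` gives an estimate analogous to the "Difference of means resampling p-value" rows, and `randomization_p_value(nearby, ric)` is analogous to the nearby vs. random intergenic comparison. The KS comparison could be run with an existing two-sample implementation, such as `scipy.stats.ks_2samp`.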