Figure 3.

Regression results for GC content across variant subtypes for rare variants, common variants, and substitutions. The relationship between local GC content and the observed conditional variant proportion for seven variant subtypes: (A) AT > GC, (B) AT > CG, (C) AT > TA, (D) CpG GC > AT, (E) GC > AT, (F) GC > TA, and (G) GC > CG. Filled points show the conditional variant proportions in each GC content bin, scaled by the intercept of the logistic regression, 1974inf1, where 1974inf2 is the intercept calculated in the regression, 1974inf3 is the count of the given 1974inf4 variant subtype, and 1974inf5 is the number of 1974inf6 ancestral invariant sites that could produce the given subtype in the 1974inf7 th GC content bin. Symbol size represents the proportion of the given variant subtype falling into a given GC content bin. The solid lines show the fitted logistic regression curve, where 1974inf8 is the slope fitted in the logistic regression and 1974inf9 is the GC content in the 1974inf10 th bin. The gray dashed line represents the baseline of no effect, 1974inf11. Legends in each subplot show the regression slope calculated for each variant class and its significance. (***) P < 0.0001, (**) P < 0.001, (*) P < 0.01.
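
The construction described in this caption can be illustrated with a minimal sketch. It is not the authors' code, and the exact scaling formula is not recoverable from the caption, so the following are assumptions: the regression is a binomial logistic model fitted to binned counts, the points are the observed odds divided by the exponentiated intercept, and the fitted curve and no-effect baseline are then exp(slope × GC) and 1. All function and variable names are hypothetical.

```python
import numpy as np

def fit_binned_logistic(gc, variants, sites, n_iter=50, tol=1e-10):
    """Fit logit(p_j) = b0 + b1 * gc_j to binned counts with Newton's method (IRLS)."""
    gc = np.asarray(gc, dtype=float)
    variants = np.asarray(variants, dtype=float)
    sites = np.asarray(sites, dtype=float)
    X = np.column_stack([np.ones_like(gc), gc])   # design matrix: intercept, GC content
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))     # fitted proportion per bin
        grad = X.T @ (variants - sites * p)       # score of the binomial log-likelihood
        w = sites * p * (1.0 - p)                 # binomial variance weights
        hess = X.T @ (X * w[:, None])             # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta                                   # array [b0, b1]

def scaled_points_and_curve(gc, variants, sites):
    """Return intercept, slope, intercept-scaled points, fitted curve, symbol weights."""
    gc = np.asarray(gc, dtype=float)
    variants = np.asarray(variants, dtype=float)
    sites = np.asarray(sites, dtype=float)
    b0, b1 = fit_binned_logistic(gc, variants, sites)
    prop = variants / sites                       # conditional variant proportion per bin
    # ASSUMPTION: scale the observed odds by exp(b0); the caption's formula is unknown.
    scaled = (prop / (1.0 - prop)) / np.exp(b0)
    curve = np.exp(b1 * gc)                       # fitted curve on the same scale
    weights = variants / variants.sum()           # share of the subtype in each bin
    return b0, b1, scaled, curve, weights
```

Under these assumptions, a slope of zero gives a flat curve at 1, which corresponds to the dashed no-effect baseline, and `weights` plays the role of the symbol size.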
