Figure 3.

Detailed model and estimation framework for one line at one site. The frequencies of A, C, G, and T in the population govern the distribution of parental genotypes in generation 0 (AA × AC in the figure). The distribution of parental genotypes at generation 20, conditional on the genotypes at generation 0, is specified by the Markov chain described in Methods. This cross is assumed to produce many offspring without segregation distortion, so that the nucleotide frequencies among the offspring match those of the parents. The sequencing reads, which are the observed data, are a random sample of these nucleotides (drawn with replacement) subject to sequencing errors whose frequencies are given by ɛ. The unobserved, or missing, data comprise both the parental genotypes at generation 20 (here AA × AA; cf. Table 1) and indicators that record the error status of each read. For example, the indicator for the read carrying T marks it as an error, which is clear once the generation-20 genotype (AA × AA) is known, provided mutation is precluded. While each line has its own observed and missing data, the global parameters (the population nucleotide frequencies and ɛ) are common to all lines.
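The read-sampling and error-classification steps in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' estimation code: it assumes a single error rate ɛ under which an erroneous read is replaced by one of the other three bases chosen uniformly at random, it omits the Markov chain linking generations 0 and 20, and all function names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def nucleotide_pool(parent1: str, parent2: str) -> dict[str, float]:
    """Offspring nucleotide frequencies. Assuming no segregation distortion,
    these equal the allele frequencies across both parental genotypes."""
    alleles = parent1 + parent2
    return {b: alleles.count(b) / len(alleles) for b in BASES}

def read_prob(base: str, pool: dict[str, float], eps: float) -> tuple[float, float]:
    """Return (P(read shows base, no error), P(read shows base, error)).
    Assumed error model: with probability eps the sampled nucleotide is
    replaced by one of the other three bases, chosen uniformly."""
    p_correct = (1 - eps) * pool[base]
    p_error = eps * sum(pool[b] for b in BASES if b != base) / 3
    return p_correct, p_error

def posterior_error(base: str, pool: dict[str, float], eps: float) -> float:
    """Posterior probability that a read showing `base` is an error,
    given the generation-20 genotypes (no mutation allowed)."""
    p_correct, p_error = read_prob(base, pool, eps)
    return p_error / (p_correct + p_error)

def log_likelihood(bases: list[str], pool: dict[str, float], eps: float) -> float:
    """Log-likelihood of the observed read bases, summing over each
    read's unobserved error indicator."""
    return sum(math.log(sum(read_prob(b, pool, eps))) for b in bases)

def simulate_reads(pool: dict[str, float], eps: float, n: int,
                   rng: random.Random) -> list[tuple[str, bool]]:
    """Draw n reads with replacement from the pool; each read carries its
    (normally hidden) error indicator alongside the observed base."""
    reads = []
    for _ in range(n):
        true_base = rng.choices(BASES, weights=[pool[b] for b in BASES])[0]
        is_error = rng.random() < eps
        base = rng.choice([b for b in BASES if b != true_base]) if is_error else true_base
        reads.append((base, is_error))
    return reads
```

For the AA × AA genotype in the figure, `posterior_error("T", nucleotide_pool("AA", "AA"), 0.01)` returns 1.0: a T cannot come from the parents, so under this no-mutation model it must be an error. For a segregating genotype such as AA × AC, the posterior for a C read is only a small fraction, so in that case the error indicator stays genuinely uncertain.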
