Markup | Genome Research

Figure 7.

Ambiguity and errors in inferring segment copy number (SCN) profiles for a heterogeneous sample S = (G₁, G₂) under different assumptions about the sample composition. (A) A two-genome proper sample S = (G₁, G₂): each genome $G_i \in S$ is depicted as collections of adjacent blocks (top) and as the corresponding sequences of signed blocks (bottom). (B) The copy number profile c = [c₁, c₂, c₃, c₄] inferred under the assumption that the sample is homogeneous (i.e., composed of a single derived genome) and the reference genome is haploid (i.e., each segment has only a single haplotype in the reference). Each value $c_j$ is the weighted average of the sums of haplotype-specific (or allele-specific) copy numbers $a_{i,j} + b_{i,j} = \hat{c}_{i,j} + \check{c}_{i,j}$ over the genomes $G_i \in S$. (C) Allele-specific copy number profiles $\hat{c} = [\hat{c}_{1}, \hat{c}_{2}, \hat{c}_{3}, \hat{c}_{4}]$ and $\check{c} = [\check{c}_{1}, \check{c}_{2}, \check{c}_{3}, \check{c}_{4}]$ inferred under the assumption that the sample is homogeneous and the reference genome is diploid (i.e., each segment has two haplotypes labeled A and B). Here, the entries $\hat{c}_{j}$ and $\check{c}_{j}$ for segment j are the averages $(\hat{c}_{1,j} + \hat{c}_{2,j}) / 2$ and $(\check{c}_{1,j} + \check{c}_{2,j}) / 2$ of genome- and allele-specific copy number values. Note that the vectors $\hat{c}$ and $\check{c}$ do not preserve the true A/B label of each allele: dark blue marks true counts of allele A and light blue marks true counts of allele B. Here, the allele labels of segments 2 and 4 are flipped. (D) Genome-specific copy number profiles $c_1 = [c_{1,1}, c_{1,2}, c_{1,3}, c_{1,4}]$ and $c_2 = [c_{2,1}, c_{2,2}, c_{2,3}, c_{2,4}]$ inferred under the assumption that the sample is heterogeneous but the reference genome is haploid. Here, the entry $c_{i,j}$ for a segment j and genome $G_i$ is the sum $\hat{c}_{i,j} + \check{c}_{i,j}$ of the allele-specific copy number values in $G_i$.
(E) Allele- and genome-specific copy number matrices $\tilde{C} = (\hat{C} = [\hat{c}_{1}, \hat{c}_{2}, \dots, \hat{c}_{n}]^{T}, \check{C} = [\check{c}_{1}, \check{c}_{2}, \dots, \check{c}_{n}]^{T})$ inferred under the assumption that the sample is heterogeneous and the reference genome is diploid. The alleles of segments 2 and 4 are flipped: $(\check{c}_{1,2}, \hat{c}_{2,2}) = (a_{1,2}, b_{2,2})$ and $(\check{c}_{1,4}, \hat{c}_{2,4}) = (a_{1,4}, b_{2,4})$.
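
The relations among the profiles in panels B through E can be illustrated with a minimal Python sketch. The copy number values, the equal genome proportions, and the function names below are hypothetical and chosen only for illustration; they are not taken from the figure. The sketch assumes that a label flip swaps the A and B counts of a segment in every genome.

```python
def total_profiles(a, b):
    """Panel D: genome-specific totals c[i][j] = a[i][j] + b[i][j]."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mixed_profile(rows, weights):
    """Weighted average over genomes of the per-segment values in `rows`."""
    n_segments = len(rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, rows))
        for j in range(n_segments)
    ]


def flip_segments(a, b, flipped):
    """Swap the A/B labels at the given 0-based segment indices in every genome."""
    a_hat = [row[:] for row in a]
    b_check = [row[:] for row in b]
    for i in range(len(a)):
        for j in flipped:
            a_hat[i][j], b_check[i][j] = b[i][j], a[i][j]
    return a_hat, b_check


# Hypothetical true allele-specific copy numbers: 2 genomes x 4 segments.
a = [[1, 2, 1, 0], [1, 1, 1, 2]]  # allele A
b = [[1, 0, 1, 1], [1, 1, 2, 1]]  # allele B
weights = [0.5, 0.5]              # assumed equal genome proportions

c_genome = total_profiles(a, b)              # panel D
c_mixed = mixed_profile(c_genome, weights)   # panel B
c_hat = mixed_profile(a, weights)            # panel C, allele A average
c_check = mixed_profile(b, weights)          # panel C, allele B average

# Panel E with the labels of segments 2 and 4 (indices 1 and 3) flipped.
a_hat, b_check = flip_segments(a, b, flipped=[1, 3])
```

In this sketch, flipping the labels changes the allele-specific matrices but leaves the genome-specific totals unchanged, which is why profiles inferred against a haploid reference cannot reveal such flips.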