Figure 2.

Training performance across successive iterations with bovine trios reveals insights into DV's behavior. The darker vertical lines highlight the checkpoints (2, 12, 18, 22, 28, 30) used in subsequent analyses to compare across phases. TrioTrain enables structuring the bovine inputs to match inheritance expectations, namely that variation is transmitted from parents to offspring. Pedigree data are not explicitly given to DV; instead, parental genotypes are used to inform predictions in the offspring. The x-axis (number of checkpoints) represents the number of parental genomes given to DV (odd, paternal; even, maternal). Two iterations are required to cover both parents. For example, iterations 1 and 2 represent a complete bovine trio in which truth labels from a single offspring were used to evaluate the optimal training stopping point (Table 1). In the first two panels, the y-axis represents the maximum F1 score achieved in the offspring; comparing across iterations reveals whether DV found the parental truth labels informative. Each line represents performance stratified by variant classification. (A) Variant type (SNVs, indels). As expected, training struggles with indels owing to class imbalance, as the bovine truth labels contain more SNVs relative to those from humans. (B) Genotype class (HomAlts, Hets). Between iterations 18 and 19, we observe a shift in the best-performing genotype class for the offspring owing to distributional changes in heterozygosity. (C) Truth label contents contribute to training performance. Phase 3 uses truth labels from bison genomes aligned to the cattle reference; a sudden increase in HomAlts (Supplemental Fig. S3) is reflected by the decreased Het:HomAlt ratio. Counterintuitively, increasing the number of HomAlt examples given to DV contributes to a decrease in F1 score for that genotype class during training.
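
The x-axis convention above (two iterations per trio; odd, paternal; even, maternal) can be illustrated with a minimal sketch. This is not part of TrioTrain: the function name `describe_iteration` is hypothetical, and the sketch assumes that trios are numbered consecutively starting from 1, so that iterations 1 and 2 form trio 1.

```python
from typing import Tuple


def describe_iteration(iteration: int) -> Tuple[int, str]:
    """Map a training iteration to (trio index, parent) under the caption's convention.

    Assumption: trios are numbered consecutively from 1, so iterations 1 and 2
    belong to trio 1, iterations 3 and 4 to trio 2, and so on.
    """
    if iteration < 1:
        raise ValueError("iterations are numbered from 1")
    trio = (iteration + 1) // 2  # two iterations cover one trio
    parent = "paternal" if iteration % 2 == 1 else "maternal"
    return trio, parent


# The highlighted checkpoints from the figure.
highlighted = [2, 12, 18, 22, 28, 30]
for it in highlighted:
    trio, parent = describe_iteration(it)
    print(f"iteration {it}: trio {trio}, {parent} genome")
```

Under this assumption, every highlighted checkpoint is even and therefore falls on the maternal iteration, which completes a trio.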
