Perry Evans; Chao Wu; Amanda Lindy; Dianalee A. McKnight; Matthew Lebo; Mahdi Sarmady; Ahmad N. Abou Tayoun

Figure 3.

Disease-specific classifier performance using disease panel cross-validation. For each disease panel, we used a hold-one-gene-out approach to evaluate a logistic regression model's ability to predict pathogenicity. For each gene in a disease panel, we trained PathoPredictor using variants from all other genes and tested the model using variants from the held-out gene. Using the held-out gene variant prediction scores, we computed a precision-recall curve (A) and summarized the curve as the average precision (B). We then computed a precision-recall curve for each individual feature using untransformed scores. The numbers of pathogenic (p) and benign (b) variants investigated are shown at the bottom left of each panel in B. For all epilepsy variants, PathoPredictor performed significantly better than any single feature (P < 10⁻⁴). PathoPredictor failed to be significantly better in only six of the 24 total feature comparisons (CCR, VEST, and missense depletion for RASopathies; CCR for dominant epilepsy genes; and missense depletion and MTR for cardiomyopathy).
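The hold-one-gene-out evaluation described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene names, feature values, learning rate, and epoch count are all hypothetical, the logistic regression is fitted with plain batch gradient descent as a stand-in for whatever fitting procedure was actually used, and the average precision helper assumes no tied scores.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias).

    lr and epochs are illustrative defaults, not values from the paper.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def hold_one_gene_out(variants):
    """Score every variant with a model trained on all other genes.

    variants: list of (gene, feature_vector, label), label 1 = pathogenic, 0 = benign.
    Returns prediction scores aligned with the input order.
    """
    scores = [None] * len(variants)
    for held_out in sorted({gene for gene, _, _ in variants}):
        train = [(f, l) for gene, f, l in variants if gene != held_out]
        w, b = fit_logistic([f for f, _ in train], [l for _, l in train])
        for i, (gene, f, _) in enumerate(variants):
            if gene == held_out:
                scores[i] = sigmoid(sum(wj * xj for wj, xj in zip(w, f)) + b)
    return scores


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each pathogenic variant.

    Assumes no tied scores; ties would need to be grouped into one threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one pathogenic variant")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


# Hypothetical toy panel: three made-up genes, one made-up feature per variant.
TOY_VARIANTS = [
    ("GENE_A", [2.0], 1), ("GENE_A", [-2.0], 0),
    ("GENE_B", [1.0], 1), ("GENE_B", [-1.0], 0),
    ("GENE_C", [1.5], 1), ("GENE_C", [-1.5], 0),
]

toy_scores = hold_one_gene_out(TOY_VARIANTS)
toy_labels = [label for _, _, label in TOY_VARIANTS]
print("average precision:", average_precision(toy_labels, toy_scores))
```

The key design point is that the split is by gene rather than by variant, so no variant from the tested gene ever contributes to training. In practice, library routines such as scikit-learn's `LeaveOneGroupOut` and `average_precision_score` would likely replace the hand-written helpers here.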

Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets
