Improved exome prioritization of disease genes through cross species phenotype comparison

Peter Robinson; Sebastian Köhler; Anika Oellrich; Sanger Mouse Genetics Project; Kai Wang; Chris Mungall; Suzanna E Lewis; Nicole Washington; Sebastian Bauer; Dominik Seelow; Peter Krawitz; Christian Gilissen; Melissa Haendel; Damian Smedley

doi:10.1101/gr.160325.113

¹ Charité University Hospital;
² Charité-Universitätsmedizin Berlin;
³ Wellcome Trust Sanger Institute;
⁴ -;
⁵ Zilkha Neurogenetic Institute, University of Southern California;
⁶ Lawrence Berkeley National Laboratory;
⁷ Radboud University Nijmegen Medical Centre;
⁸ Oregon Health & Sciences University

* Corresponding author; email: peter.robinson{at}charite.de

Abstract

Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low-quality and common variants and those deemed unlikely to be pathogenic (non-coding, not affecting splicing, synonymous or missense mutations annotated as non-pathogenic by prediction algorithms). The observation that each of our genomes contains about 100 genuine loss-of-function variants with ~20 genes completely inactivated makes identification of the causative mutation problematic when using these strategies alone. In some cases it may be possible to use multiple affected individuals, linkage data, identity-by-descent inference, de novo heterozygous mutations from trio analysis, or prior knowledge of affected pathways to narrow down to the causative variant. In cases where this is not possible or has proven unsuccessful, we propose using the wealth of genotype-to-phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods, with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of over 95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
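
The core idea in the abstract, combining a variant-based score with a phenotype-similarity score to rank candidate genes, can be illustrated with a minimal sketch. This is not the Exomiser implementation: the abstract does not state the scoring formula, so the sketch assumes both scores are already normalized to [0, 1] and simply averages them. The `Candidate` class, the function names, the placeholder gene labels and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate gene with two illustrative scores, each assumed to lie in [0, 1]."""
    gene: str
    variant_score: float    # stands in for frequency/pathogenicity evidence
    phenotype_score: float  # stands in for human-disease vs. mouse-model similarity


def combined_score(c: Candidate) -> float:
    # ASSUMPTION: plain average of the two scores; the abstract gives no formula.
    return (c.variant_score + c.phenotype_score) / 2.0


def rank_candidates(cands):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(cands, key=combined_score, reverse=True)


# Made-up example values, not data from the paper.
candidates = [
    Candidate("GENE_A", variant_score=0.95, phenotype_score=0.10),
    Candidate("GENE_B", variant_score=0.80, phenotype_score=0.90),
    Candidate("GENE_C", variant_score=0.99, phenotype_score=0.00),
]

ranked = rank_candidates(candidates)
variant_only = sorted(candidates, key=lambda c: c.variant_score, reverse=True)

for c in ranked:
    print(f"{c.gene}: combined={combined_score(c):.3f}")
```

In this toy example a gene with a slightly weaker variant score but a strong phenotype match (`GENE_B`) moves ahead of genes that would rank first on variant evidence alone, which is the kind of re-ranking the abstract describes.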

Received May 13, 2013.
Accepted October 24, 2013.

Published by Cold Spring Harbor Laboratory Press

This manuscript is Open Access.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 3.0 Unported), as described at http://creativecommons.org/licenses/by/3.0/.

