Improved exome prioritization of disease genes through cross-species phenotype comparison
- Peter Robinson1,9,
- Sebastian Köhler2,
- Anika Oellrich3,
- Sanger Mouse Genetics Project4,
- Kai Wang5,
- Chris Mungall6,
- Suzanna E Lewis6,
- Nicole Washington6,
- Sebastian Bauer2,
- Dominik Seelow2,
- Peter Krawitz2,
- Christian Gilissen7,
- Melissa Haendel8 and
- Damian Smedley3
- 1 Charité University Hospital;
- 2 Charité-Universitätsmedizin Berlin;
- 3 Wellcome Trust Sanger Institute;
- 4 -;
- 5 Zilkha Neurogenetic Institute, University of Southern California;
- 6 Lawrence Berkeley National Laboratory;
- 7 Radboud University Nijmegen Medical Centre;
- 8 Oregon Health & Sciences University
- * Corresponding author; email: peter.robinson{at}charite.de
Abstract
Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved because of the sheer number of candidate variants that remain after common filtering strategies, such as removing low-quality variants, common variants, and variants deemed unlikely to be pathogenic (non-coding, not affecting splicing, synonymous, or missense mutations annotated as non-pathogenic by prediction algorithms). The observation that each of our genomes contains about 100 genuine loss-of-function variants, with ~20 genes completely inactivated, makes identification of the causative mutation problematic when using these strategies alone. In some cases it may be possible to use multiple affected individuals, linkage data, identity-by-descent inference, de novo heterozygous mutations from trio analysis, or prior knowledge of affected pathways to narrow down to the causative variant. In cases where this is not possible or has proven unsuccessful, we propose using the wealth of genotype-to-phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods, with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of over 95%.
We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
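The general idea described above, filtering variants and then ranking the surviving genes by a combination of variant evidence and phenotype similarity, can be illustrated with a minimal sketch. This is not the Exomiser implementation: the `Variant` fields, the thresholds, the `variant_score` formula, the equal-weight average, and the gene names are all assumptions chosen for illustration, and the phenotype scores are supplied by hand rather than computed from mouse model data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str             # hypothetical gene label
    allele_freq: float    # population allele frequency in [0, 1]
    pathogenicity: float  # predicted pathogenicity in [0, 1]
    quality: float        # variant call quality


def passes_filters(v, max_freq=0.01, min_quality=30.0):
    """Drop low-quality and common variants (thresholds are assumptions)."""
    return v.quality >= min_quality and v.allele_freq <= max_freq


def variant_score(v):
    """Illustrative score: rarer and more pathogenic variants score higher."""
    return v.pathogenicity * (1.0 - v.allele_freq)


def rank_genes(variants, phenotype_score):
    """Rank genes by the mean of their best variant score and phenotype score.

    phenotype_score maps gene -> similarity in [0, 1] between the patient's
    phenotype and a model organism phenotype; genes without data score 0.
    The equal-weight mean is an assumed combination rule.
    """
    best = {}
    for v in variants:
        if passes_filters(v):
            best[v.gene] = max(best.get(v.gene, 0.0), variant_score(v))
    combined = {
        gene: 0.5 * (score + phenotype_score.get(gene, 0.0))
        for gene, score in best.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


variants = [
    Variant("GENE_A", allele_freq=0.001, pathogenicity=0.90, quality=50.0),
    Variant("GENE_B", allele_freq=0.0005, pathogenicity=0.95, quality=60.0),
    Variant("GENE_C", allele_freq=0.20, pathogenicity=0.99, quality=60.0),  # common
    Variant("GENE_D", allele_freq=0.0, pathogenicity=0.99, quality=10.0),   # low quality
]
phenotype_score = {"GENE_A": 0.8, "GENE_B": 0.1}

ranking = rank_genes(variants, phenotype_score)
```

In this toy input, GENE_B has the slightly higher variant-only score, but GENE_A ranks first once the phenotype score is included, which is the kind of reordering the phenotype-aware approach is meant to produce.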
- Received May 13, 2013.
- Accepted October 24, 2013.
- Published by Cold Spring Harbor Laboratory Press
This manuscript is Open Access.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 3.0 Unported), as described at http://creativecommons.org/licenses/by/3.0/.











