GenomeVIP: a cloud platform for genomic variant discovery and interpretation

  1. Li Ding (affiliations 1, 2, 8, 10)
  1. McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA;
  2. Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA;
  3. Stanley Center for Psychiatric Research, Broad Institute, Cambridge, Massachusetts 02142, USA;
  4. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
  5. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA;
  6. Langone Medical Center, New York University, New York, New York 10016, USA;
  7. Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA;
  8. Department of Genetics, Washington University, St. Louis, Missouri 63108, USA;
  9. Department of Mathematics, Washington University, St. Louis, Missouri 63108, USA;
  10. Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
  • Corresponding author: lding@wustl.edu
  • Abstract

    Identifying genomic variants is a fundamental first step toward understanding the role of inherited and acquired variation in disease. The accelerating growth of the sequencing data that underpins such analysis is making the data-download bottleneck more evident and placing a substantial burden on the research community to keep pace. As a result, the search for alternatives to the traditional “download and analyze” paradigm on local computing resources has led to rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomic variant discovery and annotation using cloud or local high-performance computing infrastructure. Through a web interface, GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Using publicly available data sets, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance.

    Footnotes

    • Received June 21, 2016.
    • Accepted May 3, 2017.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
