Method

BiosyntheticSPAdes: Reconstructing Biosynthetic Gene Clusters From Assembly Graphs

    • 1 Weill Cornell Medical College;
    • 2 Carnegie Mellon University;
    • 3 Wageningen University;
    • 4 Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University;
    • 5 Institute for Translational Biomedicine, St. Petersburg State University;
    • 6 University of California San Diego

Abstract

Predicting biosynthetic gene clusters (BGCs) is critically important for the discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genome assemblies remains challenging. Existing BGC prediction tools often assume that each BGC is encoded within a single contig of the genome assembly. This assumption is violated for most sequenced microbial genomes, where BGCs are often split across several contigs and are therefore difficult to reconstruct. The situation is even worse in shotgun metagenomics, where contigs are often short and existing tools fail to predict a large fraction of long BGCs. Although it is difficult to predict BGCs that span multiple contigs, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs, and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomic datasets.
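To make the idea of "combining multiple contigs using the assembly graph" concrete, here is a toy sketch. It is not the biosyntheticSPAdes algorithm. It assumes a simplified, undirected view of the assembly graph (contig id to neighbouring contig ids), known contig lengths, and a hypothetical per-contig count of biosynthetic domain hits. It then groups graph-linked contigs into candidate multi-contig BGCs with a breadth-first search. The function name, the `max_connector_len` parameter, and the rule for admitting short contigs are all illustrative assumptions.

```python
from collections import deque


def candidate_bgc_groups(adjacency, lengths, domain_hits, max_connector_len=1000):
    """Toy grouping of graph-linked contigs into candidate multi-contig BGCs.

    Hypothetical inputs (not the biosyntheticSPAdes data model):
      adjacency:   dict contig -> iterable of neighbouring contigs (undirected view)
      lengths:     dict contig -> contig length in bp
      domain_hits: dict contig -> number of biosynthetic domain hits
    A contig joins a group if it has domain hits, or if it is short enough
    (<= max_connector_len) to be treated as a possible connector. Short contigs
    reachable from a hit contig are kept even if they lead nowhere else.
    """

    def eligible(contig):
        # Simplifying assumption: hit contigs and short contigs may be chained.
        return (domain_hits.get(contig, 0) > 0
                or lengths.get(contig, float("inf")) <= max_connector_len)

    seen = set()
    groups = []
    # Seed one breadth-first search from every not-yet-grouped contig with hits.
    for start in sorted(domain_hits):
        if domain_hits[start] == 0 or start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            contig = queue.popleft()
            group.append(contig)
            for neighbour in adjacency.get(contig, ()):
                if neighbour not in seen and eligible(neighbour):
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Two hit contigs (e1, e3) joined by a short contig (e2); e4 is long and has no hits.
    adjacency = {"e1": ["e2"], "e2": ["e1", "e3"], "e3": ["e2", "e4"],
                 "e4": ["e3"], "e5": ["e6"], "e6": ["e5"]}
    lengths = {"e1": 5000, "e2": 300, "e3": 8000, "e4": 20000,
               "e5": 12000, "e6": 400}
    hits = {"e1": 3, "e3": 2, "e5": 1}
    print(candidate_bgc_groups(adjacency, lengths, hits))
```

In this example, e1, e2 and e3 end up in one candidate cluster even though no single contig contains it. Lowering `max_connector_len` below 300 would split them apart. This illustrates why graph structure, rather than individual contigs, is the natural unit for this problem. The real tool uses its own domain annotation and graph traversal, which this sketch does not attempt to reproduce.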
