Methods

Generic eukaryotic core promoter prediction using structural features of DNA

    • 1 Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium;
    • 2 Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium;
    • 3 Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium
Genome Research. Published December 20, 2007. Vol 18, Issue 2, pp. 310-323. https://doi.org/10.1101/gr.6991408

Abstract

Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
