What is Conrad? The Conrad gene caller is a tool for predicting gene structures in DNA based on the DNA sequence and other available evidence. The gene caller uses semi-Markov conditional random fields (CRFs). The Conrad CRF engine is a general-purpose CRF engine that the gene caller uses to provide the structure and algorithms for gene calling.
The Conrad project provides a robust, flexible, highly accurate gene predictor capable of incorporating a wide variety of data useful for gene calling. Traditionally, gene callers have been divided into ab initio programs, which predict genes from the DNA sequence and possibly one or more comparative sequences using a probabilistic model, and heuristic programs, which assemble genes from evidence such as ESTs or BLAST hits. Conrad combines probabilistic models with ad hoc evidence to draw on the strengths of both methods. It does this by using a different machine learning technique than traditional gene callers (semi-Markov conditional random fields rather than hidden Markov models) and by applying solid software engineering principles from the ground up to ensure modularity and flexibility.
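To illustrate how a CRF-style model can put a probabilistic model and ad hoc evidence on the same footing, here is a minimal sketch in Java. It is not Conrad's actual API: the class name, the feature layout, and the weights are all hypothetical. It shows only the general log-linear idea behind CRFs, in which each source of information becomes a feature value and the score of a candidate is the weighted sum of those values.

```java
// A minimal sketch of log-linear scoring; all names and numbers are hypothetical
// and are not taken from the Conrad code base.
class LogLinearScoreSketch {

    /** Log-linear score: the dot product of feature weights and feature values. */
    static double score(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * features[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed feature layout for one candidate exon:
        //   [0] log-probability from a probabilistic sequence model
        //   [1] 1.0 if an EST alignment supports the exon, else 0.0
        //   [2] 1.0 if a BLAST hit overlaps the exon, else 0.0
        double[] weights = {1.0, 2.5, 0.5};       // made-up values; in a CRF these are learned
        double[] withEvidence = {-4.0, 1.0, 1.0};
        double[] withoutEvidence = {-3.5, 0.0, 0.0};

        System.out.println("with evidence:    " + score(weights, withEvidence));
        System.out.println("without evidence: " + score(weights, withoutEvidence));
    }
}
```

In this toy setup the candidate with EST and BLAST support outscores the one that the sequence model alone slightly prefers, which is the kind of trade-off that training the weights is meant to calibrate.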
A few of the key features of the Conrad gene caller:
Better accuracy than HMM gene predictors
Pre-trained models for several organisms
Easily trainable for single genome and comparative gene prediction
Ability to handle ESTs, BLAST, and other data types
Written in Java, runs on any platform
Flexible input and output data handling
Interface to define custom features and algorithms to extend the gene caller
Open source using the LGPL license
Get started using the Conrad Gene Caller.
The Conrad gene caller is built on top of the Conrad CRF engine, a general-purpose engine for working with semi-Markov linear-chain conditional random fields. The engine itself contains no gene-caller-specific code and can be applied to other domains simply by changing the XML model file; no recompilation of the engine code is needed. The engine is extremely fast, since its development has been driven by the Conrad gene caller, which must handle sequences millions of bases long. As a CRF engine, Conrad has several important features:
Simple interface for creating models and writing features
Models can be reconfigured without recompiling the engine
Uses intelligent caching strategies to achieve high performance with low memory overhead
Supports arbitrary constraints for valid and invalid paths
Handles Markov and semi-Markov CRFs
Supports maximum likelihood and maximum expected accuracy training
Can be reconfigured with new training or caching algorithms without any recompiling
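For readers unfamiliar with semi-Markov models, the following Java sketch shows the general shape of semi-Markov Viterbi decoding: the model scores whole segments (a state plus a start and end position) rather than single positions, and dynamic programming finds the best-scoring segmentation. This is a textbook-style illustration under stated assumptions, not the Conrad engine's implementation or interface. The SegmentScorer interface, the toyScorer helper, and the maxLen parameter are hypothetical names introduced here. Invalid paths can be expressed by having the scorer return negative infinity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative semi-Markov Viterbi decoder; not taken from the Conrad code base.
class SemiMarkovViterbiSketch {

    /** Hypothetical hook: score of labeling [start, end) with state, after prevState (-1 at sequence start). */
    interface SegmentScorer {
        double score(int prevState, int state, int start, int end);
    }

    /** One labeled segment covering positions [start, end). */
    static final class Segment {
        final int start, end, state;
        Segment(int start, int end, int state) { this.start = start; this.end = end; this.state = state; }
        @Override public String toString() { return state + ":[" + start + "," + end + ")"; }
    }

    /** Returns the highest-scoring segmentation of positions [0, n), with segments of length at most maxLen. */
    static List<Segment> decode(int n, int numStates, int maxLen, SegmentScorer scorer) {
        // best[end][s]: best score of a segmentation of [0, end) whose last segment has state s.
        double[][] best = new double[n + 1][numStates];
        int[][] backStart = new int[n + 1][numStates];
        int[][] backState = new int[n + 1][numStates];
        for (int end = 1; end <= n; end++) {
            for (int s = 0; s < numStates; s++) {
                best[end][s] = Double.NEGATIVE_INFINITY;
                for (int start = Math.max(0, end - maxLen); start < end; start++) {
                    // A segment starting at 0 has no predecessor; otherwise try every previous state.
                    int firstPrev = (start == 0) ? -1 : 0;
                    int lastPrev = (start == 0) ? -1 : numStates - 1;
                    for (int p = firstPrev; p <= lastPrev; p++) {
                        double prefix = (p < 0) ? 0.0 : best[start][p];
                        double candidate = prefix + scorer.score(p, s, start, end);
                        if (candidate > best[end][s]) {
                            best[end][s] = candidate;
                            backStart[end][s] = start;
                            backState[end][s] = p;
                        }
                    }
                }
            }
        }
        // Pick the best final state, then follow back-pointers to recover the segments.
        int state = 0;
        for (int s = 1; s < numStates; s++) {
            if (best[n][s] > best[n][state]) state = s;
        }
        if (n > 0 && best[n][state] == Double.NEGATIVE_INFINITY) {
            throw new IllegalStateException("no valid segmentation under the given constraints");
        }
        List<Segment> path = new ArrayList<>();
        int end = n;
        while (end > 0) {
            int start = backStart[end][state];
            path.add(new Segment(start, end, state));
            state = backState[end][state];
            end = start;
        }
        Collections.reverse(path);
        return path;
    }

    /** Toy scorer: state 1 rewards uppercase characters, state 0 lowercase; each segment costs 0.5. */
    static SegmentScorer toyScorer(String seq) {
        return (prevState, state, start, end) -> {
            double matches = 0;
            for (int i = start; i < end; i++) {
                if (Character.isUpperCase(seq.charAt(i)) == (state == 1)) matches++;
            }
            return matches - 0.5; // flat per-segment penalty discourages over-splitting
        };
    }

    public static void main(String[] args) {
        String seq = "aaBBBa";
        System.out.println(decode(seq.length(), 2, seq.length(), toyScorer(seq)));
    }
}
```

The naive loop above costs on the order of n × maxLen × numStates² scorer calls, which is why a production engine working on sequences millions of bases long would need caching strategies of the kind listed above.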