Welcome to Conrad!

What is Conrad? The Conrad gene caller is a tool for predicting gene structures in DNA based on the DNA sequence and other available evidence. The gene caller uses semi-Markov Conditional Random Fields (CRFs). The Conrad CRF engine is a general-purpose CRF engine that the gene caller uses to provide the structure and algorithms for gene calling.

Conrad gene caller

The Conrad project is a robust, flexible, highly accurate gene predictor capable of incorporating a wide variety of data useful for gene calling. Traditionally, gene callers have been divided into ab initio programs, which predict genes from one or more comparative sequences using a probabilistic model, and heuristic programs, which assemble genes from evidence such as ESTs or BLAST hits. Conrad can mix probabilistic models with ad hoc evidence and so combines the strengths of both approaches. It does this by using a different machine learning technique than traditional gene callers and by applying solid software engineering principles from the ground up to ensure modularity and flexibility.

A few of the key features of the Conrad gene caller:

* Better accuracy than HMM gene predictors
* Pre-trained models for several organisms
* Easily trainable for single genome and comparative gene prediction
* Ability to handle ESTs, BLAST, and other data types
* Written in Java, runs on any platform
* Flexible input and output data handling
* Interface to define custom features and algorithms to extend the gene caller
* Open source using the LGPL license

Get started using the Conrad Gene Caller.

Conrad CRF engine

The Conrad gene caller is built on top of the Conrad CRF engine, which is a general-purpose engine for working with semi-Markov linear-chain conditional random fields. The engine itself contains no gene-caller-specific code and can easily be applied to other domains simply by changing the XML model file. No recompile of the engine code is needed. The engine is also extremely fast, since its development has been driven by the Conrad gene caller, which must handle sequences millions of positions long. As a CRF engine, Conrad has several important features:

* Simple interface for creating models and writing features
* Models can be reconfigured without recompiling the engine
* Uses intelligent caching strategies to achieve high performance with low memory overhead
* Supports arbitrary constraints for valid and invalid paths
* Handles Markov and semi-Markov CRFs
* Supports maximum likelihood and maximum expected accuracy training
* Can be reconfigured with new training or caching algorithms without any recompiling
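
To make the semi-Markov and path-constraint ideas above more concrete, here is a minimal sketch of Viterbi decoding over segments rather than single positions. It is not Conrad's code or API: the class name SemiMarkovViterbiSketch, the SegmentScorer interface, the allowed matrix, and the toy scoring rule are all hypothetical and chosen only for illustration. The sketch assumes segment scores are log-potentials and that invalid paths are expressed as disallowed state-to-state transitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative semi-Markov Viterbi decoder; hypothetical, not part of Conrad. */
class SemiMarkovViterbiSketch {

    /** Log-potential of labelling positions [start, end) with `state`.
     *  prevState is -1 when the segment is the first one in the sequence. */
    interface SegmentScorer {
        double score(int prevState, int state, int start, int end);
    }

    /** Returns the highest-scoring segmentation as {start, end, state} triples. */
    static List<int[]> decode(int n, int numStates, int maxLen,
                              boolean[][] allowed, SegmentScorer scorer) {
        // best[e][s]: best score of any segmentation of [0, e) whose last segment has state s.
        double[][] best = new double[n + 1][numStates];
        int[][] backStart = new int[n + 1][numStates];
        int[][] backState = new int[n + 1][numStates];
        for (double[] row : best) Arrays.fill(row, Double.NEGATIVE_INFINITY);

        for (int end = 1; end <= n; end++) {
            for (int s = 0; s < numStates; s++) {
                // Semi-Markov step: try every segment length, not just length 1.
                for (int len = 1; len <= Math.min(maxLen, end); len++) {
                    int start = end - len;
                    if (start == 0) { // first segment: no predecessor state
                        double cand = scorer.score(-1, s, 0, end);
                        if (cand > best[end][s]) {
                            best[end][s] = cand;
                            backStart[end][s] = 0;
                            backState[end][s] = -1;
                        }
                        continue;
                    }
                    for (int p = 0; p < numStates; p++) {
                        // Path constraint: skip transitions marked invalid or unreachable.
                        if (!allowed[p][s] || best[start][p] == Double.NEGATIVE_INFINITY) continue;
                        double cand = best[start][p] + scorer.score(p, s, start, end);
                        if (cand > best[end][s]) {
                            best[end][s] = cand;
                            backStart[end][s] = start;
                            backState[end][s] = p;
                        }
                    }
                }
            }
        }

        // Pick the best final state, then follow back-pointers to recover segments.
        int state = -1;
        for (int s = 0; s < numStates; s++) {
            if (best[n][s] > Double.NEGATIVE_INFINITY && (state < 0 || best[n][s] > best[n][state])) {
                state = s;
            }
        }
        List<int[]> segments = new ArrayList<>();
        int end = n;
        while (state >= 0 && end > 0) {
            int start = backStart[end][state];
            int prev = backState[end][state];
            segments.add(new int[] {start, end, state});
            end = start;
            state = prev;
        }
        Collections.reverse(segments);
        return segments;
    }

    public static void main(String[] args) {
        // Toy data: each position "votes" for state 0 or 1 (hypothetical evidence).
        int[] votes = {0, 0, 1, 1, 1, 0};
        // No self-transitions: a segment already models how long a state lasts.
        boolean[][] allowed = {{false, true}, {true, false}};
        SegmentScorer scorer = (prev, s, from, to) -> {
            double total = -0.5; // fixed cost per segment
            for (int i = from; i < to; i++) total += (votes[i] == s) ? 1.0 : -1.0;
            return total;
        };
        for (int[] seg : decode(votes.length, 2, votes.length, allowed, scorer)) {
            System.out.println("[" + seg[0] + ", " + seg[1] + ") state " + seg[2]);
        }
    }
}
```

Setting maxLen to 1 reduces the sketch to ordinary Markov decoding, which is one way to see how a single engine can cover both model types. The cost is an extra factor of maxLen in running time, which is why caching of segment scores matters for sequences millions of positions long.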