calhoun.analysis.crf
Class Conrad

java.lang.Object
  extended by calhoun.analysis.crf.Conrad
All Implemented Interfaces:
java.io.Serializable

public class Conrad
extends java.lang.Object
implements java.io.Serializable

the central class for the Conrad engine. Has a main function for calling Conrad from the command line and a high-level public interface for programmatic operations. This class is mostly just a container which delegates the real work to the various objects set up in the configuration.

See Also:
Serialized Form

Constructor Summary
Conrad()
          creates a Conrad engine with no configuration.
Conrad(java.lang.String configFile)
          creates a Conrad engine based on configuration information from an XML model file.
 
Method Summary
 java.lang.String getFeatureName(int index)
          looks up a feature's name given its index
 CRFInference getInference()
          returns the configured inference algorithm which will be used to predict hidden states for new inputs once the model is trained.
 InputHandler getInputHandler()
          returns the configured input handler.
 ModelManager getModel()
          returns the configured ModelManager object.
 int getNumFeatures()
          returns the number of individual features in the model.
 int getNumStates()
          returns the number of hidden states in the model
 CRFTraining getOptimizer()
          returns the configured numerical optimizer which will be used to select the optimal feature weights during training.
 OutputHandler getOutputHandler()
          gets the configured output handler.
 java.lang.String getStateName(int state)
          looks up the name of a state given its index.
 double getTrainingTime()
          returns the total number of seconds used in training.
 double[] getWeights()
          returns the feature weights.
static void main(java.lang.String[] args)
          Command line entry point for running CRFs.
 CRFInference.InferenceResult predict(InputSequence data)
          performs inference on the input sequence and determines the best labeling for the sequence using the configured inference algorithm.
 java.lang.String printWeights()
          Returns a formatted string listing the weights.
static Conrad read(java.lang.String filename)
          reads in a Conrad engine from a file.
 void setInference(CRFInference inference)
          sets the inference algorithm.
 void setInputHandler(InputHandler inputHandler)
          sets the configured input handler.
 void setModel(ModelManager model)
          sets the model.
 void setOptimizer(CRFTraining optimizer)
          sets the numerical optimizer.
 void setOutputHandler(OutputHandler outputHandler)
          sets the configured output handler.
 void setWeights(double[] weights)
          sets feature weights.
 void test(java.util.List<? extends TrainingSequence<?>> data)
           
 void test(java.util.List<? extends TrainingSequence<?>> data, java.lang.String location)
          runs a trained model against a set of input data with known results and evaluates the performance.
 void test(java.lang.String inputLocation)
          runs a trained model against a set of input data with known results and evaluates the performance.
 void test(java.lang.String inputLocation, java.lang.String outputLocation)
           
 void testWithoutAnswers(java.lang.String inputLocation, java.lang.String outputLocation)
           
 void train(java.util.List<? extends TrainingSequence<?>> data)
          fully trains this Conrad engine with this training data.
 void train(java.lang.String location)
          fully trains this Conrad engine with this training data.
 void trainFeatures(java.util.List<? extends TrainingSequence<?>> data)
          trains only the features in the current model with this training data.
 void trainFeatures(java.lang.String location)
          trains only the features in the current model with this training data.
 void trainWeights(java.util.List<? extends TrainingSequence<?>> data)
          optimizes the feature weights for the current model with this training data.
 void trainWeights(java.lang.String location)
          optimizes the feature weights for the current model with this training data.
 void write(java.lang.String filename)
          writes this Conrad engine to a file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Conrad

public Conrad()
creates a Conrad engine with no configuration. All configuration must be done programmatically.


Conrad

public Conrad(java.lang.String configFile)
creates a Conrad engine based on configuration information from an XML model file.

Parameters:
configFile - string filename of the XML model file

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Command line entry point for running CRFs. Can train, test, or predict depending on arguments.

Parameters:
args - list of command line arguments. See usage for details.
Throws:
java.lang.Exception

write

public void write(java.lang.String filename)
           throws java.io.IOException
writes this Conrad engine to a file. This is most often used to save a trained model file.

Parameters:
filename - string name of the file that will contain the serialized model.
Throws:
java.io.IOException - if a problem occurs writing to the file

read

public static Conrad read(java.lang.String filename)
                   throws java.io.IOException
reads in a Conrad engine from a file. This file must have previously been created by calling write(java.lang.String).

Parameters:
filename - string name of the file containing the model.
Returns:
the Conrad engine which has been read from the file
Throws:
java.io.IOException - if there is a problem reading the file
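
Because Conrad implements java.io.Serializable, write and read plausibly wrap standard Java object serialization, although this page does not confirm the mechanism. The sketch below shows that pattern on a hypothetical stand-in class, EngineSnapshot. It is an illustration under that assumption, not the Conrad implementation.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical stand-in for a trained engine; NOT the real Conrad class.
class EngineSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    final double[] weights;

    EngineSnapshot(double[] weights) {
        this.weights = weights.clone();
    }

    // Same shape as the documented write(String): serialize this object to a file.
    void write(String filename) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(Paths.get(filename)))) {
            out.writeObject(this);
        }
    }

    // Same shape as the documented read(String): restore an object saved by write.
    static EngineSnapshot read(String filename) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(Paths.get(filename)))) {
            return (EngineSnapshot) in.readObject();
        } catch (ClassNotFoundException e) {
            // Rethrow as IOException so the method only declares IOException.
            throw new IOException("Serialized class not found", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("engine", ".ser");
        new EngineSnapshot(new double[] {0.5, -1.25}).write(file.toString());
        EngineSnapshot restored = EngineSnapshot.read(file.toString());
        System.out.println(Arrays.toString(restored.weights)); // prints [0.5, -1.25]
        Files.delete(file);
    }
}
```

If the real methods do work this way, a file written by one version of the class may not load after the class changes incompatibly, which is a general property of Java serialization.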

train

public void train(java.lang.String location)
           throws java.io.IOException
fully trains this Conrad engine with this training data. The training data is specified as a string location, which the configured InputHandler is responsible for converting into a list of training sequences of the appropriate type.

Parameters:
location - string location of the data. The exact meaning will be determined by the InputHandler.
Throws:
java.io.IOException - if there is a problem reading the training data.

train

public void train(java.util.List<? extends TrainingSequence<?>> data)
fully trains this Conrad engine with this training data. The training data is specified as a list of training sequences, and no InputHandler is used.

Parameters:
data - a list of training sequences to use for training

trainFeatures

public void trainFeatures(java.lang.String location)
                   throws java.io.IOException
trains only the features in the current model with this training data. FeatureManager.train(int, calhoun.analysis.crf.ModelManager, java.util.List) is called for each feature in the model, but no optimization is performed and no feature weights are set. This allows the features themselves to be parameterized on one set of training data, while using a different set for optimizing the feature weights.

The training data is specified as a string location, which the configured InputHandler is responsible for converting into a list of training sequences of the appropriate type.

Parameters:
location - string location of the data. The exact meaning will be determined by the InputHandler.
Throws:
java.io.IOException - if there is a problem reading the training data

trainFeatures

public void trainFeatures(java.util.List<? extends TrainingSequence<?>> data)
trains only the features in the current model with this training data. FeatureManager.train(int, calhoun.analysis.crf.ModelManager, java.util.List) is called for each feature in the model, but no optimization is performed and no feature weights are set. This allows the features themselves to be parameterized on one set of training data, while using a different set for optimizing the feature weights.

The training data is specified as a list of training sequences, and no InputHandler is used.

Parameters:
data - a list of training sequences to use for training

trainWeights

public void trainWeights(java.lang.String location)
                  throws java.io.IOException
optimizes the feature weights for the current model with this training data. Assumes that trainFeatures(java.lang.String) has already been called to train the individual features.

The training data is specified as a string location, which the configured InputHandler is responsible for converting into a list of training sequences of the appropriate type.

Parameters:
location - string location of the data. The exact meaning will be determined by the InputHandler.
Throws:
java.io.IOException - if there is a problem reading the training data

trainWeights

public void trainWeights(java.util.List<? extends TrainingSequence<?>> data)
optimizes the feature weights for the current model with this training data. Assumes that trainFeatures(java.lang.String) has already been called to train the individual features.

The training data is specified as a list of training sequences, and no InputHandler is used.

Parameters:
data - a list of training sequences to use for training
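
The split between trainFeatures and trainWeights implies a fixed call order when two data sets are used. The sketch below shows that order against a hypothetical interface, TwoPhaseTrainable, which mirrors only the two documented String overloads so that the example compiles without the Conrad library. The location strings are placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring two documented Conrad methods.
interface TwoPhaseTrainable {
    void trainFeatures(String location) throws IOException;
    void trainWeights(String location) throws IOException;
}

class TwoPhaseTraining {
    // Parameterize the features on one data set, then optimize weights on another.
    static void run(TwoPhaseTrainable engine, String featureData, String weightData)
            throws IOException {
        engine.trainFeatures(featureData); // must run first
        engine.trainWeights(weightData);   // assumes the features are already trained
    }

    // Recording stub used only to show the order in which run() makes its calls.
    static List<String> recordCalls(String featureData, String weightData)
            throws IOException {
        List<String> calls = new ArrayList<>();
        TwoPhaseTrainable stub = new TwoPhaseTrainable() {
            @Override
            public void trainFeatures(String location) {
                calls.add("trainFeatures:" + location);
            }

            @Override
            public void trainWeights(String location) {
                calls.add("trainWeights:" + location);
            }
        };
        run(stub, featureData, weightData);
        return calls;
    }

    public static void main(String[] args) throws IOException {
        // prints [trainFeatures:featureSet, trainWeights:weightSet]
        System.out.println(recordCalls("featureSet", "weightSet"));
    }
}
```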

test

public void test(java.lang.String inputLocation)
          throws java.io.IOException
runs a trained model against a set of input data with known results and evaluates the performance. Assumes that train(java.lang.String) has already been called to train the model. For convenience, the data is passed in as a training set, although the model is not trained. The input is used to create a set of predictions, and those predictions are then compared against the expected outputs. The result of the prediction is passed to the output handler, which can compare the predicted values against the expected values.

Parameters:
inputLocation - string location of the data. The exact meaning will be determined by the InputHandler.
Throws:
java.io.IOException - if there is a problem reading the test data

test

public void test(java.lang.String inputLocation,
                 java.lang.String outputLocation)
          throws java.io.IOException
Throws:
java.io.IOException

test

public void test(java.util.List<? extends TrainingSequence<?>> data)
          throws java.io.IOException
Throws:
java.io.IOException

testWithoutAnswers

public void testWithoutAnswers(java.lang.String inputLocation,
                               java.lang.String outputLocation)
                        throws java.io.IOException
Throws:
java.io.IOException

test

public void test(java.util.List<? extends TrainingSequence<?>> data,
                 java.lang.String location)
          throws java.io.IOException
runs a trained model against a set of input data with known results and evaluates the performance. Assumes that train(java.lang.String) has already been called to train the model. For convenience, the data is passed in as a training set, although the model is not trained. The input is used to create a set of predictions, and those predictions are then compared against the expected outputs. The result of the prediction is passed to the output handler, which can compare the predicted values against the expected values.

Parameters:
data - a list of training sequences with known results to test against
Throws:
java.io.IOException
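
How predictions are scored is up to the configured OutputHandler, which this page does not describe. As one simple possibility, a handler could report the fraction of positions at which the predicted and expected state indices agree. The class and method names below are hypothetical.

```java
// Hypothetical scoring helper; NOT part of the documented Conrad API.
class LabelAccuracy {
    // Fraction of positions where the predicted state equals the expected state.
    static double accuracy(int[] predicted, int[] expected) {
        if (predicted.length != expected.length) {
            throw new IllegalArgumentException("sequences differ in length");
        }
        if (predicted.length == 0) {
            return 0.0; // nothing to score
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == expected[i]) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        int[] predicted = {0, 1, 1, 2};
        int[] expected = {0, 1, 2, 2};
        System.out.println(accuracy(predicted, expected)); // prints 0.75
    }
}
```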

predict

public CRFInference.InferenceResult predict(InputSequence data)
performs inference on the input sequence and determines the best labeling for the sequence using the configured inference algorithm.

Parameters:
data - the input sequence the engine will use for inference
Returns:
an inference result containing the predicted hidden states
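
Which algorithm finds the best labeling depends on the configured CRFInference, and this page does not name one. For linear-chain models, the Viterbi dynamic program is a common choice. The sketch below runs it over precomputed additive scores. The class name, method name, and score arrays are all assumptions made for illustration, not Conrad's API.

```java
import java.util.Arrays;

// Hypothetical Viterbi sketch over additive (log-space) scores.
class ViterbiSketch {
    // start[s]    : score of beginning in state s
    // trans[p][s] : score of moving from state p to state s
    // emit[t][s]  : score of state s at position t
    static int[] bestPath(double[] start, double[][] trans, double[][] emit) {
        int n = emit.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] best = new double[n][k]; // best score of any path ending in s at t
        int[][] back = new int[n][k];       // predecessor state on that best path
        for (int s = 0; s < k; s++) {
            best[0][s] = start[s] + emit[0][s];
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int arg = 0;
                double max = best[t - 1][0] + trans[0][s];
                for (int p = 1; p < k; p++) {
                    double candidate = best[t - 1][p] + trans[p][s];
                    if (candidate > max) {
                        max = candidate;
                        arg = p;
                    }
                }
                best[t][s] = max + emit[t][s];
                back[t][s] = arg;
            }
        }
        // Pick the best final state, then follow back-pointers to the start.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (best[n - 1][s] > best[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0, 0};
        double[][] trans = {{0, -1}, {-1, 0}}; // switching states costs 1
        double[][] emit = {{2, 0}, {0, 1}, {0, 3}};
        System.out.println(Arrays.toString(bestPath(start, trans, emit))); // prints [0, 1, 1]
    }
}
```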

setWeights

public void setWeights(double[] weights)
sets feature weights. Usually these weights are determined during the training process, but they can be set directly.

Parameters:
weights - an array of doubles containing one weight for each feature.

getFeatureName

public java.lang.String getFeatureName(int index)
looks up a feature's name given its index

Parameters:
index - index of the feature
Returns:
name of the feature

getTrainingTime

public double getTrainingTime()
returns the total number of seconds used in training. This is the sum of the time to train the features and the time to train the weights. The value is updated when each phase of training (features and weights) completes.

Returns:
total training time

getNumFeatures

public int getNumFeatures()
returns the number of individual features in the model. This may differ from the number of FeatureManagers because each FeatureManager may have 0, 1, or many features associated with it.

Returns:
total number of features in the model.
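
To make the distinction between FeatureManagers and features concrete, the sketch below assumes that features are numbered consecutively, manager by manager. This page does not state that ordering, and the class and method names are hypothetical.

```java
// Hypothetical index arithmetic; the real numbering scheme is not documented here.
class FeatureIndexSketch {
    // Total feature count is the sum of the per-manager counts.
    static int totalFeatures(int[] featuresPerManager) {
        int total = 0;
        for (int count : featuresPerManager) {
            total += count;
        }
        return total;
    }

    // Finds which manager owns a global feature index under consecutive numbering.
    static int managerFor(int[] featuresPerManager, int featureIndex) {
        if (featureIndex < 0) {
            throw new IndexOutOfBoundsException("negative feature index");
        }
        int offset = 0;
        for (int m = 0; m < featuresPerManager.length; m++) {
            if (featureIndex < offset + featuresPerManager[m]) {
                return m;
            }
            offset += featuresPerManager[m];
        }
        throw new IndexOutOfBoundsException("feature index " + featureIndex);
    }

    public static void main(String[] args) {
        int[] counts = {0, 1, 3}; // three managers with 0, 1, and 3 features
        System.out.println(totalFeatures(counts)); // prints 4
        System.out.println(managerFor(counts, 0)); // prints 1
        System.out.println(managerFor(counts, 3)); // prints 2
    }
}
```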

getNumStates

public int getNumStates()
returns the number of hidden states in the model

Returns:
number of hidden states in the model

getStateName

public java.lang.String getStateName(int state)
looks up the name of a state given its index.

Returns:
string name of the state with this index.

getModel

public ModelManager getModel()
returns the configured ModelManager object.

Returns:
the model manager which contains the features and hidden state configuration

getOptimizer

public CRFTraining getOptimizer()
returns the configured numerical optimizer which will be used to select the optimal feature weights during training.

Returns:
the configured numerical optimizer

getWeights

public double[] getWeights()
returns the feature weights. These will be valid once the model is trained.

Returns:
an array of doubles containing the weight for each feature. It will be the same length as returned by getNumFeatures()

getInference

public CRFInference getInference()
returns the configured inference algorithm which will be used to predict hidden states for new inputs once the model is trained.

Returns:
the configured inference algorithm

setInference

public void setInference(CRFInference inference)
sets the inference algorithm. Called automatically during configuration.


setModel

public void setModel(ModelManager model)
sets the model. Called automatically during configuration.


setOptimizer

public void setOptimizer(CRFTraining optimizer)
sets the numerical optimizer. Called automatically during configuration.


printWeights

public java.lang.String printWeights()
Returns a formatted string listing the weights. Useful for debugging.

Returns:
a string containing a human readable list of the feature weights
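
The exact layout of the string returned by printWeights is not specified on this page. As a rough idea of what such a listing could look like, the sketch below pairs each feature index and name with its weight, one per line. The tab-separated layout and all names are assumptions.

```java
import java.util.Locale;

// Hypothetical formatter; the real printWeights output may look different.
class WeightReport {
    // One line per feature: index, name, and weight to four decimal places.
    static String format(String[] names, double[] weights) {
        if (names.length != weights.length) {
            throw new IllegalArgumentException("one name is required per weight");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < weights.length; i++) {
            // Locale.ROOT keeps the decimal separator a period on every system.
            sb.append(String.format(Locale.ROOT, "%d\t%s\t%.4f\n", i, names[i], weights[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(new String[] {"a", "b"}, new double[] {0.5, -1.25}));
    }
}
```

With the real engine, the names and weights would presumably come from getFeatureName(int) and getWeights(), with getNumFeatures() giving the loop bound.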

getInputHandler

public InputHandler getInputHandler()
returns the configured input handler.

Returns:
the input handler for this model

setInputHandler

public void setInputHandler(InputHandler inputHandler)
sets the configured input handler. Must be set before any train or test methods are called. Usually called during config based on setup in the XML file.

Parameters:
inputHandler - the input handler for this model

getOutputHandler

public OutputHandler getOutputHandler()
gets the configured output handler. Must be set before any test methods are called.

Returns:
the output handler for this model

setOutputHandler

public void setOutputHandler(OutputHandler outputHandler)
sets the configured output handler. Must be set before any test methods are called.

Parameters:
outputHandler - the output handler for this model