calhoun.analysis.crf.io
Class InputHandlerFile

java.lang.Object
  extended by calhoun.analysis.crf.io.InputHandlerBase
      extended by calhoun.analysis.crf.io.InputHandlerFile
All Implemented Interfaces:
InputHandler, java.io.Serializable

public class InputHandlerFile
extends InputHandlerBase

An InputHandler used when all of the input is in a single file. A single InputComponentIO is used to read the file. For training, hidden sequences are stored in a separate file whose name is derived from the input file name using a FilenameMapper. The same filename mapping determines the training set file name both when writing data out and when reading it in. The training file is read using a TrainingSequenceIO.

For this InputHandler, the location passed is the path to the file containing the input data.
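The filename mapping described above can be illustrated with a minimal sketch. It does not use the library's FilenameMapper, whose methods are not shown on this page; NameMapperSketch, SuffixMapperSketch and the ".hidden" suffix are hypothetical names chosen for illustration. The point is only that one deterministic mapping, applied to the same input path, gives the same hidden sequence file for both reading and writing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for illustration only. It is NOT the library's
// FilenameMapper, whose actual methods are not shown on this page.
interface NameMapperSketch {
    Path map(Path inputFile);
}

// Assumed convention: the hidden sequence file sits next to the input file
// and carries an extra suffix.
final class SuffixMapperSketch implements NameMapperSketch {
    private final String suffix;

    SuffixMapperSketch(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public Path map(Path inputFile) {
        String name = inputFile.getFileName().toString() + suffix;
        return inputFile.resolveSibling(name);
    }
}

class FilenameMappingSketch {
    public static void main(String[] args) {
        NameMapperSketch mapper = new SuffixMapperSketch(".hidden");
        Path input = Paths.get("data", "train.txt");

        // The same mapping is applied when reading and when writing, so both
        // operations resolve to the same hidden sequence file.
        Path hiddenForRead = mapper.map(input);
        Path hiddenForWrite = mapper.map(input);

        System.out.println(hiddenForRead);
        System.out.println(hiddenForRead.equals(hiddenForWrite));
    }
}
```

With the real class, the equivalent configuration would go through setMapper, setInputReader and setHiddenStateReader before calling any of the read methods.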

See Also:
Serialized Form

Constructor Summary
InputHandlerFile()
           
 
Method Summary
 TrainingSequenceIO getHiddenStateReader()
          gets the reader used to read in results for training data.
 InputComponentIO getInputReader()
          gets the reader used to read in input sequences.
 FilenameMapper getMapper()
          the mapper used to generate the name of the hidden sequence file from the input sequence file.
 java.util.Iterator<? extends InputSequence<?>> readInputData(java.lang.String location)
          returns the input data read from the specified location.
 java.util.List<? extends TrainingSequence<?>> readTrainingData(java.lang.String location)
           
 java.util.List<? extends TrainingSequence<?>> readTrainingData(java.lang.String location, boolean predict)
          returns the training data read from the specified location.
 void setHiddenStateReader(TrainingSequenceIO hiddenStateReader)
          sets the reader used to get hidden sequences.
 void setInputReader(InputComponentIO inputReader)
          sets the reader used to read in input sequences.
 void setMapper(FilenameMapper mapper)
          sets the mapper used to generate the name of the hidden sequence file from the input sequence file.
 void writeInputData(java.lang.String location, java.util.Iterator<? extends InputSequence<?>> data)
          writes input data to the specified location.
 void writeTrainingData(java.lang.String location, java.util.List<? extends TrainingSequence<?>> data)
          writes training data to the specified location.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputHandlerFile

public InputHandlerFile()
Method Detail

readInputData

public java.util.Iterator<? extends InputSequence<?>> readInputData(java.lang.String location)
                                                             throws java.io.IOException
Description copied from interface: InputHandler
returns the input data read from the specified location. The result is returned as an Iterator because the inference algorithms can predict on the sequences one at a time. The interpretation of the location string is dependent on the particular InputHandler implementation used.

Parameters:
location - string location of the data. Meaning is implementation dependent.
Returns:
an iterator over input sequences
Throws:
java.io.IOException - if there is a problem reading the data
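The one-at-a-time access that motivates the Iterator return type can be sketched as follows. This is an illustration under stated assumptions, not the library's reader: LineSequenceIterator is a hypothetical class, and treating each line of text as one sequence is an assumed format, since the actual InputComponentIO file layout is not described on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration: one "sequence" per line of text.
final class LineSequenceIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;

    LineSequenceIterator(BufferedReader reader) {
        this.reader = reader;
        advance(); // read ahead by one record
    }

    // Loads the next record, or null at end of input.
    private void advance() {
        try {
            next = reader.readLine();
        } catch (IOException e) {
            // Iterator methods cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }
}

class IteratorReadSketch {
    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("ACGT\nTTGA\n"));
        Iterator<String> sequences = new LineSequenceIterator(reader);
        while (sequences.hasNext()) {
            // Only the current sequence and one read-ahead record are held.
            System.out.println(sequences.next());
        }
    }
}
```

A caller that predicts on each sequence inside the loop never needs the whole input in memory, which is the property the description above relies on.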

readTrainingData

public java.util.List<? extends TrainingSequence<?>> readTrainingData(java.lang.String location)
                                                               throws java.io.IOException
Throws:
java.io.IOException - if there is a problem reading the data

readTrainingData

public java.util.List<? extends TrainingSequence<?>> readTrainingData(java.lang.String location,
                                                                      boolean predict)
                                                               throws java.io.IOException
Description copied from interface: InputHandler
returns the training data read from the specified location. Training data includes input data and hidden sequences. In this implementation the result is returned as a List of training sequences. The interpretation of the location string is dependent on the particular InputHandler implementation used.

Parameters:
location - string location of the data. Meaning is implementation dependent.
Returns:
a list of training sequences
Throws:
java.io.IOException - if there is a problem reading the data

writeInputData

public void writeInputData(java.lang.String location,
                           java.util.Iterator<? extends InputSequence<?>> data)
                    throws java.io.IOException
Description copied from interface: InputHandler
writes input data to the specified location. The interpretation of the location string is dependent on the particular InputHandler implementation used.

Parameters:
location - string location of the data. Meaning is implementation dependent.
data - an iterator over input sequences
Throws:
java.io.IOException - if there is a problem writing the data

writeTrainingData

public void writeTrainingData(java.lang.String location,
                              java.util.List<? extends TrainingSequence<?>> data)
                       throws java.io.IOException
Description copied from interface: InputHandler
writes training data to the specified location. Training data includes input data and hidden sequences. The interpretation of the location string is dependent on the particular InputHandler implementation used.

Parameters:
location - string location of the data. Meaning is implementation dependent.
data - a list of training sequences to write out.
Throws:
java.io.IOException - if there is a problem writing the data

getHiddenStateReader

public TrainingSequenceIO getHiddenStateReader()
gets the reader used to read in results for training data.

Returns:
the TrainingSequenceIO used to read in the hidden sequences for training

setHiddenStateReader

public void setHiddenStateReader(TrainingSequenceIO hiddenStateReader)
sets the reader used to get hidden sequences. Must be set to read in training data.

Parameters:
hiddenStateReader - the reader that will be used to access hidden states

getInputReader

public InputComponentIO getInputReader()
gets the reader used to read in input sequences. Must be set before any of the read methods are called.

Returns:
the reader used to read in input sequences.

setInputReader

public void setInputReader(InputComponentIO inputReader)
sets the reader used to read in input sequences. Must be set before any of the read methods are called.

Parameters:
inputReader - the reader used to read in input sequences.

getMapper

public FilenameMapper getMapper()
the mapper used to generate the name of the hidden sequence file from the input sequence file. Must be set to read in training data.

Returns:
the mapper used to generate the hidden sequence file name.

setMapper

public void setMapper(FilenameMapper mapper)
sets the mapper used to generate the name of the hidden sequence file from the input sequence file. Must be set to read in training data.

Parameters:
mapper - the mapper used to generate the hidden sequence file name.