calhoun.analysis.crf.features.tricycle13
Class KmerFeatures

java.lang.Object
  extended by calhoun.analysis.crf.AbstractFeatureManager<java.lang.Character>
      extended by calhoun.analysis.crf.features.tricycle13.KmerFeatures
All Implemented Interfaces:
FeatureManager<java.lang.Character>, FeatureManagerNode<java.lang.Character>, java.io.Serializable

public class KmerFeatures
extends AbstractFeatureManager<java.lang.Character>
implements FeatureManagerNode<java.lang.Character>

Trains on the data and then evaluates to P(label | kmer) for the given kmers. Retained for historical reasons; Emission markov generally does better.

See Also:
Serialized Form

Nested Class Summary
static class KmerFeatures.Cardinality
           
 
Constructor Summary
KmerFeatures()
           
KmerFeatures(KmerFeatures.Cardinality cardinality)
           
KmerFeatures(java.util.List<int[]> kmerDefs)
           
KmerFeatures(java.util.List<int[]> kmerDefs, KmerFeatures.Cardinality cardinality)
           
 
Method Summary
 void evaluateNode(InputSequence<? extends java.lang.Character> seq, int pos, int state, FeatureList result)
          Evaluates the set of features managed by this object for the given arguments.
 java.lang.String getFeatureName(int featureIndex)
          Returns a human identifiable name for the feature referenced by a given index.
 java.lang.String getKmer(InputSequence<? extends java.lang.Character> seq, int pos, int[] def)
           
 double getKmerProb(int kmerIndex, java.lang.String kmer, int label)
          Returns an individual entry from the counts list.
 int getNumFeatures()
          Returns the number of features maintained by this FeatureManager.
 java.lang.String kmerName(int index)
          Returns a string representation of a given kmer definition.
 void setKmerDefinitions(java.util.List<java.util.List<java.lang.Integer>> defs)
           
 void setRareThreshold(int threshold)
           
 void train(int startingIndex, ModelManager modelInfo, java.util.List<? extends TrainingSequence<? extends java.lang.Character>> data)
          Computes the P(label | kmer) for each kmer across all of the training data.
 
Methods inherited from class calhoun.analysis.crf.AbstractFeatureManager
getCacheStrategy, getInputComponent, setInputComponent
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface calhoun.analysis.crf.FeatureManager
getCacheStrategy, getInputComponent, setInputComponent
 

Constructor Detail

KmerFeatures

public KmerFeatures(KmerFeatures.Cardinality cardinality)

KmerFeatures

public KmerFeatures()

KmerFeatures

public KmerFeatures(java.util.List<int[]> kmerDefs)

KmerFeatures

public KmerFeatures(java.util.List<int[]> kmerDefs,
                    KmerFeatures.Cardinality cardinality)
Method Detail

setKmerDefinitions

public void setKmerDefinitions(java.util.List<java.util.List<java.lang.Integer>> defs)

setRareThreshold

public void setRareThreshold(int threshold)

getNumFeatures

public int getNumFeatures()
Description copied from interface: FeatureManager
Returns the number of features maintained by this FeatureManager. This number must be fixed after the call to trainFeatures is complete.

Specified by:
getNumFeatures in interface FeatureManager<java.lang.Character>
Returns:
number of features managed by this FeatureManager
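
The startingIndex parameter of train requires each FeatureManager to occupy consecutive feature indexes, which is why the count returned here has to stay fixed once training is done. The following is a minimal sketch of that bookkeeping, not code from this library: the class FeatureIndexLayoutSketch and the method startingIndexes are hypothetical names, and the sketch assumes the first manager starts at index 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of calhoun.analysis.crf.
class FeatureIndexLayoutSketch {

    /**
     * Given the feature count reported by each manager (in order), returns the
     * starting index each manager would receive if indexes are handed out
     * consecutively, beginning at 0 (an assumption of this sketch).
     */
    static List<Integer> startingIndexes(List<Integer> featureCounts) {
        List<Integer> starts = new ArrayList<>();
        int next = 0;
        for (int count : featureCounts) {
            starts.add(next);
            // This manager owns indexes next .. next + count - 1.
            next += count;
        }
        return starts;
    }
}
```

If a manager changed its feature count after training, every later manager's range would shift, so indexes handed out earlier would no longer match.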

getFeatureName

public java.lang.String getFeatureName(int featureIndex)
Description copied from interface: FeatureManager
Returns a human identifiable name for the feature referenced by a given index. Used for display purposes only.

Specified by:
getFeatureName in interface FeatureManager<java.lang.Character>
Parameters:
featureIndex - the index of this feature
Returns:
the human readable name of this feature

evaluateNode

public void evaluateNode(InputSequence<? extends java.lang.Character> seq,
                         int pos,
                         int state,
                         FeatureList result)
Description copied from interface: FeatureManagerNode
Evaluates the set of features managed by this object for the given arguments.

Specified by:
evaluateNode in interface FeatureManagerNode<java.lang.Character>

getKmer

public java.lang.String getKmer(InputSequence<? extends java.lang.Character> seq,
                                int pos,
                                int[] def)

kmerName

public java.lang.String kmerName(int index)
Returns a string representation of a given kmer definition.


getKmerProb

public double getKmerProb(int kmerIndex,
                          java.lang.String kmer,
                          int label)
Returns an individual entry from the counts list.


train

public void train(int startingIndex,
                  ModelManager modelInfo,
                  java.util.List<? extends TrainingSequence<? extends java.lang.Character>> data)
Computes P(label | kmer) for each kmer across all of the training data. These probabilities are used as feature values.

Specified by:
train in interface FeatureManager<java.lang.Character>
Parameters:
startingIndex - the feature index of the first feature owned by this FeatureManager. Each FeatureManager must use up consecutive indexes, so the last index used will be startingIndex + numFeatures - 1.
modelInfo - the model that contains this feature
data - the full list of training sequences to use to train the feature
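
As an illustration of the quantity train is described as computing, here is a minimal sketch of a relative-frequency estimate of P(label | kmer) from co-occurrence counts. It is not the implementation of this class. The names KmerLabelProbSketch, observe and prob are hypothetical. The sketch assumes contiguous kmers of a fixed length, each counted against the label at its first position, with a uniform fallback for unseen kmers. How KmerFeatures interprets kmer definitions, aligns them with labels, or applies the rare threshold is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of calhoun.analysis.crf.
class KmerLabelProbSketch {
    // counts.get(kmer)[label] = number of positions where kmer was seen with label
    private final Map<String, int[]> counts = new HashMap<>();
    private final int numLabels;

    KmerLabelProbSketch(int numLabels) {
        this.numLabels = numLabels;
    }

    /** Counts each window of length k against the label at the window's first position. */
    void observe(String sequence, int[] labels, int k) {
        for (int pos = 0; pos + k <= sequence.length(); pos++) {
            String kmer = sequence.substring(pos, pos + k);
            counts.computeIfAbsent(kmer, key -> new int[numLabels])[labels[pos]]++;
        }
    }

    /** Relative-frequency estimate of P(label | kmer); uniform if the kmer was never observed. */
    double prob(String kmer, int label) {
        int[] row = counts.get(kmer);
        if (row == null) {
            return 1.0 / numLabels;
        }
        int total = 0;
        for (int c : row) {
            total += c;
        }
        return (double) row[label] / total;
    }

    public static void main(String[] args) {
        KmerLabelProbSketch sketch = new KmerLabelProbSketch(2);
        sketch.observe("ACAC", new int[] {0, 1, 1, 1}, 2);
        System.out.println(sketch.prob("AC", 0));
    }
}
```

A real feature would likely need some policy for kmers with very few observations, since a ratio of small counts is a noisy estimate.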