SIGRS
Class KAStatistics

java.lang.Object
  extended by SIGRS.KAStatistics

public class KAStatistics
extends java.lang.Object

SIGRS is a collection of routines for searching sequence files for regions of contrasting composition (CCRs) using a partial-sum process. The significance of segments is evaluated using Karlin-Altschul statistics, specifically an extension by Karlin and Dembo that allows nucleotides to have a Markov dependence (see e.g. Karlin & Altschul (1993) and Karlin & Dembo (1992)).

The routines are provided as is, with no guarantee regarding stability etc., so use at your own risk!

See publication Larsson, P., Hinas, A., Ardell, D.H., Kirsebom, L.A., Virtanen, A. and Söderbom, F. De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring

Questions and comments can be directed to Pontus.Larsson@icm.uu.se


Constructor Summary
KAStatistics()
           
 
Method Summary
static double cutoff(double alpha, int N, double K, double L)
          Determines the bit score cutoff at significance level alpha.
static double entropy(double[][] s, double[][] p, double L)
          Calculates the entropy of a scoring matrix
static double estimateK(double[][] s, double[][] p, double L, double H)
          Estimates the parameter K for the independent nucleotides case. For details, see page 8 in BLAST scoring parameters.
static double expect(double score, double N, double K, double L)
          Calculates the expect value of a score according to Karlin & Altschul (1993) p.5875 [y=K*N*exp(-lambda*x)]
static double gcd(double a, double b)
          Finds the greatest common divisor of two numbers using the Euclidean algorithm
static double lambda(double[][] s, double[][] p)
          Estimates lambda by Newton-Raphson iteration until convergence. Terminates execution if convergence is not reached within 10000 iterations.
static double[][] reshape(double[][] s, double[][] p)
          Returns a matrix spanning the scores in s where the first column of each row is a score and the second column is the total probability of observing that score
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KAStatistics

public KAStatistics()
Method Detail

cutoff

public static double cutoff(double alpha,
                            int N,
                            double K,
                            double L)
Determines the bit score cutoff at significance level alpha, obtained by solving equation [1] in Karlin & Altschul (1993) for x at Prob(S'>=x) = alpha.

Parameters:
alpha - The significance level
N - The search space size
K - The parameter K
L - The parameter lambda
Returns:
The bit score cutoff at the desired significance level
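
The following is a minimal sketch of how such a cutoff can be derived, not the SIGRS source. It assumes the familiar extreme-value form Prob(S'>=x) = 1 - exp(-exp(-x)) for the normalized score S' = lambda*S - ln(K*N), and it returns the cutoff in the score units in which lambda is expressed. Whether SIGRS applies a further conversion to bits is not stated here. The class name CutoffSketch and the helper pValue are hypothetical.

```java
// Hypothetical sketch; not taken from SIGRS.
final class CutoffSketch {

    /** Score x such that Prob(S >= x) = alpha under the assumed extreme-value form. */
    static double cutoff(double alpha, int n, double k, double lambda) {
        if (!(alpha > 0.0 && alpha < 1.0)) {
            throw new IllegalArgumentException("alpha must lie strictly between 0 and 1");
        }
        // Solve 1 - exp(-exp(-x')) = alpha for the normalized score x'.
        double normalized = -Math.log(-Math.log1p(-alpha));
        // Undo the normalization x' = lambda * x - ln(K * N).
        return (normalized + Math.log(k * n)) / lambda;
    }

    /** Assumed tail probability Prob(S >= score) = 1 - exp(-K * N * exp(-lambda * score)). */
    static double pValue(double score, int n, double k, double lambda) {
        return -Math.expm1(-k * n * Math.exp(-lambda * score));
    }
}
```

Feeding the returned cutoff back into pValue with the same N, K and lambda should reproduce alpha up to rounding error, which is a convenient sanity check.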

entropy

public static double entropy(double[][] s,
                             double[][] p,
                             double L)
Calculates the entropy of a scoring matrix

Parameters:
s - The input score matrix
p - The probabilities associated with the score matrix
L - The estimated lambda parameter for s and p
Returns:
The calculated entropy
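
As an illustration only, the relative entropy of a scoring system is commonly written H = lambda * sum(p[i][j] * s[i][j] * exp(lambda * s[i][j])). The sketch below assumes that definition, assumes p holds the joint probability of each cell of s, and reports H in nats. The class name EntropySketch is hypothetical and the actual SIGRS implementation may differ.

```java
// Hypothetical sketch; not taken from SIGRS.
final class EntropySketch {

    /** H = lambda * sum over all cells of p * s * exp(lambda * s), in nats. */
    static double entropy(double[][] s, double[][] p, double lambda) {
        double sum = 0.0;
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < s[i].length; j++) {
                sum += p[i][j] * s[i][j] * Math.exp(lambda * s[i][j]);
            }
        }
        return lambda * sum;
    }
}
```

For a match/mismatch scheme scoring +1 with total probability 0.25 and -1 with total probability 0.75, lambda is ln 3 and this formula gives H = 0.5 * ln 3.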

estimateK

public static double estimateK(double[][] s,
                               double[][] p,
                               double L,
                               double H)
Estimates the parameter K for the independent nucleotides case. For details, see page 8 in BLAST scoring parameters.

Parameters:
s - Score matrix
p - Probability matrix associated with scores
L - The estimated lambda
H - The calculated entropy of the score matrix
Returns:
An estimated value for the parameter K

gcd

public static double gcd(double a,
                         double b)
Finds the greatest common divisor of two numbers using the Euclidean algorithm

Parameters:
a - The first number
b - The second number
Returns:
The greatest common divisor of a and b
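
A minimal sketch of the Euclidean algorithm on double values is given below. The tolerance EPS, the handling of negative inputs, and the class name GcdSketch are assumptions made for illustration. The SIGRS implementation may treat rounding differently.

```java
// Hypothetical sketch; not taken from SIGRS.
final class GcdSketch {

    // Remainders at or below this threshold are treated as zero (assumed tolerance).
    static final double EPS = 1e-9;

    /** Repeatedly replaces (a, b) with (b, a mod b) until the remainder vanishes. */
    static double gcd(double a, double b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b > EPS) {
            double r = a % b;
            a = b;
            b = r;
        }
        return a;
    }
}
```

With these assumptions, gcd(12, 18) yields 6 and gcd(0.5, 0.75) yields 0.25, since both inputs are exactly representable in binary floating point.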

expect

public static double expect(double score,
                            double N,
                            double K,
                            double L)
Calculates the expect value of a score according to Karlin & Altschul (1993) p.5875 [y=K*N*exp(-lambda*x)]

Parameters:
score - The bit score to calculate expect value for
N - The search space size
K - The parameter K
L - The parameter lambda
Returns:
The calculated expect value for the input score
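
The formula quoted above, y = K*N*exp(-lambda*x), translates directly into code. The sketch below is an illustration with a hypothetical class name, ExpectSketch, and is not the SIGRS source.

```java
// Hypothetical sketch; not taken from SIGRS.
final class ExpectSketch {

    /** E = K * N * exp(-lambda * score). */
    static double expect(double score, double n, double k, double lambda) {
        return k * n * Math.exp(-lambda * score);
    }
}
```

At a score of 0 the expect value equals K*N, and each increase of the score by 1/lambda divides it by e.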

lambda

public static double lambda(double[][] s,
                            double[][] p)
Estimates lambda by Newton-Raphson iteration until convergence. Terminates execution if convergence is not reached within 10000 iterations.

Parameters:
s - Score matrix
p - Probability matrix associated with scores
Returns:
An estimated value of lambda for the score matrix
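
For the independent case, lambda is usually defined as the unique positive solution of sum(p[i][j] * exp(lambda * s[i][j])) = 1. The sketch below applies Newton-Raphson to that equation under several assumptions: p holds the joint probability of each cell of s, the expected score is negative, and at least one score is positive. The starting-point search, the tolerance, the use of exceptions instead of terminating the program, and the class name LambdaSketch are all illustrative choices, not the SIGRS implementation. The Markov-dependent extension is not covered.

```java
// Hypothetical sketch; not taken from SIGRS.
final class LambdaSketch {

    static final int MAX_ITERATIONS = 10000; // limit quoted in the method description
    static final double TOLERANCE = 1e-12;   // assumed convergence threshold

    static double lambda(double[][] s, double[][] p) {
        // f(x) = sum(p * exp(x * s)) - 1 is convex with f(0) = 0, so start to the
        // right of the positive root, where f > 0 and Newton steps decrease monotonically.
        double x = 1.0;
        while (sum(s, p, x, false) <= 1.0) {
            x *= 2.0;
            if (x > 1024.0) {
                throw new IllegalArgumentException("no positive root found");
            }
        }
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            double f = sum(s, p, x, false) - 1.0;
            double df = sum(s, p, x, true);
            double step = f / df;
            if (Double.isNaN(step) || Double.isInfinite(step)) {
                throw new IllegalStateException("Newton step is not finite");
            }
            x -= step;
            if (Math.abs(step) < TOLERANCE) {
                return x;
            }
        }
        throw new IllegalStateException("no convergence within " + MAX_ITERATIONS + " iterations");
    }

    // Returns sum(p * exp(x * s)), or its derivative sum(p * s * exp(x * s)).
    private static double sum(double[][] s, double[][] p, double x, boolean derivative) {
        double total = 0.0;
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < s[i].length; j++) {
                double term = p[i][j] * Math.exp(x * s[i][j]);
                total += derivative ? s[i][j] * term : term;
            }
        }
        return total;
    }
}
```

For a scheme scoring +1 with total probability 0.25 and -1 with total probability 0.75, the equation reduces to a quadratic in exp(lambda) whose positive root gives lambda = ln 3.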

reshape

public static double[][] reshape(double[][] s,
                                 double[][] p)
Returns a matrix spanning the scores in s where the first column of each row is a score and the second column is the total probability of observing that score
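
One way to read this description is sketched below: the probabilities of all cells sharing a score are summed, and one row is emitted per distinct score. Sorting the rows by ascending score and omitting scores that never occur in s are assumptions, as is the class name ReshapeSketch. The SIGRS implementation may order or pad the rows differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not taken from SIGRS.
final class ReshapeSketch {

    /** Each returned row is {score, total probability of that score}, sorted by score. */
    static double[][] reshape(double[][] s, double[][] p) {
        Map<Double, Double> totals = new TreeMap<>();
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < s[i].length; j++) {
                // Accumulate the probability mass of every cell carrying this score.
                totals.merge(s[i][j], p[i][j], Double::sum);
            }
        }
        double[][] out = new double[totals.size()][2];
        int row = 0;
        for (Map.Entry<Double, Double> e : totals.entrySet()) {
            out[row][0] = e.getKey();
            out[row][1] = e.getValue();
            row++;
        }
        return out;
    }
}
```

For a 2x2 match/mismatch matrix with scores +1 and -1, the result has two rows, one per score, and the second column sums to the total probability mass of p.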