SIGRS
Class SIGRSMain

java.lang.Object
  extended by SIGRS.SIGRSMain

public class SIGRSMain
extends java.lang.Object

This is the main class for the SIGRS routines.

SIGRS is a collection of routines for finding regions of contrasting composition (CCRs) in sequence files using a partial sum process. The significance of segments is evaluated using Karlin-Altschul statistics, specifically an extension by Karlin and Dembo that allows nucleotides to have a Markov dependence (see e.g. Karlin & Altschul (1993) and Karlin & Dembo (1992)).
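
To illustrate the partial sum idea mentioned above, here is a minimal sketch; it is not the SIGRS implementation. It assumes every nucleotide has already been given a score (positive where the sequence looks target-like, negative where it looks background-like) and finds the highest-scoring segment by tracking a running sum that is reset whenever it drops to zero or below. The class name PartialSumSketch, the method maxSegment and the score values are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the SIGRS API.
class PartialSumSketch {

    /**
     * Returns {start, end, score} of the maximal-scoring segment
     * (0-based, inclusive), or {-1, -1, 0} if no partial sum is positive.
     */
    static double[] maxSegment(double[] scores) {
        double best = 0.0;
        int bestStart = -1;
        int bestEnd = -1;
        double running = 0.0;   // partial sum since the last reset
        int runStart = 0;       // first position of the current run
        for (int i = 0; i < scores.length; i++) {
            running += scores[i];
            if (running <= 0.0) {
                // The run no longer contributes; start again after position i.
                running = 0.0;
                runStart = i + 1;
            } else if (running > best) {
                best = running;
                bestStart = runStart;
                bestEnd = i;
            }
        }
        return new double[] {bestStart, bestEnd, best};
    }

    public static void main(String[] args) {
        double[] scores = {-1, 2, 3, -4, 1, 2, 2, -6, 1};  // made-up per-position scores
        System.out.println(Arrays.toString(maxSegment(scores)));
    }
}
```

A full scan would report every segment whose score is significant rather than only the single best one; the sketch keeps to one segment for brevity.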

The routines are provided as is, with no guarantee regarding stability etc., so use them at your own risk!

See the publication: Larsson, P., Hinas, A., Ardell, D.H., Kirsebom, L.A., Virtanen, A. and Söderbom, F. De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: performance of Markov-dependent genome feature scoring.

Questions and comments can be directed to Pontus.Larsson@icm.uu.se


Constructor Summary
SIGRSMain()
           
 
Method Summary
static SIGRSScoreObject getScoreObject(java.io.File backgroundFile, java.io.File targetFile, boolean symmetric)
          Creates a SIGRSScoreObject that calculates the scores and parameters from the given background and target sequence files.
static void main(java.lang.String[] args)
          Main method.
static void scanForCCRs(java.io.File[] inputFile, int model, SIGRSScoreObject sso, double expect, int N, boolean symmetric, java.lang.String label, LogWriter lw, LogWriter out)
          Scans the input files in both directions for contrasting composition regions (CCRs)
static void writeGFFAnnotationOfCCRs(double[][] CCRs, java.lang.String[] id, int model, java.lang.String label, LogWriter lw)
          Writes the contrasting composition regions (CCRs) to output in GFF format.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SIGRSMain

public SIGRSMain()
Method Detail

main

public static void main(java.lang.String[] args)
Main method. Can be launched either to scan sequences for CCRs or just to calculate a score object. For a scan, a score object must be specified, either as saved parameters previously generated by the program (-S option) or by computing it from specified background and target sequence files (-BG and -TG options).

The target sequence set can be subdivided into classes based on tags in the id line of each sequence. Each class is then weighted equally in the calculation of scores, regardless of the number of sequences in each class. The classes are specified by the -C option followed by a space-separated list of tags (if a tag contains spaces, the entire tag must be quoted).

The remaining options are:

-IN -> Input files to scan for CCRs
-E -> Expect cutoff
-N -> Database search space size, if it needs to be specified manually
-M -> Model type to use: 0 means the M0 model, which assumes independent nucleotides; 1 means the M1 model, which assumes nucleotides with first-order Markov dependence
-OUT -> File to write program output to instead of stdout
-LOG -> Redirects progress output
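
As a rough sketch of how the options described above might be combined for a scan, the helper below assembles an argument array. Only the option flags are taken from this documentation; the class name SigrsArgsSketch, the method scanArgs, the file names, and the exact textual form of the -E and -M values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the option flags come from the documentation.
class SigrsArgsSketch {

    /** Builds arguments for a scan that computes its scores from FASTA files. */
    static String[] scanArgs(String background, String target, String input,
                             double expect, int model, String outFile) {
        List<String> args = new ArrayList<>();
        args.add("-BG");  args.add(background);             // background sequences
        args.add("-TG");  args.add(target);                 // target sequences
        args.add("-IN");  args.add(input);                  // file to scan for CCRs
        args.add("-E");   args.add(String.valueOf(expect)); // expect cutoff
        args.add("-M");   args.add(String.valueOf(model));  // 0 = M0, 1 = M1
        args.add("-OUT"); args.add(outFile);                // output instead of stdout
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // File names below are placeholders.
        String[] args = scanArgs("background.fa", "target.fa", "genome.fa",
                                 0.01, 1, "ccrs.gff");
        System.out.println(String.join(" ", args));
        // With the SIGRS classes on the classpath, the array could be passed on:
        // SIGRS.SIGRSMain.main(args);
    }
}
```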


getScoreObject

public static SIGRSScoreObject getScoreObject(java.io.File backgroundFile,
                                              java.io.File targetFile,
                                              boolean symmetric)
                                       throws java.lang.Exception
Creates a SIGRSScoreObject that calculates the scores and parameters from the given background and target sequence files.

Parameters:
backgroundFile - A FASTA file containing sequence(s) representative of the background search space
targetFile - A FASTA file containing sequence(s) representative of the target distributions
Returns:
A SIGRSScoreObject
Throws:
java.lang.Exception

scanForCCRs

public static void scanForCCRs(java.io.File[] inputFile,
                               int model,
                               SIGRSScoreObject sso,
                               double expect,
                               int N,
                               boolean symmetric,
                               java.lang.String label,
                               LogWriter lw,
                               LogWriter out)
                        throws java.lang.Exception
Scans the input files in both directions for contrasting composition regions (CCRs)

Parameters:
inputFile - An array containing the input files in FASTA format. Each file may contain multiple sequences
model - The model type to use. 0 -> M0 (Independent nucleotides), 1 -> M1 (Markov-dependent nucleotides)
sso - A SIGRSScoreObject that holds all the necessary scores and model parameters
expect - The expect cutoff to use when reporting hits
lw - A writer to write progress to
out - A writer to write the resulting CCRs to
Throws:
java.lang.Exception
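
The expect cutoff and the search space size N can be read in light of the Karlin-Altschul approximation, in which the number of segments expected to reach a score S by chance is roughly K * N * exp(-lambda * S). The sketch below only evaluates that formula. The values of K and lambda, the class name ExpectSketch, and the choice to report a segment when its expect value is at or below the cutoff are assumptions; how SIGRS derives these quantities is not described in this documentation.

```java
// Hypothetical illustration; not part of the SIGRS API.
class ExpectSketch {

    /** Karlin-Altschul approximation: E = K * N * exp(-lambda * S). */
    static double expectValue(double k, double lambda, double searchSpace, double score) {
        return k * searchSpace * Math.exp(-lambda * score);
    }

    /** Assumed reporting rule: keep a segment whose expect value is within the cutoff. */
    static boolean passes(double expectValue, double cutoff) {
        return expectValue <= cutoff;
    }

    public static void main(String[] args) {
        double k = 0.1;                 // assumed value
        double lambda = Math.log(2.0);  // assumed value
        double e = expectValue(k, lambda, 1024, 20);
        System.out.println(e + " passes 0.01 cutoff: " + passes(e, 0.01));
    }
}
```

Under this formula, a larger search space N raises the expect value of a given score, so fewer segments pass a fixed cutoff.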

writeGFFAnnotationOfCCRs

public static void writeGFFAnnotationOfCCRs(double[][] CCRs,
                                            java.lang.String[] id,
                                            int model,
                                            java.lang.String label,
                                            LogWriter lw)
Writes the contrasting composition regions (CCRs) to output in GFF format. The tab-delimited columns are as follows:

1 -> Sequence identifier
2 -> Source name
3 -> Family feature name
4 -> Start position
5 -> End position
6 -> Blank
7 -> Orientation
8 -> Expect value
9 -> Feature name
10 -> Class feature name

Parameters:
CCRs - An array holding the CCRs
id - The identifier(s) of the sequence(s) in which the CCRs were found
lw - A reference to an output writer
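
The ten-column layout listed above can be sketched as a single formatting helper. This is an illustration only, not the code used by writeGFFAnnotationOfCCRs: the class name GffLineSketch, the method gffLine and all field values are hypothetical, and writing the blank sixth column as "." is an assumption borrowed from common GFF practice.

```java
// Hypothetical illustration; not part of the SIGRS API.
class GffLineSketch {

    /** Joins the ten columns with tabs, in the order given in the documentation. */
    static String gffLine(String seqId, String source, String family,
                          int start, int end, char orientation, double expect,
                          String feature, String featureClass) {
        return String.join("\t",
                seqId,                        // 1: sequence identifier
                source,                       // 2: source name
                family,                       // 3: family feature name
                Integer.toString(start),      // 4: start position
                Integer.toString(end),        // 5: end position
                ".",                          // 6: blank (placeholder is an assumption)
                String.valueOf(orientation),  // 7: orientation
                Double.toString(expect),      // 8: expect value
                feature,                      // 9: feature name
                featureClass);                // 10: class feature name
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        System.out.println(gffLine("chr1", "SIGRS", "ncRNA", 120, 185, '+', 0.01,
                                   "CCR_1", "classA"));
    }
}
```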