de.unibi.techfak.jpredictor.motifs
Class PSSMotif

java.lang.Object
  extended by de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
      extended by de.unibi.techfak.jpredictor.motifs.Motif
          extended by de.unibi.techfak.jpredictor.motifs.PSPMotif
              extended by de.unibi.techfak.jpredictor.motifs.PSSMotif
All Implemented Interfaces:
MotifSearcher, MotifSearchWithThreshold, SequenceWindowScorer, Markable, java.lang.Cloneable, java.lang.Comparable

public class PSSMotif
extends PSPMotif

Contains a motif represented by a score for every base at every position (PSSM = position-specific score matrix). When this motif is searched, the scores of all positions are summed first and the threshold is applied to that sum: if the threshold is less than the sum, the motif is found.
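The summing and threshold test described above can be sketched as follows. This is a minimal sketch, not the jPredictor implementation: it assumes the matrix is indexed [position][base] with the base columns in A, C, G, T order, and the class PssmWindowScorer and its methods are hypothetical.

```java
/**
 * Hypothetical helper (not part of jPredictor): scores one window of a DNA
 * sequence with a PSSM indexed [position][base], bases assumed in A, C, G, T order.
 */
final class PssmWindowScorer {
    private static final String ALPHABET = "ACGT"; // assumed column order

    private PssmWindowScorer() {}

    /** Sums the positional scores of seq[start .. start + pssm.length - 1]. */
    static double score(double[][] pssm, CharSequence seq, int start) {
        double sum = 0.0;
        for (int pos = 0; pos < pssm.length; pos++) {
            char c = Character.toUpperCase(seq.charAt(start + pos));
            int base = ALPHABET.indexOf(c);
            if (base < 0) {
                throw new IllegalArgumentException("Unsupported base '" + c + "' at " + (start + pos));
            }
            sum += pssm[pos][base];
        }
        return sum;
    }

    /** The motif counts as found only if the threshold is less than the summed score. */
    static boolean isFound(double[][] pssm, CharSequence seq, int start, double threshold) {
        return threshold < score(pssm, seq, start);
    }
}
```

The comparison is strict: a window whose summed score equals the threshold is not a hit, following the "less than" wording of the class description.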


Nested Class Summary
 
Nested classes/interfaces inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
MotifSearchAdapter.SearchFields
 
Field Summary
protected  double[] generatingBackground
          The background used to calculate the log-odds scores in this PSSM motif.
private  double[] maxc
          Holds the maximum score that can still be reached with the remaining comparisons.
private  double[] maxn
          Holds the maximum score that can still be reached with the remaining comparisons.
private  double[] tmpBackground
          The background character distribution.
private  double[] tmpRevLookaheadMax
          An array with the cumulative sums of the per-position maxima.
private  double[] tmpRevLookaheadMin
          An array with the cumulative sums of the per-position minima.
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.PSPMotif
A, C, G, generatingSequences, motif, position, T, threshold
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.Motif
DNA_DEGENERATED_CODE, DNA_DEGENERATED_CODE_JOINING, MARK_USABLE
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
searchFields
 
Fields inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
SEARCH_ALL_ORIENTATIONS, SEARCH_DIRECTION_MAX, SEARCH_DIRECTION_MINUS, SEARCH_DIRECTION_PLUS
 
Fields inherited from interface de.unibi.techfak.misc.Markable
MARK_BASIC, MARK_DELETED, MARK_EXP, MARK_MOVED, MARK_REPLACED, MARK_SELECTED, MARK_TOBEDELETED, MARK_TOBEMOVED, MARK_TOBEREPLACED
 
Constructor Summary
PSSMotif(java.lang.String name, java.lang.String description)
          Initializes the name and the description.
 
Method Summary
static double[][] calculateScores(double[][] weight, double[] background, double add)
           Calculates the scores from probabilities and background.
 java.lang.Object clone()
           Clones the motif by creating the same motif again and copying all important fields.
static PSSMotif createPSSMotifFromPSPMotif(PSPMotif pspm, double add, double[] distr)
           Generates a PSSMotif from a PSPMotif.
 double[] getGeneratingBackground()
          Returns the background from which this PSSM motif was calculated.
 double getMaximum()
          Calculates the maximal possible score.
 double getMinimum()
          Calculates the minimal possible score.
private  double rScoreProbability(double score, int index)
           Recursion to be called by method scoreProbability.
 double scoreProbability(double threshold, double[] background)
           Calculates for a given threshold the probability that the motif reaches this threshold when it is searched on a sequence.
 FoundMotifStruct search(int seqStart, int seqLength)
          Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
static double[][] testWeigthMatrix(double[][] weight, boolean change)
          Tests whether the matrix has the format [rows][columns] with at least one row and at least 4 columns.
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.PSPMotif
compareTo, createPSPMFromMotifBlock, createPSPMotifFromSequenceMotif, createReversedComplement, getConsensusSequence, getGeneratingSequences, getMaximum, getMinimum, getMotif, getMotifPositions, getRegularExpression, getThreshold, length, match, matchComplement, print, readMotifBlockFromFile, readPSBlockFromFile, setMotif, setThreshold
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.Motif
clearMark, clearMark, equals, getDescription, getMark, getName, getWeight, isMarked, isMarked, print, scoreSequenceWindow, setDescription, setMark, setName, setWeight, toString
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
getSearchMode, initSearch, searchAll, setSearchMode
 
Methods inherited from class java.lang.Object
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
getSearchMode, initSearch, searchAll, setSearchMode
 

Field Detail

generatingBackground

protected double[] generatingBackground
The background used to calculate the log-odds scores in this PSSM motif. This value is only set if the motif was generated by a call to createPSSMotifFromPSPMotif.


maxn

private double[] maxn
Holds the maximum score that can still be reached with the remaining comparisons. This is the field for the plus direction.


maxc

private double[] maxc
Holds the maximum score that can still be reached with the remaining comparisons. This is the field for the minus (reverse-complement) direction.


tmpBackground

private double[] tmpBackground
The background character distribution. It is set by method scoreProbability and used by method rScoreProbability. It is stored as a field rather than passed as a parameter for performance reasons.


tmpRevLookaheadMax

private double[] tmpRevLookaheadMax
An array with the cumulative sums of the per-position maxima. It is set by method scoreProbability and used by method rScoreProbability. It is stored as a field rather than passed as a parameter for performance reasons. The underlying formula is $\mathrm{tmpRevLookaheadMax}[i] = \sum_{k=0}^{i} \mathrm{getMaximum}(k)$.

See Also:
getMaximum()

tmpRevLookaheadMin

private double[] tmpRevLookaheadMin
An array with the cumulative sums of the per-position minima. It is set by method scoreProbability and used by method rScoreProbability. It is stored as a field rather than passed as a parameter for performance reasons. The underlying formula is $\mathrm{tmpRevLookaheadMin}[i] = \sum_{k=0}^{i} \mathrm{getMinimum}(k)$.

See Also:
getMinimum()

Constructor Detail

PSSMotif

public PSSMotif(java.lang.String name,
                java.lang.String description)
Initializes the name and the description.

Parameters:
name - The identifier of the motif.
description - A short description.

Method Detail

getGeneratingBackground

public double[] getGeneratingBackground()
Returns the background distribution from which this PSSM motif was calculated. May be null, in which case it is unknown how the scores were calculated.

Returns:
The background this PSSM motif was calculated from, or null if it is unknown.

getMaximum

public double getMaximum()
Calculates the maximal possible score. This is done for the whole motif by summing the positional maxima returned by getMaximum(int).

Overrides:
getMaximum in class PSPMotif
Returns:
The maximal score, Double.NaN in case of error.
See Also:
PSPMotif.getMaximum(int)

getMinimum

public double getMinimum()
Calculates the minimal possible score. This is done for the whole motif by summing the positional minima returned by getMinimum(int).

Overrides:
getMinimum in class PSPMotif
Returns:
The minimal score, Double.NaN in case of error.
See Also:
PSPMotif.getMinimum(int)

search

public FoundMotifStruct search(int seqStart,
                               int seqLength)
                        throws MissingCharSequenceException,
                               MissingMotifException
Description copied from interface: MotifSearcher
Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
The first occurrence of the motif on the sequence is returned as a FoundMotifStruct with positions absolute to the sequence window. Note that this method does not take previous searches into account; it always starts a new search. If the motif was not found, null is returned.
The id of the returned struct is used as a flag indicating the search direction (search mode) in which the motif was found (any combination of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS). Note that a motif can occur more than once at the same position; e.g. the regular expression motif CNGCCATNDNND and its reverse complement HNNHNATGGCNG can both be matched on the sequence CGGCCATGGCTG. If both search directions match, the motif start and end positions always refer to the plus direction. For a reverse-complement match, end is less than start.
If an error occurred while reading the sequence, if this method tried to read beyond the sequence's end, or if the motif could not be found on the sequence, null is returned.

Specified by:
search in interface MotifSearcher
Overrides:
search in class PSPMotif
Parameters:
seqStart - Search starts with this index.
seqLength - The width of the subsequence to search on.
Returns:
The first occurrence of the motif on the subsequence, or null if the motif could not be found on the sequence.
Throws:
MissingCharSequenceException - If no sequence to search on was set.
MissingMotifException - If no motif to search for was set.
See Also:
MotifSearcher.initSearch(CharSequence), MotifSearcher.SEARCH_DIRECTION_PLUS, MotifSearcher.SEARCH_DIRECTION_MINUS
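The search contract above (first hit in the window, direction flags, plus-strand coordinates when both strands match, end less than start for a reverse-complement hit) can be illustrated with a simplified, self-contained sketch. It is not the jPredictor implementation: the class FirstHitSearch, the flag values PLUS = 1 and MINUS = 2, the int[] {start, end, flags} return value and the A, C, G, T column order are assumptions; positions are reported as absolute indices into the sequence, and both strands are always searched instead of honoring a search mode.

```java
/**
 * Hypothetical first-hit PSSM search on both strands (not the jPredictor code).
 * Matrix layout [position][base] with bases assumed in A, C, G, T order.
 */
final class FirstHitSearch {
    static final int PLUS = 1;  // stand-in for SEARCH_DIRECTION_PLUS (value assumed)
    static final int MINUS = 2; // stand-in for SEARCH_DIRECTION_MINUS (value assumed)
    private static final String ALPHABET = "ACGT";

    private FirstHitSearch() {}

    private static int index(char c) {
        return ALPHABET.indexOf(Character.toUpperCase(c));
    }

    /**
     * Returns {start, end, flags} of the first window in [seqStart, seqStart + seqLength)
     * whose score exceeds the threshold, or null if there is none or the search
     * window reaches past the end of the sequence.
     */
    static int[] search(double[][] pssm, CharSequence seq, int seqStart, int seqLength,
                        double threshold) {
        int len = pssm.length;
        if (len == 0 || seqStart < 0 || seqLength < 0 || seqStart + seqLength > seq.length()) {
            return null;
        }
        windows:
        for (int p = seqStart; p + len <= seqStart + seqLength; p++) {
            double plus = 0.0;
            double minus = 0.0;
            for (int pos = 0; pos < len; pos++) {
                int fwd = index(seq.charAt(p + pos));
                int rev = index(seq.charAt(p + len - 1 - pos));
                if (fwd < 0 || rev < 0) {
                    continue windows; // skip windows containing non-ACGT characters
                }
                plus += pssm[pos][fwd];
                minus += pssm[pos][3 - rev]; // 3 - i complements A<->T and C<->G
            }
            int flags = (threshold < plus ? PLUS : 0) | (threshold < minus ? MINUS : 0);
            if ((flags & PLUS) != 0) {
                return new int[] {p, p + len - 1, flags}; // plus coordinates, even if both match
            }
            if (flags == MINUS) {
                return new int[] {p + len - 1, p, flags}; // reverse complement: end < start
            }
        }
        return null;
    }
}
```

The minus score is computed by reading the window backwards and complementing each base, which is the same as scoring the motif against the reverse-complemented window.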

testWeigthMatrix

public static double[][] testWeigthMatrix(double[][] weight,
                                          boolean change)
The matrix must be of the format [rows][columns]. It must not be null and must have at least one row and at least 4 columns. No other constraints are tested.

Parameters:
weight - The matrix to test.
change - This parameter is ignored.
Returns:
The matrix itself, iff the constraints are fulfilled, otherwise null.
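The documented constraints amount to a few null and length checks. A minimal sketch, assuming the column requirement applies to every row (the text does not say whether only the first row is inspected); WeightMatrixCheck is a hypothetical name, and the ignored change parameter is omitted.

```java
/** Hypothetical re-statement of the documented checks (not the jPredictor code). */
final class WeightMatrixCheck {
    private WeightMatrixCheck() {}

    /** Returns weight if it is non-null, has at least one row and every row has at least 4 columns; otherwise null. */
    static double[][] check(double[][] weight) {
        if (weight == null || weight.length == 0) {
            return null;
        }
        for (double[] row : weight) {
            if (row == null || row.length < 4) { // one column per base A, C, G, T (assumed)
                return null;
            }
        }
        return weight; // the same array, not a copy
    }
}
```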

createPSSMotifFromPSPMotif

public static PSSMotif createPSSMotifFromPSPMotif(PSPMotif pspm,
                                                  double add,
                                                  double[] distr)

Generates a PSSMotif from a PSPMotif. The probabilities of the PSPMotif are converted to log-odds scores using PSSMotif.calculateScores(double[][], double[], double). If no background distribution distr is given, it is set to { 0.25, 0.25, 0.25, 0.25 }.

The original motif's line-by-line positions are copied to the new motif; the same applies to the name and the score. The description is modified by enclosing the original description in parentheses and prefixing it with the string "PSS". The threshold is set to the maximum. The generating sequences and the background are stored in the new motif.

Parameters:
pspm - The source motif.
add - A small value that prevents the score from becoming -Infinity.
distr - The background distribution as a double array. If it is null, { 0.25, 0.25, 0.25, 0.25 } is used. Note that every communicator provides a predefined background distribution.
Returns:
A newly created PSSMotif, or null if pspm == null.
See Also:
calculateScores(double[][], double[], double), ICommunicator.getGlobalBackground()

calculateScores

public static double[][] calculateScores(double[][] weight,
                                         double[] background,
                                         double add)

Calculates the scores from probabilities and background. The weight matrix should have the form double[rows][cols] and the background the form double[cols]. The number of columns of weight should equal the length of the background array; otherwise the smaller of the two is used.

The formula to calculate the log-odds scores is

$\mathrm{weight}[i][j] \leftarrow \log\left( \frac{\mathrm{weight}[i][j]}{\mathrm{background}[j]} + \mathrm{add} \right)$

add is assumed to be very small; adding it before taking the logarithm is meant to ensure that minus infinity is not a possible result. Nevertheless, add may be zero. If add is infinite or Double.NaN, it is set to zero.

Note that it is not checked whether the background distribution or the rows of weight sum to one. If a value in the background array is zero, all quotients with that value as denominator (see the formula above) are set to zero. If a quotient is less than zero, it is also set to zero.

This method works in-place, thus the weight matrix is overwritten.

Parameters:
weight - The matrix of probabilities to be recalculated to scores.
background - The background probabilities. If it is null, weight is returned. Note that every communicator provides a predefined background distribution.
add - A small value added to every quotient before the logarithm is taken.
Returns:
The given weight matrix with recalculated content, or null if weight is null.
See Also:
ICommunicator.getGlobalBackground()
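The in-place conversion can be sketched as below. Assumptions: the natural logarithm (Math.log) is used, columns beyond the shorter of the two lengths are left untouched, and LogOddsScores is a hypothetical class name; the log base actually used by jPredictor is not stated above.

```java
/** Hypothetical sketch of the documented log-odds conversion (not the jPredictor code). */
final class LogOddsScores {
    private LogOddsScores() {}

    /** Overwrites weight[i][j] with log(weight[i][j] / background[j] + add) and returns weight. */
    static double[][] calculateScores(double[][] weight, double[] background, double add) {
        if (weight == null) {
            return null;
        }
        if (background == null) {
            return weight; // nothing to compute against
        }
        if (Double.isInfinite(add) || Double.isNaN(add)) {
            add = 0.0;
        }
        for (double[] row : weight) {
            int cols = Math.min(row.length, background.length); // the smaller length is used
            for (int j = 0; j < cols; j++) {
                double q = background[j] == 0.0 ? 0.0 : row[j] / background[j];
                if (q < 0.0) {
                    q = 0.0; // negative quotients are clamped to zero
                }
                row[j] = Math.log(q + add); // natural log assumed; log(0) gives -Infinity
            }
        }
        return weight;
    }
}
```

With add equal to zero, a zero probability still yields Double.NEGATIVE_INFINITY, which is why a small positive add is recommended.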

clone

public java.lang.Object clone()
Description copied from class: Motif

Clones the motif by creating the same motif again and copying all important fields. The result is at least of class Motif. When cloning a MultiMotif, the referenced single motifs are not cloned.

The fields that are copied are: Name, Description, the Motif itself, errorNumberAllowedForMatch or Threshold (depending on the motif type), Weight and searchMode (six fields at present). The fields of the MotifSearchAdapter are not copied.

Note that matrices are not cloned: a clone of a PSPMotif or a PSSMotif holds a reference to the matrix of the original motif. This was done to save memory.

Overrides:
clone in class PSPMotif
Returns:
A copy of this motif.
See Also:
Object.clone()
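The matrix sharing described above can be illustrated with a hypothetical class (not Motif itself) whose clone() re-creates the object and copies its fields, passing on the matrix reference instead of a copy. After cloning, scalar fields are independent, but an in-place change to the matrix through one object is visible through the other; deepCopy() is an illustrative alternative for callers that need an independent matrix.

```java
/** Hypothetical illustration of shallow field copying (not the jPredictor Motif class). */
final class SharedMatrixMotif implements Cloneable {
    String name;
    double threshold;
    double[][] matrix;

    SharedMatrixMotif(String name, double[][] matrix, double threshold) {
        this.name = name;
        this.matrix = matrix;
        this.threshold = threshold;
    }

    /** Re-creates the motif and copies its fields; the matrix reference is shared, not duplicated. */
    @Override
    public SharedMatrixMotif clone() {
        return new SharedMatrixMotif(name, matrix, threshold);
    }

    /** Copies every row, so the result no longer shares the matrix with this motif. */
    SharedMatrixMotif deepCopy() {
        double[][] copy = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            copy[i] = matrix[i].clone();
        }
        return new SharedMatrixMotif(name, copy, threshold);
    }
}
```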

rScoreProbability

private double rScoreProbability(double score,
                                 int index)

Recursion to be called by method scoreProbability. Any method calling it must ensure that tmpBackground, tmpRevLookaheadMax and tmpRevLookaheadMin are set properly. Normally, it is called with the threshold that must be exceeded and with the length of this motif minus one as index.

Parameters:
score - The score the motif must reach (from right to left).
index - Position where we are in the motif.
Returns:
The probability for this motif to exceed the given threshold.

scoreProbability

public double scoreProbability(double threshold,
                               double[] background)

Calculates for a given threshold the probability that the motif reaches this threshold when it is searched on a sequence.

The occurrence probabilities (calculated from the given background character distribution) of all sequences whose score exceeds the threshold are summed.

The dynamic programming recursion was introduced by Wu et al. in 2000. It is extended by two look-ahead termination conditions: one for when the required score is already guaranteed to be exceeded (look-ahead minimum), and one for when it can no longer be reached (look-ahead maximum).

Parameters:
threshold - The threshold to calculate with.
background - The character occurrence frequencies.
Returns:
The probability for this motif to exceed the given threshold.
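The recursion with both look-ahead cut-offs can be sketched as follows. This is an illustrative sketch, not the jPredictor code or Wu et al.'s exact algorithm: it assumes bases are drawn independently from a background that sums to one, counts a window as a hit when its total score strictly exceeds the threshold (as in the class description), and uses the hypothetical class ScoreProbability. It does not memoize, so its worst case is exponential in the motif length; the look-ahead arrays only prune branches.

```java
/**
 * Hypothetical sketch of scoreProbability/rScoreProbability with look-ahead pruning.
 * Assumes i.i.d. bases drawn from the background and pssm[position][base] with
 * at least background.length base columns per position.
 */
final class ScoreProbability {
    private final double[][] pssm;
    private final double[] background;
    private final double[] revLookaheadMax; // [i] = sum of positional maxima over positions 0..i
    private final double[] revLookaheadMin; // [i] = sum of positional minima over positions 0..i

    ScoreProbability(double[][] pssm, double[] background) {
        this.pssm = pssm;
        this.background = background;
        revLookaheadMax = new double[pssm.length];
        revLookaheadMin = new double[pssm.length];
        double maxSum = 0.0;
        double minSum = 0.0;
        for (int k = 0; k < pssm.length; k++) {
            double colMax = Double.NEGATIVE_INFINITY;
            double colMin = Double.POSITIVE_INFINITY;
            for (int c = 0; c < background.length; c++) {
                colMax = Math.max(colMax, pssm[k][c]);
                colMin = Math.min(colMin, pssm[k][c]);
            }
            maxSum += colMax;
            minSum += colMin;
            revLookaheadMax[k] = maxSum;
            revLookaheadMin[k] = minSum;
        }
    }

    /** Probability that a random window scores strictly more than threshold. */
    double scoreProbability(double threshold) {
        return rScoreProbability(threshold, pssm.length - 1); // start at the rightmost position
    }

    /** Probability that positions 0..index together score more than score. */
    private double rScoreProbability(double score, int index) {
        if (index < 0) {
            return 0.0 > score ? 1.0 : 0.0; // no positions left: the outcome is decided
        }
        if (revLookaheadMin[index] > score) {
            return 1.0; // even the lowest-scoring bases exceed the requirement (background sums to one)
        }
        if (revLookaheadMax[index] <= score) {
            return 0.0; // even the best-scoring bases can no longer exceed it
        }
        double p = 0.0;
        for (int c = 0; c < background.length; c++) {
            if (background[c] > 0.0) {
                p += background[c] * rScoreProbability(score - pssm[index][c], index - 1);
            }
        }
        return p;
    }
}
```

For a two-position matrix that scores 1 for C and 0 for every other base, a threshold of 0.5 requires at least one C, so under a uniform background the result is 1 - (3/4)^2 = 7/16.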