java.lang.Object
de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
de.unibi.techfak.jpredictor.motifs.Motif
de.unibi.techfak.jpredictor.motifs.PSPMotif
de.unibi.techfak.jpredictor.motifs.PSSMotif
public class PSSMotif
Contains a motif represented by scores for every base in every position (PSSM = position-specific score matrix). When this motif is searched, the threshold is taken into account after summing up the scores for all positions. If the threshold is less than the sum, the motif was found.
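The matching rule described above can be illustrated with a minimal sketch. It is not taken from the PSSMotif source: the class `PssmScoreSketch`, its methods, and the column order A, C, G, T are assumptions made only for this example. The matrix follows the `[rows][columns]` layout, with one row per motif position.

```java
/** Hypothetical sketch of threshold-based PSSM matching; not the PSSMotif source. */
final class PssmScoreSketch {

    /** Maps a base to its matrix column (order A, C, G, T is assumed); -1 if unknown. */
    static int column(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Sums the per-position scores of the window that starts at offset. */
    static double windowScore(double[][] scores, CharSequence seq, int offset) {
        double sum = 0.0;
        for (int pos = 0; pos < scores.length; pos++) {
            int col = column(seq.charAt(offset + pos));
            if (col < 0) {
                return Double.NaN; // unknown base: this sketch defines no score
            }
            sum += scores[pos][col];
        }
        return sum;
    }

    /** Returns the first offset whose summed score exceeds the threshold, or -1. */
    static int firstHit(double[][] scores, CharSequence seq, double threshold) {
        for (int offset = 0; offset + scores.length <= seq.length(); offset++) {
            // A NaN score never compares greater, so such windows are skipped.
            if (windowScore(scores, seq, offset) > threshold) {
                return offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[][] scores = {
            { 1.0, -1.0, -1.0, -1.0 }, // position 0 prefers A
            { -1.0, 1.0, -1.0, -1.0 }, // position 1 prefers C
            { -1.0, -1.0, 1.0, -1.0 }  // position 2 prefers G
        };
        System.out.println(firstHit(scores, "TTACGT", 2.5));
    }
}
```

The threshold is compared only after the whole window has been summed, which mirrors the description: a single strongly negative position can be compensated by the others.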
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
MotifSearchAdapter.SearchFields |
| Field Summary | |
|---|---|
| protected double[] | generatingBackground - The background used to calculate the log-odd-scores in this PSSM motif. |
| private double[] | maxc - Holds the maximum score that can still be reached with the next comparisons. |
| private double[] | maxn - Holds the maximum score that can still be reached with the next comparisons. |
| private double[] | tmpBackground - The background character distribution. |
| private double[] | tmpRevLookaheadMax - An array with the cumulative sums over the column maxima. |
| private double[] | tmpRevLookaheadMin - An array with the cumulative sums over the column minima. |
| Fields inherited from class de.unibi.techfak.jpredictor.motifs.PSPMotif |
|---|
A, C, G, generatingSequences, motif, position, T, threshold |
| Fields inherited from class de.unibi.techfak.jpredictor.motifs.Motif |
|---|
DNA_DEGENERATED_CODE, DNA_DEGENERATED_CODE_JOINING, MARK_USABLE |
| Fields inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
searchFields |
| Fields inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher |
|---|
SEARCH_ALL_ORIENTATIONS, SEARCH_DIRECTION_MAX, SEARCH_DIRECTION_MINUS, SEARCH_DIRECTION_PLUS |
| Fields inherited from interface de.unibi.techfak.misc.Markable |
|---|
MARK_BASIC, MARK_DELETED, MARK_EXP, MARK_MOVED, MARK_REPLACED, MARK_SELECTED, MARK_TOBEDELETED, MARK_TOBEMOVED, MARK_TOBEREPLACED |
| Constructor Summary |
|---|
| PSSMotif(java.lang.String name, java.lang.String description) - Inits the name and the description. |
| Method Summary | |
|---|---|
| static double[][] | calculateScores(double[][] weight, double[] background, double add) - Calculates the scores from probabilities and background. |
| java.lang.Object | clone() - Clones the motif by creating the same motif again and copying all important fields. |
| static PSSMotif | createPSSMotifFromPSPMotif(PSPMotif pspm, double add, double[] distr) - Generates a PSSMotif from a PSPMotif. |
| double[] | getGeneratingBackground() - Returns the background this PSSM motif was calculated from. |
| double | getMaximum() - Calculates the maximal possible score. |
| double | getMinimum() - Calculates the minimal possible score. |
| private double | rScoreProbability(double score, int index) - Recursion to be called by method scoreProbability. |
| double | scoreProbability(double threshold, double[] background) - Calculates for a given threshold the probability that the motif reaches this threshold when it is searched on a sequence. |
| FoundMotifStruct | search(int seqStart, int seqLength) - Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength. |
| static double[][] | testWeigthMatrix(double[][] weight, boolean change) - The matrix must be of format [rows][columns]. |
| Methods inherited from class de.unibi.techfak.jpredictor.motifs.PSPMotif |
|---|
compareTo, createPSPMFromMotifBlock, createPSPMotifFromSequenceMotif, createReversedComplement, getConsensusSequence, getGeneratingSequences, getMaximum, getMinimum, getMotif, getMotifPositions, getRegularExpression, getThreshold, length, match, matchComplement, print, readMotifBlockFromFile, readPSBlockFromFile, setMotif, setThreshold |
| Methods inherited from class de.unibi.techfak.jpredictor.motifs.Motif |
|---|
clearMark, clearMark, equals, getDescription, getMark, getName, getWeight, isMarked, isMarked, print, scoreSequenceWindow, setDescription, setMark, setName, setWeight, toString |
| Methods inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
getSearchMode, initSearch, searchAll, setSearchMode |
| Methods inherited from class java.lang.Object |
|---|
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher |
|---|
getSearchMode, initSearch, searchAll, setSearchMode |
| Field Detail |
|---|
protected double[] generatingBackground
The background used to calculate the log-odd-scores in this PSSM motif. It is stored by createPSSMotifFromPSPMotif.
private double[] maxn
private double[] maxc
private double[] tmpBackground
The background character distribution. It is set by scoreProbability and used by method rScoreProbability. It is stored outside the function that uses it for performance reasons; otherwise it would be a parameter.
private double[] tmpRevLookaheadMax
An array with the cumulative sums over the column maxima. It is set by scoreProbability and used by method rScoreProbability. It is stored outside the function that uses it for performance reasons; otherwise it would be a parameter. The formula behind it is tmpRevLookaheadMax[i] = sum over k = 0..i of this.getMaximum(k).
See Also:
getMaximum()
private double[] tmpRevLookaheadMin
An array with the cumulative sums over the column minima. It is set by scoreProbability and used by method rScoreProbability. It is stored outside the function that uses it for performance reasons; otherwise it would be a parameter. The formula behind it is tmpRevLookaheadMin[i] = sum over k = 0..i of this.getMinimum(k).
See Also:
getMinimum()
| Constructor Detail |
|---|
public PSSMotif(java.lang.String name,
java.lang.String description)
Parameters:
name - The identifier of the motif.
description - A short description.
| Method Detail |
|---|
public double[] getGeneratingBackground()
Returns the background this PSSM motif was calculated from. The result may be null; in this case it is unknown how the scores were calculated.
public double getMaximum()
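Calculates the maximal possible score; see getMaximum(int).
Overrides:
getMaximum in class PSPMotif
Returns:
The maximal possible score, or Double.NaN in case of error.
See Also:
PSPMotif.getMaximum(int)
public double getMinimum()
Calculates the minimal possible score; see getMinimum(int).
Overrides:
getMinimum in class PSPMotif
Returns:
The minimal possible score, or Double.NaN in case of error.
See Also:
PSPMotif.getMinimum(int)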
public FoundMotifStruct search(int seqStart,
int seqLength)
throws MissingCharSequenceException,
MissingMotifException
Description copied from interface: MotifSearcher
Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
Returns a FoundMotifStruct with positions absolute to the sequence window. Note that this method does not check previous searches; it always searches anew. If the motif was not found, null is returned.
The id of the returned struct is misused as a flag indicating the search direction (search mode) in which the motif was found (any combination of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS). Note that a motif can occur more than once at the same position, e.g. the regular expression motif CNGCCATNDNND and its reversed complemented part HNNHNATGGCTG can both be matched on the sequence CGGCCATGGCTG. If both search directions occur, the motif start and end positions are always given for the plus direction. For a reversed complemented motif, end is less than start.
Specified by:
search in interface MotifSearcher
Overrides:
search in class PSPMotif
Parameters:
seqStart - Search starts with this index.
seqLength - The width of the subsequence to search on.
Returns:
The found motif, or null if the motif could not be found on the sequence.
Throws:
MissingCharSequenceException - If no sequence to search on was set.
MissingMotifException - If no motif to search for was set.
See Also:
MotifSearcher.initSearch(CharSequence), MotifSearcher.SEARCH_DIRECTION_PLUS, MotifSearcher.SEARCH_DIRECTION_MINUS
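The double match mentioned in the description can be checked with a small sketch. Everything here is hypothetical: the class `DegenerateMatchSketch`, its methods, and the flag values 1 and 2 are assumptions for illustration (the real values of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS are not stated on this page). Only the IUPAC codes needed for the example are covered.

```java
/** Hypothetical sketch: matching degenerate motifs and combining direction flags. */
final class DegenerateMatchSketch {

    // Assumed flag values for illustration only; not the MotifSearcher constants.
    static final int PLUS = 1;
    static final int MINUS = 2;

    /** Returns the bases permitted by an IUPAC code (subset used in this example). */
    static String allowed(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'D': return "AGT";  // not C
            case 'H': return "ACT";  // not G
            case 'N': return "ACGT"; // any base
            default:  return "";
        }
    }

    /** True if every sequence base is permitted by the motif code at that position. */
    static boolean matches(String motif, String seq) {
        if (motif.length() != seq.length()) {
            return false;
        }
        for (int i = 0; i < motif.length(); i++) {
            if (allowed(motif.charAt(i)).indexOf(seq.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Combines the direction flags of all orientations that match the window. */
    static int directions(String plusMotif, String minusMotif, String window) {
        int flags = 0;
        if (matches(plusMotif, window)) {
            flags |= PLUS;
        }
        if (matches(minusMotif, window)) {
            flags |= MINUS;
        }
        return flags;
    }

    public static void main(String[] args) {
        // Both orientations from the description match the same window.
        System.out.println(directions("CNGCCATNDNND", "HNNHNATGGCTG", "CGGCCATGGCTG"));
    }
}
```

A caller would test the combined value with a bitwise AND against each direction flag to find out which orientation, or both, produced the hit.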
public static double[][] testWeigthMatrix(double[][] weight,
boolean change)
The matrix must be of format [rows][columns]. It must not be null and must have at least one row and at least 4 columns. No other constraints are tested.
Parameters:
weight - The matrix to test.
change - This parameter is ignored.
null.
public static PSSMotif createPSSMotifFromPSPMotif(PSPMotif pspm,
double add,
double[] distr)
Generates a PSSMotif from a PSPMotif. The probabilities used in the
PSPMotif are re-calculated to log-odd-scores using
PSSMotif.calculateScores(double[][], double[], double).
If no background distribution distr is given, it is
set to { 0.25, 0.25, 0.25, 0.25 }.
The original motif's line-by-line positions are carried over to the new motif, as are the name and the score. The description is changed slightly: the original description is put in parentheses and prefixed with the string "PSS". The threshold is set to the maximum. The generating sequences and the background are stored with the new motif.
Parameters:
pspm - The source motif.
add - A small value that prevents the score from becoming -Infinity.
distr - The background distribution as a double array. If it is null, { 0.25, 0.25, 0.25, 0.25 } is used. Note that every communicator has a predefined background distribution.
Returns:
The new PSSMotif, or null if pspm==null.
See Also:
calculateScores(double[][], double[], double), ICommunicator.getGlobalBackground()
public static double[][] calculateScores(double[][] weight,
double[] background,
double add)
Calculates the scores from probabilities and background. The weight
matrix should be in the form double[rows][cols] and
the background should be in the form double[cols]. The
column size of weight should be equal to the length of
the background array, otherwise the lesser is taken.
The formula to calculate the log-odd-scores is
log( weight[i][j] / background[j] + add ).
add is assumed to be very small; adding it before the logarithm is taken ensures that minus infinity does not occur as a result. Nevertheless, add can be zero.
If add is infinite or Double.NaN, it is set to zero.
Note that it is not checked whether the background distribution or the rows of the weight matrix sum to one.
If a value in the background array is zero, the affected quotients (see the formula above) are set to zero. If a quotient is less than zero, it is also set to zero.
This method works in place, so the weight matrix is overwritten.
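The rules above can be put together in a minimal sketch. This is not the library's implementation: the class `LogOddsSketch` and method `logOdds` are hypothetical names, and the natural logarithm (`Math.log`) is an assumption, since the base of the logarithm is not stated here.

```java
/** Hypothetical sketch of the log-odds conversion described above. */
final class LogOddsSketch {

    /**
     * Overwrites weight[i][j] with log(weight[i][j] / background[j] + add).
     * The natural logarithm is an assumption of this sketch.
     */
    static double[][] logOdds(double[][] weight, double[] background, double add) {
        if (weight == null) {
            return null;   // nothing to convert
        }
        if (background == null) {
            return weight; // returned unchanged, as documented
        }
        if (Double.isNaN(add) || Double.isInfinite(add)) {
            add = 0.0;     // unusable pseudo-value is replaced by zero
        }
        for (double[] row : weight) {
            // If the sizes differ, only the smaller number of columns is used.
            int cols = Math.min(row.length, background.length);
            for (int j = 0; j < cols; j++) {
                double quotient = (background[j] == 0.0) ? 0.0 : row[j] / background[j];
                if (quotient < 0.0) {
                    quotient = 0.0;
                }
                row[j] = Math.log(quotient + add);
            }
        }
        return weight;     // same array instance, modified in place
    }

    public static void main(String[] args) {
        double[][] weight = { { 0.5, 0.25, 0.25, 0.0 } };
        double[] background = { 0.25, 0.25, 0.25, 0.25 };
        logOdds(weight, background, 0.0);
        System.out.println(java.util.Arrays.toString(weight[0]));
    }
}
```

With add set to zero, a probability of zero yields negative infinity, which is exactly the case the add parameter is meant to prevent. A caller that still needs the probabilities must copy the matrix first, because the conversion happens in place.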
Parameters:
weight - The matrix of probabilities to be recalculated to scores.
background - The background probabilities. If it is null, weight is returned. Note that every communicator has a predefined background distribution.
add - A small value added to every quotient before the logarithm is taken.
Returns:
The weight matrix with recalculated content, or null if weight is null.
See Also:
ICommunicator.getGlobalBackground()
public java.lang.Object clone()
Description copied from class: Motif
Clones the motif by creating the same motif again and copying all important fields. The result is at least of class Motif. When cloning a MultiMotif, the referred single motifs are not cloned.
The fields that are copied are: Name, Description, the Motif itself, errorNumberAllowedForMatch or Threshold, respectively, Weight and searchMode (altogether 6 fields by now). The fields of the MotifSearchAdapter are not copied.
Note that matrices are not cloned. Thus, when a PSPMotif or a PSSMotif is cloned, the clone holds a reference to the matrix of the original motif. This was done to save memory.
Overrides:
clone in class PSPMotif
See Also:
Object.clone()
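The consequence of not cloning the matrix can be shown with a small stand-in class. `MiniMotif` and `shallowCopy` are hypothetical names used only for this sketch; they do not reproduce the fields or the clone logic of the real motif classes.

```java
/** Hypothetical stand-in that copies scalar fields but shares its matrix. */
final class MiniMotif {
    final String name;
    final double[][] matrix;

    MiniMotif(String name, double[][] matrix) {
        this.name = name;
        this.matrix = matrix;
    }

    /** Creates a new object that refers to the same matrix instance. */
    MiniMotif shallowCopy() {
        return new MiniMotif(name, matrix);
    }

    public static void main(String[] args) {
        MiniMotif original = new MiniMotif("demo", new double[][] { { 0.1, 0.2, 0.3, 0.4 } });
        MiniMotif copy = original.shallowCopy();
        copy.matrix[0][0] = 9.0;                   // write through the copy
        System.out.println(original.matrix[0][0]); // the original sees the change
    }
}
```

Under this sharing, an in-place operation on the matrix of a clone would also change the original, so a caller that needs independent matrices has to copy them explicitly.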
private double rScoreProbability(double score,
int index)
Recursion to be called by method scoreProbability. Any function calling this method must ensure that tmpBackground, tmpRevLookaheadMax and tmpRevLookaheadMin are set properly. Normally, this method is called with a threshold that must be exceeded and the length of this motif minus one.
Parameters:
score - The score the motif must reach (from right to left).
index - Position where we are in the motif.
public double scoreProbability(double threshold,
double[] background)
Calculates for a given threshold the probability that the motif reaches this threshold when it is searched on a sequence.
The probabilities of occurrence (calculated from the given background character distribution) of all sequences that exceed the threshold are added up.
The dynamic programming recursion was introduced by Wu et al. in 2000. It is extended by two look-ahead abort conditions: one for when the required score is exceeded in any case (look-ahead minimum), and one for when that score can no longer be reached (look-ahead maximum).
Parameters:
threshold - The threshold to calculate with.
background - The character occurrence frequencies.
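The recursion with its two look-ahead conditions can be sketched as follows. This is an illustration under stated assumptions, not the code of scoreProbability or rScoreProbability: the class `ScoreProbabilitySketch` is hypothetical, it uses plain recursion without the memoization a dynamic programming version would add, and it assumes each matrix row has as many columns as the background array. The arrays `cumMax` and `cumMin` play the role described for tmpRevLookaheadMax and tmpRevLookaheadMin.

```java
/** Hypothetical sketch of the threshold probability recursion with look-ahead. */
final class ScoreProbabilitySketch {
    private final double[][] scores;   // [position][base]
    private final double[] background; // base probabilities
    private final double[] cumMax;     // cumMax[i] = sum of column maxima 0..i
    private final double[] cumMin;     // cumMin[i] = sum of column minima 0..i

    ScoreProbabilitySketch(double[][] scores, double[] background) {
        this.scores = scores;
        this.background = background;
        this.cumMax = new double[scores.length];
        this.cumMin = new double[scores.length];
        double maxSum = 0.0;
        double minSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            double max = Double.NEGATIVE_INFINITY;
            double min = Double.POSITIVE_INFINITY;
            for (double s : scores[i]) {
                max = Math.max(max, s);
                min = Math.min(min, s);
            }
            maxSum += max;
            minSum += min;
            cumMax[i] = maxSum;
            cumMin[i] = minSum;
        }
    }

    /** Probability that a random sequence scores strictly above the threshold. */
    double probability(double threshold) {
        return recurse(threshold, scores.length - 1);
    }

    /** score is what positions 0..index still have to exceed (right to left). */
    private double recurse(double score, int index) {
        if (index < 0) {
            return score < 0.0 ? 1.0 : 0.0; // all positions consumed
        }
        if (cumMin[index] > score) {
            return 1.0; // look-ahead minimum: every completion exceeds the score
        }
        if (cumMax[index] <= score) {
            return 0.0; // look-ahead maximum: no completion can exceed the score
        }
        double p = 0.0;
        for (int b = 0; b < background.length; b++) {
            p += background[b] * recurse(score - scores[index][b], index - 1);
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] scores = { { 1.0, 0.0, 0.0, 0.0 }, { 1.0, 0.0, 0.0, 0.0 } };
        double[] uniform = { 0.25, 0.25, 0.25, 0.25 };
        System.out.println(new ScoreProbabilitySketch(scores, uniform).probability(1.5));
    }
}
```

The two early returns prune whole subtrees of the recursion: once the remaining positions can only succeed, or can only fail, no further bases need to be enumerated.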