de.unibi.techfak.jpredictor.motifs
Class PSPMotif

java.lang.Object
  extended by de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
      extended by de.unibi.techfak.jpredictor.motifs.Motif
          extended by de.unibi.techfak.jpredictor.motifs.PSPMotif
All Implemented Interfaces:
MotifSearcher, MotifSearchWithThreshold, SequenceWindowScorer, Markable, java.lang.Cloneable, java.lang.Comparable
Direct Known Subclasses:
PSSMotif

public class PSPMotif
extends Motif
implements MotifSearchWithThreshold

Contains a motif represented by probabilities for every base at every position (PSPM = position specific probability matrix). When this motif is searched, the probabilities for all positions are multiplied and the product is compared with the threshold. If the threshold is less than the product, the motif was found.
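
The matching rule described above can be sketched as follows. This is a minimal, standalone illustration and not the class's actual code: the class name PspmThresholdSketch and its helper methods are hypothetical, and returning Double.NaN for characters outside A, C, G, T/U is an assumption of the sketch.

```java
final class PspmThresholdSketch {
    // Column indices, mirroring the documented constants A=0, C=1, G=2, T=3.
    static final int A = 0, C = 1, G = 2, T = 3;

    /** Maps a base to its column; returns -1 for anything else (U is treated as T). */
    static int column(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return A;
            case 'C': return C;
            case 'G': return G;
            case 'T': case 'U': return T;
            default: return -1;
        }
    }

    /** Product of the per-position probabilities of the window under a [pos][nuc] matrix. */
    static double product(double[][] motif, CharSequence window) {
        if (window.length() < motif.length) return Double.NaN;
        double p = 1.0;
        for (int pos = 0; pos < motif.length; pos++) {
            int col = column(window.charAt(pos));
            if (col < 0) return Double.NaN; // assumption: unknown characters cannot match
            p *= motif[pos][col];
        }
        return p;
    }

    /** The motif counts as found if the threshold is strictly less than the product. */
    static boolean isMatch(double[][] motif, CharSequence window, double threshold) {
        double p = product(motif, window);
        return !Double.isNaN(p) && threshold < p;
    }
}
```

With the first three rows of the example matrix shown for createPSPMFromMotifBlock, the window TGA yields a product of 0.75 and is a match for a threshold of 0.5, whereas CGA yields 0.25 and is not.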


Nested Class Summary
 
Nested classes/interfaces inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
MotifSearchAdapter.SearchFields
 
Field Summary
(package private) static int A
          Constant for easy access to the columns of this.motif.
(package private) static int C
          Constant for easy access to the columns of this.motif.
(package private) static int G
          Constant for easy access to the columns of this.motif.
protected  java.lang.String[] generatingSequences
          The sequences this motif was generated from.
protected  double[][] motif
          The motif stored is of format [rows][cols].
protected  int[] position
          The positional information about the motif.
(package private) static int T
          Constant for easy access to the columns of this.motif.
protected  double threshold
          The threshold probability used to decide whether a motif is a match or not.
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.Motif
DNA_DEGENERATED_CODE, DNA_DEGENERATED_CODE_JOINING, MARK_USABLE
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
searchFields
 
Fields inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
SEARCH_ALL_ORIENTATIONS, SEARCH_DIRECTION_MAX, SEARCH_DIRECTION_MINUS, SEARCH_DIRECTION_PLUS
 
Fields inherited from interface de.unibi.techfak.misc.Markable
MARK_BASIC, MARK_DELETED, MARK_EXP, MARK_MOVED, MARK_REPLACED, MARK_SELECTED, MARK_TOBEDELETED, MARK_TOBEMOVED, MARK_TOBEREPLACED
 
Constructor Summary
PSPMotif(java.lang.String name, java.lang.String description)
          Creates a motif with name and description, but with an empty position specific probability matrix.
 
Method Summary
 java.lang.Object clone()
           Clones the motif by creating the same motif again and copying all important fields.
 int compareTo(java.lang.Object o)
           Compares this motif with the specified object.
static PSPMotif createPSPMFromMotifBlock(java.lang.String[] lines, int start, int end, java.lang.String filter)
           Reads and evaluates a block of sequences given as array of strings.
static PSPMotif createPSPMotifFromSequenceMotif(RegularExpressionMotif sequMotif)
          Creates a new PSPMotif from a SequenceMotif or a RegularExpressionMotif, respectively.
 Motif createReversedComplement()
          Creates the reversed complementary motif.
 java.lang.String getConsensusSequence()
          For every position the character with the maximum probability or score, respectively, is taken.
 java.lang.String[] getGeneratingSequences()
           
 double getMaximum()
          The maximal possible probability is calculated by multiplying over all probs returned by this.getMaximum(int).
 double getMaximum(int pos)
          Calculates the maximal possible probability or score, respectively, at the given position.
 double getMinimum()
          The minimal possible probability is calculated by multiplying over all probs returned by this.getMinimum(int).
 double getMinimum(int pos)
          Calculates the minimal possible probability or score, respectively, at the given position.
 double[][] getMotif()
          Returns the motif as a matrix of double values.
 int[] getMotifPositions()
           
 java.lang.String getRegularExpression()
          Generates the regular expression represented by the motif.
 double getThreshold()
           
 int length()
          Returns the length of the motif.
protected  double match(char sequ, double[] col)
          Matches a char with a position specific probability or score, respectively.
protected  double matchComplement(char sequ, double[] col)
          Matches a char with the position specific probability; the char is complemented first and then the normal match method is called.
 void print(java.io.PrintStream out)
          Prints the motif to the given stream.
static PSPMotif readMotifBlockFromFile(java.lang.String filename, int skip, java.lang.String filter)
           Reads a motif block from a file.
static PSPMotif readPSBlockFromFile(java.lang.String filename, int skip)
          Reads a block with position specific data from a file.
 FoundMotifStruct search(int seqStart, int seqLength)
          Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
 int setMotif(int[] position, double[][] weight)
           Sets the motif from the position and weight array.
 void setThreshold(double threshold)
          Sets the threshold.
static double[][] testWeigthMatrix(double[][] weight, boolean change)
           The matrix is tested for some constraints.
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.Motif
clearMark, clearMark, equals, getDescription, getMark, getName, getWeight, isMarked, isMarked, print, scoreSequenceWindow, setDescription, setMark, setName, setWeight, toString
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
getSearchMode, initSearch, searchAll, setSearchMode
 
Methods inherited from class java.lang.Object
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
getSearchMode, initSearch, searchAll, setSearchMode
 

Field Detail

A

static final int A
Constant for easy access to the columns of this.motif. This value is 0.

See Also:
Constant Field Values

C

static final int C
Constant for easy access to the columns of this.motif. This value is 1.

See Also:
Constant Field Values

G

static final int G
Constant for easy access to the columns of this.motif. This value is 2.

See Also:
Constant Field Values

T

static final int T
Constant for easy access to the columns of this.motif. This value is 3.

See Also:
Constant Field Values

motif

protected double[][] motif
The motif stored is of format [rows][cols]. The number of rows determines the length of the motif and the number of cols determines the alphabet length. Here, the alphabet ACG[T,U] is taken and thus the alphabet length is 4.


position

protected int[] position
The positional information about the motif. This information is only relevant for printing the motif.


threshold

protected double threshold
The threshold probability used to decide whether a motif is a match or not.


generatingSequences

protected java.lang.String[] generatingSequences
The sequences this motif was generated from.

Constructor Detail

PSPMotif

public PSPMotif(java.lang.String name,
                java.lang.String description)
Creates a motif with name and description, but with an empty position specific probability matrix. The motif (the matrix) itself is set via setMotif(int[], double[][]). Alternatively, an instance can be obtained from one of the static methods. The motif is marked not usable (this mark has nothing to do with being incomplete, i.e. missing a PSPM, but tells the program not to use this motif for operations until stated otherwise).

Parameters:
name - The name (identifier) of the motif.
description - A short description of the motif.
See Also:
setMotif(int[], double[][]), readMotifBlockFromFile(String, int, String), readPSBlockFromFile(String, int), createPSPMFromMotifBlock(String[], int, int, String), createPSPMotifFromSequenceMotif(RegularExpressionMotif)
Method Detail

setThreshold

public final void setThreshold(double threshold)
Description copied from interface: MotifSearchWithThreshold
Sets the threshold.

Specified by:
setThreshold in interface MotifSearchWithThreshold
Parameters:
threshold - The new threshold.
See Also:
MotifSearcher.search(int, int)

getThreshold

public final double getThreshold()
Specified by:
getThreshold in interface MotifSearchWithThreshold
Returns:
The threshold. If none was previously defined Double.NaN is returned.
See Also:
MotifSearcher.search(int, int)

setMotif

public int setMotif(int[] position,
                    double[][] weight)

Sets the motif from the position and weight array. The position array is an array of integers that numbers the positions of the motif, e.g. the PHO core motif is GCCAT, thus the G will be at position 1, whereas all letters before the G will have negative position values. The position array can be null; in this case the positions start with 1.

The weight matrix holds either probabilities or scores. The matrix must consist of at least 4 columns, one for each letter A, C, G and T, in that order. The number of rows determines the length of the motif. The matrix has to be of format [pos][nuc].

First, this method calls testWeigthMatrix and continues only if the return value of that method is not null. Note that both array and matrix are cloned only if they need fitting. If position is longer than weight, it is truncated; if it is shorter, it is extended.

Parameters:
position - An array of row positions.
weight - The matrix of probabilities or scores or counts.
Returns:
Zero if the motif was set, a number greater than zero otherwise.
See Also:
testWeigthMatrix(double[][], boolean), PSSMotif.testWeigthMatrix(double[][], boolean)

getMotif

public double[][] getMotif()
Returns the motif as a matrix of double values. The matrix contains probabilities (PSPM) or scores (PSSM). The matrix is of format [pos][nuc]. Note, that the matrix should not be changed.

Returns:
The motif as a matrix, null if no motif was set yet.

getMotifPositions

public int[] getMotifPositions()
Returns:
The motif's positions or null if no motif was set yet.

getGeneratingSequences

public java.lang.String[] getGeneratingSequences()
Returns:
The valid sequences this PSPM/PSSM motif was built from.

testWeigthMatrix

public static double[][] testWeigthMatrix(double[][] weight,
                                          boolean change)

The matrix is tested for some constraints: first, no entry is less than zero; second, the sum in every row is one; and third, there is no row with sum 0.

If weight is null, or if one inner array (the values for one position) has fewer than four entries, null is returned. The constraints are tested with the first four columns of the matrix. The matrix has to be of format [position][nucleotide].
If change is set to true and the given matrix needs fitting, a new matrix is created and returned. If change is false, the given matrix is only tested and null is returned in case of any flaw.

Parameters:
weight - The matrix to test and, if needed and permitted, to copy from.
change - Sets, whether the method is permitted to create a new matrix to make the old one fit the constraints.
Returns:
The matrix itself or a new one, if the constraints are fulfilled, otherwise null.
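
The following standalone sketch shows one way such a check could look. It is not the library's implementation: the class name WeightMatrixCheckSketch, the tolerance EPS, and in particular the meaning of "fitting" (here: dividing each row by its sum, with negative entries and all-zero rows treated as not repairable) are assumptions made for illustration.

```java
final class WeightMatrixCheckSketch {
    // Tolerance for "row sums to one"; an assumption of this sketch.
    static final double EPS = 1e-9;

    /** Returns the matrix itself, a normalized copy, or null, following the rules above. */
    static double[][] check(double[][] weight, boolean change) {
        if (weight == null) return null;
        boolean needsFitting = false;
        for (double[] row : weight) {
            if (row == null || row.length < 4) return null;
            double sum = 0.0;
            for (int n = 0; n < 4; n++) {
                if (row[n] < 0.0) return null;   // constraint 1: no negative entry
                sum += row[n];
            }
            if (sum == 0.0) return null;         // constraint 3: no all-zero row
            if (Math.abs(sum - 1.0) > EPS) needsFitting = true; // constraint 2
        }
        if (!needsFitting) return weight;        // already valid: return the same instance
        if (!change) return null;                // flawed, and no permission to repair
        double[][] fitted = new double[weight.length][4];
        for (int pos = 0; pos < weight.length; pos++) {
            double sum = 0.0;
            for (int n = 0; n < 4; n++) sum += weight[pos][n];
            for (int n = 0; n < 4; n++) fitted[pos][n] = weight[pos][n] / sum;
        }
        return fitted;
    }
}
```

Under these assumptions a matrix of raw counts such as {10, 9, 0, 0} is rejected when change is false and converted to probabilities in a new matrix when change is true.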

getConsensusSequence

public java.lang.String getConsensusSequence()
For every position the character with the maximum probability or score, respectively, is taken. If two position specific values are equal, the lesser character is taken (where less means that 'A'<'C'<'G'<'T').

Overrides:
getConsensusSequence in class Motif
Returns:
The most probable sequence.
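
A minimal sketch of this rule, assuming a matrix of format [pos][nuc] with columns in the order A, C, G, T. The class name ConsensusSketch is hypothetical; the strict comparison is what keeps the lesser character when values are equal.

```java
final class ConsensusSketch {
    static final char[] ALPHABET = {'A', 'C', 'G', 'T'};

    /** Builds the consensus by taking the column with the largest value in every row. */
    static String consensus(double[][] motif) {
        StringBuilder sb = new StringBuilder(motif.length);
        for (double[] row : motif) {
            int best = 0;
            for (int n = 1; n < 4; n++) {
                // Strict '>' means a later column only wins if it is truly larger,
                // so ties resolve to the lesser character (A < C < G < T).
                if (row[n] > row[best]) best = n;
            }
            sb.append(ALPHABET[best]);
        }
        return sb.toString();
    }
}
```

Applied to the example matrix shown for createPSPMFromMotifBlock, this yields TGAACG; position 4 is a four-way tie and therefore resolves to A.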

getRegularExpression

public java.lang.String getRegularExpression()
Generates the regular expression represented by the motif.

Overrides:
getRegularExpression in class Motif
Returns:
The motif as a regular expression using the degenerated DNA code.

getMaximum

public double getMaximum(int pos)
Calculates the maximal possible probability or score, respectively, at the given position.

Returns:
The maximal score, Double.NaN in case of error.

getMinimum

public double getMinimum(int pos)
Calculates the minimal possible probability or score, respectively, at the given position.

Returns:
The minimal score, Double.NaN in case of error.

getMaximum

public double getMaximum()
The maximal possible probability is calculated by multiplying over all probs returned by this.getMaximum(int).

Returns:
The maximal possible probability or Double.NaN in case of error
See Also:
getMaximum(int)

getMinimum

public double getMinimum()
The minimal possible probability is calculated by multiplying over all probs returned by this.getMinimum(int).

Returns:
The minimal possible probability or Double.NaN in case of error
See Also:
getMinimum(int)
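
The four extrema methods can be illustrated with the standalone sketch below. The class ExtremaSketch and its method names are hypothetical. Two assumptions are made: pos is a zero-based row index into the [pos][nuc] matrix, and the overall minimum is built from the per-position minima, in line with the See Also reference to getMinimum(int).

```java
final class ExtremaSketch {
    /** Largest of the four values at the given row, or NaN if the row does not exist. */
    static double maxAt(double[][] motif, int pos) {
        if (motif == null || pos < 0 || pos >= motif.length) return Double.NaN;
        double m = motif[pos][0];
        for (int n = 1; n < 4; n++) m = Math.max(m, motif[pos][n]);
        return m;
    }

    /** Smallest of the four values at the given row, or NaN if the row does not exist. */
    static double minAt(double[][] motif, int pos) {
        if (motif == null || pos < 0 || pos >= motif.length) return Double.NaN;
        double m = motif[pos][0];
        for (int n = 1; n < 4; n++) m = Math.min(m, motif[pos][n]);
        return m;
    }

    /** Product of the per-position maxima: the best probability any sequence can reach. */
    static double max(double[][] motif) {
        if (motif == null || motif.length == 0) return Double.NaN;
        double p = 1.0;
        for (int pos = 0; pos < motif.length; pos++) p *= maxAt(motif, pos);
        return p;
    }

    /** Product of the per-position minima: the worst probability any sequence can reach. */
    static double min(double[][] motif) {
        if (motif == null || motif.length == 0) return Double.NaN;
        double p = 1.0;
        for (int pos = 0; pos < motif.length; pos++) p *= minAt(motif, pos);
        return p;
    }
}
```

These two values bound every product a search can produce, which is useful when choosing a threshold: a threshold at or above the maximum can never be exceeded.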

length

public int length()
Description copied from class: MotifSearchAdapter
Returns the length of the motif.

Specified by:
length in class MotifSearchAdapter
Returns:
The length of the motif; if no motif is present, -1 is returned

print

public void print(java.io.PrintStream out)
Description copied from class: Motif
Prints the motif to the given stream, e.g. Motif.print( new PrintStream( new FileOutputStream ( name )))

Specified by:
print in class Motif
Parameters:
out - The stream to print the motif to.

search

public FoundMotifStruct search(int seqStart,
                               int seqLength)
                        throws MissingCharSequenceException,
                               MissingMotifException
Description copied from interface: MotifSearcher
Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
The first occurrence of the motif on the sequence is returned as FoundMotifStruct with positions absolute to the sequence window. Note that this method does not take previous searches into account; it always searches anew. If the motif was not found, null is returned.
The id of the returned struct is misused as a flag indicating the search direction (search mode) in which the motif was found (any combination of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS). Note that a motif can occur more than once at the same position, e.g. the regular expression motif CNGCCATNDNND and its reversed complement HNNHNATGGCNG can both be matched on the sequence CGGCCATGGCTG. If both search directions occur, the motif start and end positions are always given for the plus direction. For a reversed complemented motif, end is less than start.
If an error occurred while reading the sequence, or if this method tried to read beyond the end of the sequence, or if the motif could not be found on the sequence, null is returned.

Specified by:
search in interface MotifSearcher
Parameters:
seqStart - Search starts with this index.
seqLength - The width of the subsequence to search on.
Returns:
The first occurrence of the motif on the subsequence or null if the motif could not be found on the sequence.
Throws:
MissingCharSequenceException - If no sequence to search on was set.
MissingMotifException - If no motif to search for was set.
See Also:
MotifSearcher.initSearch(CharSequence), MotifSearcher.SEARCH_DIRECTION_PLUS, MotifSearcher.SEARCH_DIRECTION_MINUS
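
The windowing behaviour can be sketched independently of the scoring. The example below is an illustration under assumptions, not the library's code: WindowScanSketch, its WindowScorer interface and firstHit are hypothetical names, only the plus direction is scanned, the returned index refers to the whole sequence, and -1 stands in for the documented null result.

```java
final class WindowScanSketch {
    /** Hypothetical scorer: the value of the motif placed at index start of seq, or NaN. */
    interface WindowScorer {
        double score(CharSequence seq, int start);
    }

    /**
     * Returns the first start index inside the window [seqStart, seqStart + seqLength)
     * at which the score exceeds the threshold, or -1 if there is none.
     */
    static int firstHit(CharSequence seq, int motifLength, WindowScorer scorer,
                        int seqStart, int seqLength, double threshold) {
        int from = Math.max(0, seqStart);
        int end = Math.min(seq.length(), seqStart + seqLength);
        // The motif must fit completely into the window, hence i + motifLength <= end.
        for (int i = from; i + motifLength <= end; i++) {
            double p = scorer.score(seq, i);
            if (!Double.isNaN(p) && threshold < p) return i;
        }
        return -1;
    }
}
```

Each call starts from scratch at seqStart, which mirrors the statement that no previous search is taken into account.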

match

protected double match(char sequ,
                       double[] col)
Matches a char with a position specific probability or score, respectively. Note, that U is treated as T.

Parameters:
sequ - The character to look up in col.
col - Position specific values.
Returns:
The probability or score, respectively, for that character or Double.NaN, if the char was not found.

matchComplement

protected double matchComplement(char sequ,
                                 double[] col)
Matches a char with the position specific probability; the char is complemented first and then the normal match method is called.

Parameters:
sequ - The character, whose reversed complementary part is searched.
col - Position specific values.
Returns:
The probability or score, respectively, for that character or Double.NaN, if the char was not found.
See Also:
match(char, double[])
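
A minimal sketch of the two lookups, assuming that col holds the four values of one position in the order A, C, G, T. The class name MatchSketch and the complement helper are hypothetical, and the case-insensitive handling is an assumption; how the real methods treat degenerate characters is not stated here, so the sketch simply returns Double.NaN for them.

```java
final class MatchSketch {
    /** Looks up the value for one base; U is treated as T, anything else gives NaN. */
    static double match(char sequ, double[] col) {
        switch (Character.toUpperCase(sequ)) {
            case 'A': return col[0];
            case 'C': return col[1];
            case 'G': return col[2];
            case 'T': case 'U': return col[3];
            default: return Double.NaN;
        }
    }

    /** Watson-Crick complement of a base; other characters are returned unchanged. */
    static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': case 'U': return 'A';
            default: return base;
        }
    }

    /** Complements the character first and then delegates to match. */
    static double matchComplement(char sequ, double[] col) {
        return match(complement(sequ), col);
    }
}
```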

createReversedComplement

public Motif createReversedComplement()
Description copied from class: Motif
Creates the reversed complementary motif.

Specified by:
createReversedComplement in class Motif
Returns:
The reversed and complemented motif.
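
For a position specific matrix, reversing and complementing amounts to reversing the row order and swapping the A/T and C/G columns. The sketch below shows this on a plain [pos][nuc] array; the class name ReverseComplementSketch is hypothetical and the code is an illustration, not the method's actual body.

```java
final class ReverseComplementSketch {
    /** Returns a new matrix with rows reversed and columns A<->T, C<->G swapped. */
    static double[][] reverseComplement(double[][] motif) {
        int len = motif.length;
        double[][] rc = new double[len][4];
        for (int pos = 0; pos < len; pos++) {
            double[] src = motif[len - 1 - pos]; // read the rows back to front
            rc[pos][0] = src[3]; // A takes the value of T
            rc[pos][1] = src[2]; // C takes the value of G
            rc[pos][2] = src[1]; // G takes the value of C
            rc[pos][3] = src[0]; // T takes the value of A
        }
        return rc;
    }
}
```

Applying the operation twice returns a matrix with the original values, which makes for a simple sanity check.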

readMotifBlockFromFile

public static PSPMotif readMotifBlockFromFile(java.lang.String filename,
                                              int skip,
                                              java.lang.String filter)

Reads a motif block from a file. Each motif block must start with a leading '>' character (at the beginning of the line, fasta-style) and must end with either the next '>' character or the end of file. The sequences in the block are evaluated to yield a PSPM motif.

Using the parameter skip one can skip the first motif blocks and evaluate only the (skip+1)st block. The length of the resulting motif is determined by the shortest sequence of the block. The sequences in the block are evaluated using the method createPSPMFromMotifBlock(String[], int, int, String). The filter string is passed to that method.

Comments are allowed in the lines; a comment starts with one of the characters '#', ';' or '/'.

Parameters:
filename - The name of the file to read the block from.
skip - The number of blocks to skip before the one to evaluate.
filter - The filter on the sequence strings in the block.
Returns:
A newly created PSPMotif, null in case of error.
See Also:
createPSPMFromMotifBlock(String[], int, int, String)

createPSPMFromMotifBlock

public static PSPMotif createPSPMFromMotifBlock(java.lang.String[] lines,
                                                int start,
                                                int end,
                                                java.lang.String filter)

Reads and evaluates a block of sequences given as array of strings. Every String in the array represents a short sequence and all sequences are combined to a PSPM motif.

Use one of the filter strings (see motifs.MotifFilter) to filter all sequences to only valid characters. Note, that the length of the resulting motif is determined by the shortest sequence in the block after filtering. The sequences can contain regular expression characters.

Comments are not allowed in the lines. All lines are uppercased before filtering. The default filter String, which is used if you set filter = null, is MotifFilter.DNA_RNA_FILTER_DEGENERATED. Empty (after filtering) or null-Strings in the array are ignored.

Example: imagine the following array of sequences:

 CGAGYG
 TGAACG
 T G A C C G
 T G A T T N
 
The PSPM returned is printed out as follows:
 pos     A       C       G       T
 1       0.0     0.25    0.0     0.75
 2       0.0     0.0     1.0     0.0
 3       1.0     0.0     0.0     0.0
 4       0.25    0.25    0.25    0.25
 5       0.0     0.625   0.0     0.375
 6       0.0625  0.0625  0.8125  0.0625
 

Note, that characters from the degenerated DNA alphabet (regular expression characters) are handled as follows: an equally distributed probability is added to the char counter, e.g. a G counts one, but a Y counts 0.5 to C and 0.5 to T.
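
The counting scheme can be reproduced with the standalone sketch below, which recomputes the example matrix from the four sequences. It is a sketch under assumptions: the class MotifBlockSketch and its methods are hypothetical, the filter is replaced by a simple stand-in that uppercases each line and keeps only characters of the standard IUPAC nucleotide code, and the start, end and filter parameters are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MotifBlockSketch {
    /** Bases covered by an IUPAC code, or "" if the character is not in the alphabet. */
    static String expand(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': case 'U': return "T";
            case 'R': return "AG";
            case 'Y': return "CT";
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";
            case 'D': return "AGT";
            case 'H': return "ACT";
            case 'V': return "ACG";
            case 'N': return "ACGT";
            default: return "";
        }
    }

    /** Builds a [pos][nuc] probability matrix; returns null if no usable sequence is left. */
    static double[][] pspm(String[] lines) {
        List<String> seqs = new ArrayList<>();
        int length = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line == null) continue;
            StringBuilder sb = new StringBuilder();
            // Stand-in for the filter: uppercase, then keep IUPAC characters only.
            for (char c : line.toUpperCase(Locale.ROOT).toCharArray()) {
                if (!expand(c).isEmpty()) sb.append(c);
            }
            if (sb.length() == 0) continue;           // empty after filtering: ignored
            seqs.add(sb.toString());
            length = Math.min(length, sb.length());   // shortest sequence sets the length
        }
        if (seqs.isEmpty()) return null;
        double[][] m = new double[length][4];
        for (String s : seqs) {
            for (int pos = 0; pos < length; pos++) {
                String bases = expand(s.charAt(pos));
                // A degenerate code spreads one count equally over its bases.
                for (char b : bases.toCharArray()) {
                    m[pos]["ACGT".indexOf(b)] += 1.0 / bases.length();
                }
            }
        }
        for (double[] row : m) {
            for (int n = 0; n < 4; n++) row[n] /= seqs.size(); // counts to probabilities
        }
        return m;
    }
}
```

For the four example sequences, position 5 receives 2.5 counts for C (two C plus half of Y) and 1.5 for T, giving 0.625 and 0.375, and position 6 receives 3.25 counts for G (three G plus a quarter of N), giving 0.8125.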

Parameters:
lines - The lines containing a whole file or the motif block.
start - The start of the block within the array.
end - The end of the block within the array. Set it to -1, if all lines to the end should get evaluated.
filter - Every sequence in the block is filtered by calling MotifFilter.filterString( lines[i], filter ). If it is null MotifFilter.DNA_RNA_FILTER_DEGENERATED is taken.
Returns:
A newly created PSPMotif without name and description, but with a set motif and generating sequences, null in case of error.
See Also:
MotifFilter, MotifFilter.DNA_RNA_FILTER_DEGENERATED

readPSBlockFromFile

public static PSPMotif readPSBlockFromFile(java.lang.String filename,
                                           int skip)
Reads a block with position specific data from a file. The data format is as follows. Each block must start with a leading '>' character (at the beginning of the line, fasta-style) and must end with either the next '>' character or the end of file. The block consists of lines of data; each line consists of 5 values, the delimiter is either \ or ';' (semicolon). The first value is the position number, the next 4 values represent the base occurrences or the probabilities. Example:
 >PSBlock
 #pos   A       C       G       T
 -1     10      9       0       0               / any comment
 1      0       0       0       19
 
The characters '#' and '/' start comments, that are ignored. All lines are evaluated and stored in a PSPMotif, which is returned. Using the parameter 'skip' you can skip the first motif blocks and evaluate only the (skip+1)st block, e.g. set skip to zero to evaluate the first block, that can be found.
Note: the length of the resulting motif is determined by the number of valid lines found. Invalid lines are ignored. Note: the lines are not sorted by position. Note: the output generated by the print() method of both classes PSPMotif and PSSMotif can be read by this method. Use the calls:
                System.out.println( ">"+ motif.getName() );
                motif.print();
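
The block format described above can be parsed roughly as in the following sketch. It works on lines that have already been read, so no file handling is shown. Everything here is illustrative: PsBlockSketch and parse are hypothetical names, the values are split on whitespace or ';' (an assumption), and the rows are returned as raw numbers {position, A, C, G, T} without conversion to probabilities.

```java
import java.util.ArrayList;
import java.util.List;

final class PsBlockSketch {
    /** Returns the rows of the (skip+1)st block, or null if there is no such block. */
    static double[][] parse(List<String> lines, int skip) {
        List<double[]> rows = new ArrayList<>();
        int block = -1; // index of the block currently being read
        for (String raw : lines) {
            if (raw.startsWith(">")) {
                block++;
                if (block > skip) break; // the wanted block has ended
                continue;
            }
            if (block != skip) continue;
            String line = raw;
            for (char c : new char[] {'#', '/'}) { // strip comments
                int idx = line.indexOf(c);
                if (idx >= 0) line = line.substring(0, idx);
            }
            String[] parts = line.trim().split("[\\s;]+");
            if (parts.length != 5) continue; // invalid lines are ignored
            double[] row = new double[5];
            try {
                for (int i = 0; i < 5; i++) row[i] = Double.parseDouble(parts[i]);
            } catch (NumberFormatException e) {
                continue; // non-numeric line: ignored as well
            }
            rows.add(row);
        }
        return block >= skip ? rows.toArray(new double[0][]) : null;
    }
}
```

For the example block this yields two rows, the first with position -1 and the counts 10, 9, 0, 0. Such counts would still have to be turned into probabilities, for instance through the fitting step of testWeigthMatrix.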
 

Parameters:
filename - The name of the file to read the block from.
skip - The number of blocks to skip before the one to evaluate.
Returns:
A newly created PSPMotif or null in case of error.

createPSPMotifFromSequenceMotif

public static PSPMotif createPSPMotifFromSequenceMotif(RegularExpressionMotif sequMotif)
Creates a new PSPMotif from a SequenceMotif or a RegularExpressionMotif, respectively. The probabilities in every position are equally distributed between the bases. The threshold is set to the highest possible probability.

Parameters:
sequMotif - The motif to translate.
Returns:
A newly created PSPMotif, null in case of error.

clone

public java.lang.Object clone()
Description copied from class: Motif

Clones the motif by creating the same motif again and copying all important fields. The result is at least of class Motif. When cloning a MultiMotif the referred single motifs are not cloned.

The fields that are copied are: Name, Description, the Motif itself, errorNumberAllowedForMatch or Threshold, respectively, Weight and searchMode (altogether 6 fields by now). What is not copied are the fields of the MotifSearchAdapter.

Note that matrices are not cloned. Thus, when a PSPMotif or a PSSMotif is cloned, the clone holds a reference to the matrix of the original motif. This was done to save memory.

Specified by:
clone in class Motif
Returns:
The Motif itself as a doubled copy.
See Also:
Object.clone()

compareTo

public int compareTo(java.lang.Object o)
Description copied from class: Motif

Compares this motif with the specified object. The Object given must be a Motif and must not be null, otherwise a ClassCastException is thrown. If for this motif or the given one, the motif representation was not set, a ClassCastException is thrown as well.

Two motifs are compared for their representations, both plus and minus depending on the search direction. Both motifs must be of the same type, otherwise the following relation is considered: SequenceMotif < PSPMotif < MultiMotif and returned without further investigation. Note that RegularExpressionMotifs count as SequenceMotifs and that PSSMotifs count as PSPMotifs. When two MultiMotifs are compared, the distance information is taken into account after the comprised single motifs have been compared: first minimum vs. first minimum, then first maximum, second minimum, and so on. As a last resort, two sequence motifs are compared for their error number allowed for match, whereas two position specific matrix motifs are compared for their threshold.

More formally, this motif is less than the specified one if this reg-exp string is less than the one from the specified motif.

Specified by:
compareTo in interface java.lang.Comparable
Specified by:
compareTo in class Motif
Parameters:
o - The motif this motif is compared to.
Returns:
Minus one, zero or one as this motif is less than, equal or greater than the specified motif.