java.lang.Object
de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
de.unibi.techfak.jpredictor.motifs.Motif
de.unibi.techfak.jpredictor.motifs.PSPMotif
public class PSPMotif
Contains a motif represented by probabilities for every base in every position (PSPM = position specific probability matrix). When this motif is searched, the probabilities for all positions are multiplied and the product is compared with the threshold. If the threshold is less than the product, the motif was found.
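The matching rule described above can be illustrated with a minimal, self-contained sketch. It does not use the PSPMotif class itself; the class name `PspmThresholdSketch` and its methods are hypothetical, and returning `Double.NaN` for characters other than A, C, G, T or for a window that runs past the sequence is an assumption made for this example.

```java
final class PspmThresholdSketch {
    // Column order follows the documented constants A=0, C=1, G=2, T=3.
    static int column(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    // Multiplies the position specific probabilities for the window at offset.
    // The matrix is of format [position][nucleotide].
    static double windowProbability(double[][] pspm, CharSequence seq, int offset) {
        if (offset < 0 || offset + pspm.length > seq.length()) {
            return Double.NaN; // window does not fit (assumption of this sketch)
        }
        double product = 1.0;
        for (int pos = 0; pos < pspm.length; pos++) {
            int col = column(seq.charAt(offset + pos));
            if (col < 0) {
                return Double.NaN; // unknown character (assumption of this sketch)
            }
            product *= pspm[pos][col];
        }
        return product;
    }

    // The motif counts as found if the threshold is less than the product.
    static boolean isMatch(double[][] pspm, CharSequence seq, int offset, double threshold) {
        double p = windowProbability(pspm, seq, offset);
        return !Double.isNaN(p) && threshold < p;
    }

    public static void main(String[] args) {
        double[][] pspm = {
            {0.0, 0.25, 0.0, 0.75},
            {0.0, 0.0, 1.0, 0.0},
            {1.0, 0.0, 0.0, 0.0}
        };
        System.out.println(isMatch(pspm, "TGA", 0, 0.5));
        System.out.println(isMatch(pspm, "CGA", 0, 0.5));
    }
}
```

With a threshold of 0.5, the window TGA (product 0.75) is accepted and CGA (product 0.25) is rejected.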
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
MotifSearchAdapter.SearchFields |
| Field Summary | |
|---|---|
| (package private) static int | A - Constant for easy access to the columns of this.motif. |
| (package private) static int | C - Constant for easy access to the columns of this.motif. |
| (package private) static int | G - Constant for easy access to the columns of this.motif. |
| protected java.lang.String[] | generatingSequences - The sequences this motif was generated from. |
| protected double[][] | motif - The motif stored is of format [rows][cols]. |
| protected int[] | position - The positional information about the motif. |
| (package private) static int | T - Constant for easy access to the columns of this.motif. |
| protected double | threshold - The threshold probability used to decide whether a motif is a match or not. |
| Fields inherited from class de.unibi.techfak.jpredictor.motifs.Motif |
|---|
DNA_DEGENERATED_CODE, DNA_DEGENERATED_CODE_JOINING, MARK_USABLE |
| Fields inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
searchFields |
| Fields inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher |
|---|
SEARCH_ALL_ORIENTATIONS, SEARCH_DIRECTION_MAX, SEARCH_DIRECTION_MINUS, SEARCH_DIRECTION_PLUS |
| Fields inherited from interface de.unibi.techfak.misc.Markable |
|---|
MARK_BASIC, MARK_DELETED, MARK_EXP, MARK_MOVED, MARK_REPLACED, MARK_SELECTED, MARK_TOBEDELETED, MARK_TOBEMOVED, MARK_TOBEREPLACED |
| Constructor Summary | |
|---|---|
| PSPMotif(java.lang.String name, java.lang.String description) | Creates a motif with name and description, but empty position specific probability matrix. |
| Method Summary | |
|---|---|
| java.lang.Object | clone() - Clones the motif by creating the same motif again and copying all important fields. |
| int | compareTo(java.lang.Object o) - Compares this motif with the specified object. |
| static PSPMotif | createPSPMFromMotifBlock(java.lang.String[] lines, int start, int end, java.lang.String filter) - Reads and evaluates a block of sequences given as array of strings. |
| static PSPMotif | createPSPMotifFromSequenceMotif(RegularExpressionMotif sequMotif) - Creates a new PSPMotif from a SequenceMotif or a RegularExpressionMotif, respectively. |
| Motif | createReversedComplement() - Creates the reversed complementary motif. |
| java.lang.String | getConsensusSequence() - For every position the character with the maximum probability or score, respectively, is taken. |
| java.lang.String[] | getGeneratingSequences() |
| double | getMaximum() - The maximal possible probability is calculated by multiplying over all probabilities returned by this.getMaximum(int). |
| double | getMaximum(int pos) - Calculates the maximal possible probability or score, respectively, at the given position. |
| double | getMinimum() - The minimal possible probability is calculated by multiplying over all probabilities returned by this.getMinimum(int). |
| double | getMinimum(int pos) - Calculates the minimal possible probability or score, respectively, at the given position. |
| double[][] | getMotif() - Returns the motif as a matrix of double values. |
| int[] | getMotifPositions() |
| java.lang.String | getRegularExpression() - Generates the regular expression represented by the motif. |
| double | getThreshold() |
| int | length() - Returns the length of the motif. |
| protected double | match(char sequ, double[] col) - Matches a char with a position specific probability or score, respectively. |
| protected double | matchComplement(char sequ, double[] col) - Matches a char with the position specific probability, the char is treated complementary and then the normal match method is called. |
| void | print(java.io.PrintStream out) - Prints the motif to the given stream. |
| static PSPMotif | readMotifBlockFromFile(java.lang.String filename, int skip, java.lang.String filter) - Reads a motif block from a file. |
| static PSPMotif | readPSBlockFromFile(java.lang.String filename, int skip) - Reads a block with position specific data from a file. |
| FoundMotifStruct | search(int seqStart, int seqLength) - Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength. |
| int | setMotif(int[] position, double[][] weight) - Sets the motif from the position and weight array. |
| void | setThreshold(double threshold) - Sets the threshold. |
| static double[][] | testWeigthMatrix(double[][] weight, boolean change) - The matrix is tested for some constraints. |
| Methods inherited from class de.unibi.techfak.jpredictor.motifs.Motif |
|---|
clearMark, clearMark, equals, getDescription, getMark, getName, getWeight, isMarked, isMarked, print, scoreSequenceWindow, setDescription, setMark, setName, setWeight, toString |
| Methods inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter |
|---|
getSearchMode, initSearch, searchAll, setSearchMode |
| Methods inherited from class java.lang.Object |
|---|
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher |
|---|
getSearchMode, initSearch, searchAll, setSearchMode |
| Field Detail |
|---|
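static final int A
Constant for easy access to the columns of this.motif. This value is 0.
static final int C
Constant for easy access to the columns of this.motif. This value is 1.
static final int G
Constant for easy access to the columns of this.motif. This value is 2.
static final int T
Constant for easy access to the columns of this.motif. This value is 3.
protected double[][] motif
The motif stored is of format [rows][cols].
protected int[] position
The positional information about the motif.
protected double threshold
The threshold probability used to decide whether a motif is a match or not.
protected java.lang.String[] generatingSequences
The sequences this motif was generated from.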
| Constructor Detail |
|---|
public PSPMotif(java.lang.String name,
java.lang.String description)
Creates a motif with name and description, but empty position specific probability matrix. The matrix can be set afterwards with setMotif(int[], double[][]).
Otherwise you can obtain such a class by using one of the static methods. The motif is marked not usable (this mark has nothing to do with being incomplete through a missing PSPM, but tells the program not to use this motif for operations until otherwise stated).
Parameters:
name - The name (identifier) of the motif.
description - A short description of the motif.
See Also:
setMotif(int[], double[][]), readMotifBlockFromFile(String, int, String), readPSBlockFromFile(String, int), createPSPMFromMotifBlock(String[], int, int, String), createPSPMotifFromSequenceMotif(RegularExpressionMotif)
| Method Detail |
|---|
public final void setThreshold(double threshold)
Description copied from interface: MotifSearchWithThreshold
Sets the threshold.
setThreshold in interface MotifSearchWithThreshold
Parameters:
threshold - The new threshold.
See Also:
MotifSearcher.search(int, int)
public final double getThreshold()
getThreshold in interface MotifSearchWithThreshold
Double.NaN is returned.
See Also:
MotifSearcher.search(int, int)
public int setMotif(int[] position,
double[][] weight)
Sets the motif from the position and weight array. The position array is an array of integers which holds a numerical description of the motif, e.g. the PHO core motif is GCCAT, thus the G will be at position 1 whereas all letters before the G will have negative position values. The position array can be null; in this case the positions start with 1.
The weight matrix holds either probabilities or scores. The matrix must consist of at least 4 columns, one for each letter A, C, G and T, in that order. The number of rows determines the length of the motif. The matrix has to be of format [pos][nuc].
At first, this method calls testWeigthMatrix and continues only if the return value of that method is not null.
Note that both array and matrix are cloned only in case they need fitting. If position is larger than weight, it is cut; if it is smaller, it is extended.
Parameters:
position - An array of row positions.
weight - The matrix of probabilities or scores or counts.
See Also:
testWeigthMatrix(double[][], boolean), PSSMotif.testWeigthMatrix(double[][], boolean)
public double[][] getMotif()
Returns the motif as a matrix of double values.
Returns:
null if no motif was set yet.
public int[] getMotifPositions()
Returns:
null if no motif was set yet.
public java.lang.String[] getGeneratingSequences()
public static double[][] testWeigthMatrix(double[][] weight,
boolean change)
The matrix is tested for some constraints. The constraints are, first, no entry is less than zero, second, the sum in every row is one, and third, there is no row with sum 0.
If weight is null, or if one array (one
column) is smaller than four entries, null is returned.
The constraints are tested with the first four columns of the matrix.
The matrix has to be of format [position][nucleotide]
If change is set to true and if the given
matrix needs fitting, a new matrix is created and returned. If
change is false the given matrix is only tested
and in case of any flaw null is returned.
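The constraints listed above can be sketched as a standalone check. This is not the implementation of testWeigthMatrix; the class `WeightMatrixCheckSketch` is hypothetical, and several points are assumptions of this sketch: "fitting" is taken to mean dividing each row by its sum, a valid matrix is returned unchanged, negative entries and zero-sum rows are treated as unfixable, and a small tolerance is used when comparing row sums with one.

```java
final class WeightMatrixCheckSketch {
    private static final double EPS = 1e-9; // tolerance for row sums (assumption)

    // Tests the first four columns of a [position][nucleotide] matrix.
    static double[][] check(double[][] weight, boolean change) {
        if (weight == null) {
            return null;
        }
        boolean needsFitting = false;
        for (double[] row : weight) {
            if (row == null || row.length < 4) {
                return null; // fewer than four entries
            }
            double sum = 0.0;
            for (int c = 0; c < 4; c++) {
                if (row[c] < 0.0) {
                    return null; // constraint 1: no entry is less than zero
                }
                sum += row[c];
            }
            if (sum == 0.0 || Double.isNaN(sum)) {
                return null; // constraint 3: no row with sum 0
            }
            if (Math.abs(sum - 1.0) > EPS) {
                needsFitting = true; // constraint 2: every row sums to one
            }
        }
        if (!needsFitting) {
            return weight; // already valid: returned as is (assumption)
        }
        if (!change) {
            return null; // flawed and no permission to create a new matrix
        }
        double[][] fitted = new double[weight.length][4];
        for (int r = 0; r < weight.length; r++) {
            double sum = weight[r][0] + weight[r][1] + weight[r][2] + weight[r][3];
            for (int c = 0; c < 4; c++) {
                fitted[r][c] = weight[r][c] / sum; // normalise the row (assumption)
            }
        }
        return fitted;
    }

    public static void main(String[] args) {
        double[][] counts = {{10, 9, 0, 0}, {0, 0, 0, 19}};
        System.out.println(check(counts, false) == null);
        System.out.println(check(counts, true)[1][3]);
    }
}
```

Under these assumptions a matrix of raw counts is rejected when change is false and converted into a new probability matrix when change is true, while the input matrix itself is never modified.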
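Parameters:
weight - The matrix to test and, if needed and permitted, to copy from.
change - Sets whether the method is permitted to create a new matrix to make the old one fit the constraints.
null.
public java.lang.String getConsensusSequence()
For every position the character with the maximum probability or score, respectively, is taken.
getConsensusSequence in class Motif
public java.lang.String getRegularExpression()
Generates the regular expression represented by the motif.
getRegularExpression in class Motif
public double getMaximum(int pos)
Calculates the maximal possible probability or score, respectively, at the given position.
Returns:
Double.NaN in case of error.
public double getMinimum(int pos)
Calculates the minimal possible probability or score, respectively, at the given position.
Returns:
Double.NaN in case of error.
public double getMaximum()
The maximal possible probability is calculated by multiplying over all probabilities returned by this.getMaximum(int).
Returns:
Double.NaN in case of error
See Also:
getMaximum(int)
public double getMinimum()
The minimal possible probability is calculated by multiplying over all probabilities returned by this.getMinimum(int).
Returns:
Double.NaN in case of error
See Also:
getMinimum(int)
public int length()
Description copied from class: MotifSearchAdapter
Returns the length of the motif.
length in class MotifSearchAdapter
public void print(java.io.PrintStream out)
Description copied from class: Motif
Prints the motif to the given stream, e.g.
Motif.print( new PrintStream( new FileOutputStream ( name )))
print in class Motif
Parameters:
out - The stream to print the motif to.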
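How getMaximum(int), getMaximum(), getMinimum() and getConsensusSequence() described above relate to each other can be shown in a short standalone sketch. The class `PspmExtremesSketch` and its method names are hypothetical; returning `Double.NaN` for a missing matrix or an invalid position and picking the first base on ties in the consensus are assumptions of this example.

```java
final class PspmExtremesSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Largest of the four values at one position, NaN on invalid input.
    static double maximumAt(double[][] pspm, int pos) {
        if (pspm == null || pos < 0 || pos >= pspm.length) {
            return Double.NaN;
        }
        double best = pspm[pos][0];
        for (int c = 1; c < 4; c++) {
            best = Math.max(best, pspm[pos][c]);
        }
        return best;
    }

    // Smallest of the four values at one position, NaN on invalid input.
    static double minimumAt(double[][] pspm, int pos) {
        if (pspm == null || pos < 0 || pos >= pspm.length) {
            return Double.NaN;
        }
        double worst = pspm[pos][0];
        for (int c = 1; c < 4; c++) {
            worst = Math.min(worst, pspm[pos][c]);
        }
        return worst;
    }

    // Product over all per-position maxima.
    static double maximum(double[][] pspm) {
        if (pspm == null || pspm.length == 0) {
            return Double.NaN;
        }
        double product = 1.0;
        for (int pos = 0; pos < pspm.length; pos++) {
            product *= maximumAt(pspm, pos);
        }
        return product;
    }

    // Product over all per-position minima.
    static double minimum(double[][] pspm) {
        if (pspm == null || pspm.length == 0) {
            return Double.NaN;
        }
        double product = 1.0;
        for (int pos = 0; pos < pspm.length; pos++) {
            product *= minimumAt(pspm, pos);
        }
        return product;
    }

    // For every position the base with the largest value is taken.
    static String consensus(double[][] pspm) {
        StringBuilder sb = new StringBuilder();
        for (double[] row : pspm) {
            int best = 0;
            for (int c = 1; c < 4; c++) {
                if (row[c] > row[best]) {
                    best = c; // strict comparison keeps the first base on ties
                }
            }
            sb.append(BASES[best]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] pspm = {{0.0, 0.25, 0.0, 0.75}, {0.0, 0.0, 1.0, 0.0}};
        System.out.println(consensus(pspm) + " " + maximum(pspm) + " " + minimum(pspm));
    }
}
```

The overall maximum is also the product obtained when the consensus sequence itself is scored, which is why it is a natural upper bound for the threshold.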
public FoundMotifStruct search(int seqStart,
int seqLength)
throws MissingCharSequenceException,
MissingMotifException
Description copied from interface: MotifSearcher
Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
The result is a FoundMotifStruct with positions absolute to the sequence window. Note that this method does not take previous searches into account; it always searches anew. If the motif was not found, null is returned.
The id of the returned struct is misused as a flag determining the search direction (search mode) in which the motif was found (any combination of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS). Note that a motif can occur more than once at the same position, e.g. the regular expression motif CNGCCATNDNND and its reversed complemented part HNNHNATGGCTG can both be matched on the sequence CGGCCATGGCTG. In case both search directions occur, the motif start and end positions are always given for the plus direction. For a reversed complemented motif, end is less than start.
null is returned.
search in interface MotifSearcher
Parameters:
seqStart - Search starts with this index.
seqLength - The width of the subsequence to search on.
Returns:
null if the motif could not be found on the sequence.
Throws:
MissingCharSequenceException - If no sequence to search on was set.
MissingMotifException - If no motif to search for was set.
See Also:
MotifSearcher.initSearch(CharSequence), MotifSearcher.SEARCH_DIRECTION_PLUS, MotifSearcher.SEARCH_DIRECTION_MINUS
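The case mentioned above, where a motif and its reversed complemented part both match at the same position, can be reproduced with a small standalone sketch based on the standard IUPAC nucleotide codes. The class `DirectionFlagSketch` is hypothetical, and the flag values `PLUS = 1` and `MINUS = 2` are placeholders chosen for this example; the actual values of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS are not given here.

```java
final class DirectionFlagSketch {
    // Placeholder flag values (assumption); the real constants may differ.
    static final int PLUS = 1;
    static final int MINUS = 2;

    // Bases covered by an IUPAC nucleotide code; empty for unknown characters.
    static String expand(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'R': return "AG";
            case 'Y': return "CT";
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";
            case 'D': return "AGT";
            case 'H': return "ACT";
            case 'V': return "ACG";
            case 'N': return "ACGT";
            default:  return "";
        }
    }

    // True if every motif code at offset covers the corresponding sequence base.
    static boolean matchesAt(String motif, String seq, int offset) {
        if (offset < 0 || offset + motif.length() > seq.length()) {
            return false;
        }
        for (int i = 0; i < motif.length(); i++) {
            if (expand(motif.charAt(i)).indexOf(seq.charAt(offset + i)) < 0) {
                return false;
            }
        }
        return true;
    }

    // Combines the direction flags for the plus motif and its reversed complement.
    static int directionsAt(String plusMotif, String minusMotif, String seq, int offset) {
        int flags = 0;
        if (matchesAt(plusMotif, seq, offset)) {
            flags |= PLUS;
        }
        if (matchesAt(minusMotif, seq, offset)) {
            flags |= MINUS;
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(
            directionsAt("CNGCCATNDNND", "HNNHNATGGCTG", "CGGCCATGGCTG", 0));
    }
}
```

For the sequence CGGCCATGGCTG both patterns match at offset 0, so both placeholder flags are set in the result.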
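protected double match(char sequ,
double[] col)
Matches a char with a position specific probability or score, respectively.
Parameters:
sequ - The character to get found in pos.
col - Position specific values.
Returns:
Double.NaN, if the char was not found.
protected double matchComplement(char sequ,
double[] col)
Matches a char with the position specific probability, the char is treated complementary and then the normal match method is called.
Parameters:
sequ - The character, whose reversed complementary part is searched.
col - Position specific values.
Returns:
Double.NaN, if the char was not found.
See Also:
match(char, double[])
public Motif createReversedComplement()
Description copied from class: Motif
Creates the reversed complementary motif.
createReversedComplement in class Motif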
public static PSPMotif readMotifBlockFromFile(java.lang.String filename,
int skip,
java.lang.String filter)
Reads a motif block from a file. Each motif block must start with a leading '>' character (at the beginning of the line, fasta-style) and must end with either the next '>' character or the end of file. The sequences in the block are evaluated to yield a PSPM motif.
Using the parameter skip one can skip the first motif
blocks and evaluate only the (skip+1)st block. The length
of the resulting motif is determined by the shortest sequence of the
block. The sequences in the block are evaluated using the method
createPSPMFromMotifBlock(String[], int, int, String).
The filter string is passed to that method.
Comments are allowed in the lines, the delimiter characters are either '#', ';' or '/', respectively.
Parameters:
filename - The name of the file to read the block from.
skip - The number of blocks to skip before the one to evaluate.
filter - The filter on the sequence strings in the block.
Returns:
PSPMotif, null in case of error.
See Also:
createPSPMFromMotifBlock(String[], int, int, String)
public static PSPMotif createPSPMFromMotifBlock(java.lang.String[] lines,
int start,
int end,
java.lang.String filter)
Reads and evaluates a block of sequences given as array of strings.
Every String in the array represents a short sequence
and all sequences are combined to a PSPM motif.
Use one of the filter strings (see motifs.MotifFilter)
to filter all sequences to only valid characters. Note, that the
length of the resulting motif is determined by the shortest sequence
in the block after filtering. The sequences can contain regular
expression characters.
Comments are not allowed in the lines. All lines are uppercased
before filtering. The default filter String, which is
used if you set filter = null, is
MotifFilter.DNA_RNA_FILTER_DEGENERATED. Empty (after
filtering) or null-Strings in the array
are ignored.
Example: imagine the following array of sequences:
    CGAGYG
    TGAACG
    T G A C C G
    T G A T T N
The PSPM returned is printed out as follows:
| pos | A | C | G | T |
|---|---|---|---|---|
| 1 | 0.0 | 0.25 | 0.0 | 0.75 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.25 | 0.25 | 0.25 | 0.25 |
| 5 | 0.0 | 0.625 | 0.0 | 0.375 |
| 6 | 0.0625 | 0.0625 | 0.8125 | 0.0625 |
Note, that characters from the degenerated DNA alphabet (regular expression characters) are handled as follows: an equally distributed probability is added to the char counter, e.g. a G counts one, but a Y counts 0.5 to C and 0.5 to T.
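The counting rule above can be checked with a minimal standalone sketch that reproduces the example matrix. It is not the code of createPSPMFromMotifBlock; the class `MotifBlockCountSketch` is hypothetical, and its `clean` method is only a simple stand-in for MotifFilter.filterString that uppercases a line and keeps IUPAC DNA codes (RNA characters are not handled here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class MotifBlockCountSketch {
    private static final String ORDER = "ACGT"; // column order A, C, G, T

    // Bases covered by an IUPAC nucleotide code; empty for other characters.
    static String expand(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'R': return "AG";
            case 'Y': return "CT";
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";
            case 'D': return "AGT";
            case 'H': return "ACT";
            case 'V': return "ACG";
            case 'N': return "ACGT";
            default:  return "";
        }
    }

    // Stand-in for the filter step: uppercase, then drop non-IUPAC characters.
    static String clean(String line) {
        if (line == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (char ch : line.toUpperCase(Locale.ROOT).toCharArray()) {
            if (!expand(ch).isEmpty()) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Builds a [position][nucleotide] probability matrix from the sequences.
    static double[][] fromSequences(String[] lines) {
        List<String> seqs = new ArrayList<>();
        int length = Integer.MAX_VALUE;
        for (String line : lines) {
            String s = clean(line);
            if (s.isEmpty()) {
                continue; // empty or null entries are ignored
            }
            seqs.add(s);
            length = Math.min(length, s.length()); // shortest sequence decides
        }
        if (seqs.isEmpty()) {
            return null;
        }
        double[][] pspm = new double[length][4];
        for (String s : seqs) {
            for (int pos = 0; pos < length; pos++) {
                String bases = expand(s.charAt(pos));
                for (int i = 0; i < bases.length(); i++) {
                    // a degenerated code is spread equally over its bases
                    pspm[pos][ORDER.indexOf(bases.charAt(i))] += 1.0 / bases.length();
                }
            }
        }
        for (double[] row : pspm) {
            for (int c = 0; c < 4; c++) {
                row[c] /= seqs.size(); // counts become probabilities
            }
        }
        return pspm;
    }

    public static void main(String[] args) {
        String[] block = {"CGAGYG", "TGAACG", "T G A C C G", "T G A T T N"};
        for (double[] row : fromSequences(block)) {
            System.out.println(row[0] + " " + row[1] + " " + row[2] + " " + row[3]);
        }
    }
}
```

In position 5 the Y contributes 0.5 to C and 0.5 to T, giving counts of 2.5 and 1.5 over four sequences, which matches the 0.625 and 0.375 shown in the example.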
Parameters:
lines - The lines containing a whole file or the motif block.
start - The start of the block within the array.
end - The end of the block within the array. Set it to -1, if all lines to the end should get evaluated.
filter - Every sequence in the block is filtered by calling MotifFilter.filterString( lines[i], filter ). If it is null MotifFilter.DNA_RNA_FILTER_DEGENERATED is taken.
Returns:
PSPMotif without name and description, but with a set motif and generating sequences, null in case of error.
See Also:
MotifFilter, MotifFilter.DNA_RNA_FILTER_DEGENERATED
public static PSPMotif readPSBlockFromFile(java.lang.String filename,
int skip)
Reads a block with position specific data from a file. Example block:
    >PSBlock
    #pos A C G T
    -1 10 9 0 0 / any comment
    1 0 0 0 19
The characters '#' and '/' start comments that are ignored. All lines are evaluated and stored in a PSPMotif, which is returned. Using the parameter skip you can skip the first motif blocks and evaluate only the (skip+1)st block, e.g. set skip to zero to evaluate the first block that can be found.
System.out.println( ">"+ motif.getName() );
motif.print();
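Reading such a block can be sketched as follows. This is a standalone illustration, not the code of readPSBlockFromFile: the class `PsBlockParseSketch` and its nested `Block` type are hypothetical, the input is taken as an array of lines instead of a file, and returning null for a missing or malformed block is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class PsBlockParseSketch {
    /** Parsed rows of one block: row positions and the four values per row. */
    static final class Block {
        final int[] positions;
        final double[][] values;

        Block(int[] positions, double[][] values) {
            this.positions = positions;
            this.values = values;
        }
    }

    // Cuts a line at the first '#' or '/' comment character.
    static String stripComment(String line) {
        int cut = line.length();
        for (char marker : new char[] {'#', '/'}) {
            int i = line.indexOf(marker);
            if (i >= 0 && i < cut) {
                cut = i;
            }
        }
        return line.substring(0, cut).trim();
    }

    // Returns the (skip+1)st block, or null if it is missing or malformed.
    static Block parse(String[] lines, int skip) {
        List<Integer> positions = new ArrayList<>();
        List<double[]> rows = new ArrayList<>();
        int block = -1; // index of the block currently being read
        for (String raw : lines) {
            String line = stripComment(raw);
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                block++;
                if (block > skip) {
                    break; // the wanted block is complete
                }
                continue;
            }
            if (block != skip) {
                continue; // row of a skipped block
            }
            String[] fields = line.split("\\s+");
            if (fields.length < 5) {
                return null; // position plus A, C, G, T expected
            }
            try {
                positions.add(Integer.parseInt(fields[0]));
                double[] row = new double[4];
                for (int c = 0; c < 4; c++) {
                    row[c] = Double.parseDouble(fields[c + 1]);
                }
                rows.add(row);
            } catch (NumberFormatException e) {
                return null;
            }
        }
        if (rows.isEmpty()) {
            return null;
        }
        int[] pos = new int[positions.size()];
        for (int i = 0; i < pos.length; i++) {
            pos[i] = positions.get(i);
        }
        return new Block(pos, rows.toArray(new double[0][]));
    }

    public static void main(String[] args) {
        String[] lines = {">PSBlock", "#pos A C G T", "-1 10 9 0 0 / any comment", "1 0 0 0 19"};
        Block b = parse(lines, 0);
        System.out.println(b.positions.length + " rows read");
    }
}
```

The rows hold raw counts here; turning them into probabilities would be a separate step, as in the constraints checked by testWeigthMatrix.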
Parameters:
filename - The name of the file to read the block from.
skip - The number of blocks to skip before the one to evaluate.
Returns:
PSPMotif or null in case of error.
public static PSPMotif createPSPMotifFromSequenceMotif(RegularExpressionMotif sequMotif)
Creates a new PSPMotif from a SequenceMotif or a RegularExpressionMotif, respectively. The probabilities in every position are equally distributed between the bases. The threshold is set to the highest possible probability.
Parameters:
sequMotif - The motif to translate.
Returns:
PSPMotif, null in case of error.
public java.lang.Object clone()
Description copied from class: Motif
Clones the motif by creating the same motif again and copying all
important fields. The result is at least of class Motif.
When cloning a MultiMotif the referred single motifs
are not cloned.
The fields that are copied are: Name, Description, the Motif itself,
errorNumberAllowedForMatch or Threshold, respectively, Weight and
searchMode (altogether 6 fields by now). What is not copied are the
fields of the MotifSearchAdapter.
Note, that any matrices are not cloned. Thus, if cloning a
PSPMotif or a PSSMotif the cloned motif does
contain a reference to the matrix of the old motif. This was done
to preserve memory.
clone in class Motif
See Also:
Object.clone()
public int compareTo(java.lang.Object o)
Description copied from class: Motif
Compares this motif with the specified object. The Object given must be a Motif and must not be null, otherwise a ClassCastException is thrown. If the motif representation was not set for this motif or the given one, a ClassCastException is thrown as well.
Two motifs are compared for their representations, both plus and minus depending on the search direction. Both motifs must be of the same type, otherwise the following relation is considered: SequenceMotif < PSPMotif < MultiMotif and returned without further investigations. Note that RegularExpressionMotifs count as SequenceMotifs and that PSSMotifs count as PSPMotifs. In case of comparing two MultiMotifs, after comparing the comprised single motifs the distance information is taken into account: first minimum vs. first minimum, then first maximum, second minimum, and so on. As a last resort, two sequence motifs are compared for their error number allowed for match, whereas two position specific matrix motifs are compared for their threshold.
More formally, this motif is less than the specified one if this reg-exp string is less than the one from the specified motif.
compareTo in interface java.lang.Comparable
compareTo in class Motif
Parameters:
o - The motif this method is compared to.