de.unibi.techfak.jpredictor.motifs
Class RegularExpressionMotif

java.lang.Object
  extended by de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
      extended by de.unibi.techfak.jpredictor.motifs.Motif
          extended by de.unibi.techfak.jpredictor.motifs.RegularExpressionMotif
All Implemented Interfaces:
MotifSearcher, MotifSearchWithError, SequenceWindowScorer, Markable, java.lang.Cloneable, java.lang.Comparable
Direct Known Subclasses:
SequenceMotif

public class RegularExpressionMotif
extends Motif
implements MotifSearchWithError

Holds a motif given as a regular expression sequence string, e.g. "YGAGYG", where 'Y' stands for C or T. The characters may use the IUB code, that is, the degenerate DNA code.
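To illustrate the IUB code, the following sketch encodes each letter as a 4-bit set of bases (A = 1, C = 2, G = 4, T = 8) and expands it back. This encoding and the class `IubCode` are assumptions made for illustration, not the internal representation used by RegularExpressionMotif.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of jpredictor: IUB letters as 4-bit base sets. */
final class IubCode {
    // Bit 1 = A, 2 = C, 4 = G, 8 = T; the letter at index m denotes the set with mask m.
    static final String CODES = "-ACMGRSVTWYHKDBN";
    static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Returns the base mask of an IUB letter ('U' is treated as 'T'); 0 if unknown. */
    static int mask(char c) {
        c = Character.toUpperCase(c);
        if (c == 'U') return 8;
        return Math.max(CODES.indexOf(c), 0); // '-' (index 0) and unknown chars (-1) give 0
    }

    /** Expands an IUB letter into the plain bases it stands for, in the order A, C, G, T. */
    static List<Character> expand(char c) {
        int m = mask(c);
        List<Character> out = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            if ((m & (1 << i)) != 0) out.add(BASES[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand('Y')); // [C, T]
        System.out.println(expand('N')); // [A, C, G, T]
    }
}
```

With this encoding, most operations on degenerate letters (matching, joining, complementing) reduce to simple bit operations on the masks.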


Nested Class Summary
 
Nested classes/interfaces inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
MotifSearchAdapter.SearchFields
 
Field Summary
private  int errorNumberAllowedForMatch
          The number of mismatch errors allowed in sequence alignment.
protected  java.lang.String motif
          The motif stored as String.
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.Motif
DNA_DEGENERATED_CODE, DNA_DEGENERATED_CODE_JOINING, MARK_USABLE
 
Fields inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
searchFields
 
Fields inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
SEARCH_ALL_ORIENTATIONS, SEARCH_DIRECTION_MAX, SEARCH_DIRECTION_MINUS, SEARCH_DIRECTION_PLUS
 
Fields inherited from interface de.unibi.techfak.misc.Markable
MARK_BASIC, MARK_DELETED, MARK_EXP, MARK_MOVED, MARK_REPLACED, MARK_SELECTED, MARK_TOBEDELETED, MARK_TOBEMOVED, MARK_TOBEREPLACED
 
Constructor Summary
RegularExpressionMotif()
          Standard constructor; initializes all values to zero or null.
RegularExpressionMotif(java.lang.String name, java.lang.String description, java.lang.String motif, int nErrorNumberAllowedForMatch)
          Sets all the motif's characteristics.
 
Method Summary
 java.lang.Object clone()
           Clones the motif by creating the same motif again and copying all important fields.
 int compareTo(java.lang.Object o)
           Compares this motif with the specified object.
static java.lang.String createConsensusFromMotifBlock(java.lang.String[] lines, int start, int end, java.lang.String filter)
          Reads a motif block, where every line in the String array represents a short sequence, and combines all sequences into a regular expression sequence.
 Motif createReversedComplement()
          Creates the reversed complementary motif.
 java.lang.String getConsensusSequence()
          Calculates the consensus sequence from the motif, replacing the degenerate letters of DNA with the base letters in the order A, C, G, T.
 int getErrorNumberAllowedForMatch()
           
 java.lang.String getRegularExpression()
          Represents the motif as a regular expression.
static char getRegularExpressionChar(char r, char b)
          Joins two chars into one regular expression char, e.g. Y and A are joined to H.
 int length()
          Returns the length of the motif.
protected  boolean match(char sequ, char reg)
          Matches two chars.
protected  boolean matchComplement(char sequ, char reg)
          Matches two chars, the first is treated complementary and then the normal match method is called.
 void print(java.io.PrintStream out)
          Prints the motif to the given stream.
static RegularExpressionMotif readMotifBlockFromFile(java.lang.String filename, int bnr, java.lang.String filter)
          Reads a motif block from a file.
 FoundMotifStruct search(int seqStart, int seqLength)
          Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
 void setErrorNumberAllowedForMatch(int errorNumber)
          Sets the number of mismatch errors allowed in one single alignment of the motif against another sequence.
 void setMotif(java.lang.String motif)
          Sets the motif from a given String.
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.Motif
clearMark, clearMark, equals, getDescription, getMark, getName, getWeight, isMarked, isMarked, print, scoreSequenceWindow, setDescription, setMark, setName, setWeight, toString
 
Methods inherited from class de.unibi.techfak.jpredictor.motifs.MotifSearchAdapter
getSearchMode, initSearch, searchAll, setSearchMode
 
Methods inherited from class java.lang.Object
finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface de.unibi.techfak.jpredictor.motifs.MotifSearcher
getSearchMode, initSearch, searchAll, setSearchMode
 

Field Detail

motif

protected java.lang.String motif
The motif stored as String.


errorNumberAllowedForMatch

private int errorNumberAllowedForMatch
The number of mismatch errors allowed in sequence alignment.

Constructor Detail

RegularExpressionMotif

public RegularExpressionMotif(java.lang.String name,
                              java.lang.String description,
                              java.lang.String motif,
                              int nErrorNumberAllowedForMatch)
Sets all the motif's characteristics.

Parameters:
name - The name of the motif.
description - A short description of the motif.
motif - The motif itself as a short sequence.
nErrorNumberAllowedForMatch - The number of mismatch errors allowed for a match.

RegularExpressionMotif

public RegularExpressionMotif()
Standard constructor; initializes all values to zero or null.

Method Detail

setMotif

public void setMotif(java.lang.String motif)
Sets the motif from a given String. The motif is filtered with MotifFilter.DNA_RNA_FILTER_DEGENERATED.

Parameters:
motif - The new motif as a short sequence.
See Also:
MotifFilter.DNA_RNA_FILTER_DEGENERATED

getConsensusSequence

public java.lang.String getConsensusSequence()
Calculates the consensus sequence from the motif, replacing the degenerate letters of DNA with the base letters in the order A, C, G, T. Thus a Y is always replaced with a C, and an N always with an A.

Overrides:
getConsensusSequence in class Motif
Returns:
The consensus sequence as String or null if no motif is present.
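The replacement rule above can be sketched as follows, assuming each letter is encoded as a 4-bit base set (A = 1, C = 2, G = 4, T = 8): the lowest set bit is the first base in A, C, G, T order. `ConsensusSketch` is a hypothetical name, and keeping unknown characters unchanged is an assumption, not documented behavior.

```java
/** Hypothetical sketch of the consensus rule, not the jpredictor implementation. */
final class ConsensusSketch {
    // Bit 1 = A, 2 = C, 4 = G, 8 = T; the letter at index m denotes the set with mask m.
    private static final String CODES = "-ACMGRSVTWYHKDBN";
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Replaces every degenerate letter with its first base in the order A, C, G, T. */
    static String consensus(String motif) {
        if (motif == null) return null; // mirrors "null if no motif is present"
        StringBuilder sb = new StringBuilder(motif.length());
        for (char c : motif.toUpperCase().toCharArray()) {
            int m = c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0);
            // The lowest set bit selects the first contained base; unknown chars are kept (assumption).
            sb.append(m == 0 ? c : BASES[Integer.numberOfTrailingZeros(m)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(consensus("YGANYG")); // CGAACG
    }
}
```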

getRegularExpression

public java.lang.String getRegularExpression()
Description copied from class: Motif
Represents the motif as a regular expression. The default is to return the same as getConsensusSequence().

Overrides:
getRegularExpression in class Motif
Returns:
the motif as it was set.

length

public int length()
Description copied from class: MotifSearchAdapter
Returns the length of the motif.

Specified by:
length in class MotifSearchAdapter
Returns:
The length of the motif; -1 if no motif is present.

print

public void print(java.io.PrintStream out)
Description copied from class: Motif
Prints the motif to the given stream, e.g. motif.print(new PrintStream(new FileOutputStream(name))).

Specified by:
print in class Motif
Parameters:
out - The stream to print the motif to.

search

public FoundMotifStruct search(int seqStart,
                               int seqLength)
                        throws MissingCharSequenceException,
                               MissingMotifException
Description copied from interface: MotifSearcher
Runs through the sequence already given in the initialization method initSearch and matches the motif on this CharSequence with respect to the search mode and the search window determined by the two parameters seqStart and seqLength.
The first occurrence of the motif on the sequence is returned as a FoundMotifStruct with positions absolute to the sequence window. Note that this method does not take previous searches into account; it always searches anew. If the motif was not found, null is returned.
The id of the returned struct is reused as a flag indicating the search direction (search mode) in which the motif was found (any combination of SEARCH_DIRECTION_PLUS and SEARCH_DIRECTION_MINUS). Note that a motif can match more than once at the same position, e.g. the regular expression motif CNGCCATNDNND and its reversed complement HNNHNATGGCNG can both be matched on the sequence CGGCCATGGCTG. If both search directions occur, the start and end positions always refer to the plus direction. For a reversed complemented motif, end is less than start.
If an error occurred while reading the sequence, if this method tried to read beyond the end of the sequence, or if the motif could not be found on the sequence, null is returned.

Specified by:
search in interface MotifSearcher
Parameters:
seqStart - Search starts with this index.
seqLength - The width of the subsequence to search on.
Returns:
The first occurrence of the motif on the subsequence, or null if the motif could not be found on the sequence.
Throws:
MissingCharSequenceException - If no sequence to search on was set.
MissingMotifException - If no motif to search for was set.
See Also:
MotifSearcher.initSearch(CharSequence), MotifSearcher.SEARCH_DIRECTION_PLUS, MotifSearcher.SEARCH_DIRECTION_MINUS
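The search behavior described above (first hit in a window, mismatches allowed, both strands checked, direction flags combined) can be sketched as follows. This is not the jpredictor implementation: the flag values `PLUS` and `MINUS`, the `int[] {start, end, flags}` result in place of FoundMotifStruct, and reporting indices into the whole sequence are all assumptions for illustration. Letters are encoded as 4-bit base sets (A = 1, C = 2, G = 4, T = 8).

```java
/** Hypothetical sketch of a windowed search with mismatches on both strands. */
final class SearchSketch {
    static final int PLUS = 1, MINUS = 2; // assumed flag values, not the real SEARCH_DIRECTION_* constants
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask

    private static int mask(char c) {
        c = Character.toUpperCase(c);
        return c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0);
    }

    // Swaps the A<->T and C<->G bits of a base mask.
    private static int complement(int m) {
        return ((m & 1) << 3) | ((m & 8) >> 3) | ((m & 2) << 1) | ((m & 4) >> 1);
    }

    /** Counts mismatches at offset pos; minus = compare against the reversed complement. */
    private static int mismatches(CharSequence seq, int pos, String motif, boolean minus) {
        int n = motif.length(), errors = 0;
        for (int i = 0; i < n; i++) {
            int s = mask(seq.charAt(pos + i));
            int r = minus ? complement(mask(motif.charAt(n - 1 - i))) : mask(motif.charAt(i));
            if ((s & r) == 0) errors++; // no base in common counts as one mismatch
        }
        return errors;
    }

    /** Returns {start, end, flags} of the first hit in [seqStart, seqStart + seqLength), or null. */
    static int[] search(CharSequence seq, String motif, int maxErrors, int seqStart, int seqLength) {
        int n = motif.length();
        int stop = Math.min(seqStart + seqLength, seq.length()) - n; // never read past the end
        for (int p = Math.max(seqStart, 0); p <= stop; p++) {
            int flags = 0;
            if (mismatches(seq, p, motif, false) <= maxErrors) flags |= PLUS;
            if (mismatches(seq, p, motif, true) <= maxErrors) flags |= MINUS;
            // Plus coordinates whenever the plus strand matched; minus-only hits get end < start.
            if ((flags & PLUS) != 0) return new int[] {p, p + n - 1, flags};
            if (flags == MINUS) return new int[] {p + n - 1, p, flags};
        }
        return null;
    }

    public static void main(String[] args) {
        int[] hit = search("CGGCCATGGCTG", "CNGCCATNDNND", 0, 0, 12);
        System.out.println(java.util.Arrays.toString(hit)); // [0, 11, 3]
    }
}
```

The example reproduces the case from the description: the motif and its reversed complement both match at position 0, so both flags are set and the positions refer to the plus direction.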

createReversedComplement

public Motif createReversedComplement()
Description copied from class: Motif
Creates the reversed complementary motif.

Specified by:
createReversedComplement in class Motif
Returns:
The reversed and complemented motif.
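Reversing and complementing a degenerate motif can be sketched as below, assuming a 4-bit base-set encoding (A = 1, C = 2, G = 4, T = 8) in which complementing swaps the A/T and C/G bits, so that e.g. Y (C|T) becomes R (A|G). `ReverseComplementSketch` is a hypothetical name; it works on the motif string only and does not create a Motif object.

```java
/** Hypothetical sketch of reversing and complementing an IUB motif string. */
final class ReverseComplementSketch {
    // Bit 1 = A, 2 = C, 4 = G, 8 = T; the letter at index m denotes the set with mask m.
    private static final String CODES = "-ACMGRSVTWYHKDBN";

    static String reverseComplement(String motif) {
        StringBuilder sb = new StringBuilder(motif.length());
        for (int i = motif.length() - 1; i >= 0; i--) { // walk the motif backwards
            char c = Character.toUpperCase(motif.charAt(i));
            int m = c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0);
            // Swap the A<->T and C<->G bits; unknown characters are kept as they are.
            int comp = ((m & 1) << 3) | ((m & 8) >> 3) | ((m & 2) << 1) | ((m & 4) >> 1);
            sb.append(m == 0 ? c : CODES.charAt(comp));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("CNGCCATNDNND")); // HNNHNATGGCNG
    }
}
```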

match

protected boolean match(char sequ,
                        char reg)
Matches two chars. Both chars can be regular expressions (that is, degenerate DNA or RNA code), e.g. the match of 'Y' and 'B' returns true, because both chars contain at least 'C'. The match of 'Y' and 'G' returns false, because 'C'|'T' does not contain a 'G'.

Parameters:
sequ - The first regular expression char
reg - The second regular expression char
Returns:
true iff both regular expressions have at least one base in common, false otherwise
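With each letter encoded as a 4-bit set of bases (A = 1, C = 2, G = 4, T = 8, an assumed encoding), "at least one base in common" becomes a non-empty bitwise intersection. `MatchSketch` is a hypothetical name, not the class's actual code.

```java
/** Hypothetical sketch of the match rule: two IUB letters match iff their base sets intersect. */
final class MatchSketch {
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask

    static int mask(char c) {
        c = Character.toUpperCase(c);
        return c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0); // unknown chars map to the empty set
    }

    static boolean match(char sequ, char reg) {
        return (mask(sequ) & mask(reg)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(match('Y', 'B')); // true  (both contain C)
        System.out.println(match('Y', 'G')); // false (C|T has no G)
    }
}
```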

matchComplement

protected boolean matchComplement(char sequ,
                                  char reg)
Matches two chars, the first is treated complementary and then the normal match method is called.

Parameters:
sequ - The char from the sequence.
reg - The char from the motif.
Returns:
true iff the chars match after complementing the first one, false otherwise
See Also:
SequenceMotif.match(char, char)
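A sketch of complementing the sequence char before the intersection test, assuming the same 4-bit encoding (A = 1, C = 2, G = 4, T = 8) in which complementing swaps the A/T and C/G bits. `MatchComplementSketch` is a hypothetical name.

```java
/** Hypothetical sketch: complement the sequence letter, then apply the intersection match. */
final class MatchComplementSketch {
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask

    private static int mask(char c) {
        c = Character.toUpperCase(c);
        return c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0);
    }

    // Swaps the A<->T and C<->G bits, e.g. R (A|G) becomes Y (C|T).
    private static int complement(int m) {
        return ((m & 1) << 3) | ((m & 8) >> 3) | ((m & 2) << 1) | ((m & 4) >> 1);
    }

    static boolean matchComplement(char sequ, char reg) {
        return (complement(mask(sequ)) & mask(reg)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(matchComplement('A', 'T')); // true  (complement of A is T)
        System.out.println(matchComplement('A', 'A')); // false
        System.out.println(matchComplement('R', 'Y')); // true  (complement of R is Y)
    }
}
```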

readMotifBlockFromFile

public static RegularExpressionMotif readMotifBlockFromFile(java.lang.String filename,
                                                            int bnr,
                                                            java.lang.String filter)
Reads a motif block from a file. Each motif block must start with a '>' character at the beginning of a line (FASTA-style) and ends with either the next '>' character or the end of the file. All motifs (short sequences) found in the block are evaluated to build a RegularExpressionMotif, which is returned.
The parameter 'bnr' skips the first 'bnr' motif blocks, so that only the (bnr+1)th block is evaluated. Use one of the filter strings (see MotifFilter) to restrict all found sequences to valid characters.
Note: the length of the resulting motif is determined by the shortest sequence of the block. Regular expression characters are allowed in the block. Example: consider the following motif block:
 >MotifBlockName
 CGAGTG
 TGAACG
 TGACCG
 TGATTG
 
The function returns the regular expression 'YGANYG'.

Parameters:
filename - The name of the file to read from.
bnr - The block number within the file.
filter - The filter used when lines are evaluated.
Returns:
A newly created RegularExpressionMotif, null in case of error.
See Also:
MotifFilter.filterString(String, String)
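The block selection and column-wise joining described above can be sketched as follows. The class `MotifBlockSketch` and its methods are hypothetical; the character filter is a stand-in assumption (upper-case, keep only IUB letters) rather than MotifFilter, the result is the regular expression string rather than a RegularExpressionMotif, and letters are joined as 4-bit base sets (A = 1, C = 2, G = 4, T = 8).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reading the (bnr+1)th '>'-delimited block and joining it column-wise. */
final class MotifBlockSketch {
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask

    /** Returns the regular expression of block number bnr (0-based), or null if absent or empty. */
    static String regexFromBlock(List<String> lines, int bnr) {
        List<String> seqs = new ArrayList<>();
        int block = -1;
        for (String line : lines) {
            if (line.startsWith(">")) {
                if (++block > bnr) break; // the next header ends the wanted block
                continue;
            }
            if (block == bnr) {
                // Stand-in filter (assumption): upper-case and keep only IUB letters.
                String s = line.toUpperCase().replaceAll("[^ACGTURYSWKMBDHVN]", "");
                if (!s.isEmpty()) seqs.add(s);
            }
        }
        if (seqs.isEmpty()) return null;
        int len = seqs.stream().mapToInt(String::length).min().getAsInt(); // shortest sequence
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            int m = 0;
            for (String s : seqs) {
                char c = s.charAt(i);
                m |= c == 'U' ? 8 : CODES.indexOf(c); // union of the base sets in this column
            }
            sb.append(CODES.charAt(m));
        }
        return sb.toString();
    }

    static String regexFromFile(String filename, int bnr) throws IOException {
        return regexFromBlock(Files.readAllLines(Path.of(filename)), bnr);
    }
}
```

Applied to the example block above with bnr = 0, regexFromBlock yields "YGANYG".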

createConsensusFromMotifBlock

public static java.lang.String createConsensusFromMotifBlock(java.lang.String[] lines,
                                                             int start,
                                                             int end,
                                                             java.lang.String filter)
Reads a motif block, where every line in the String array represents a short sequence, and combines all sequences into a regular expression sequence.
This method only calls PSPMotif.createPSPMFromMotifBlock. Example: the following block of motif sequences
 CGAGTG
 TGAACG
 TGACCG
 TGATTG
 
is represented by the regular expression 'YGANYG', and thus 'CGAACG' is returned.

Parameters:
lines - The lines containing a file or the motif block.
start - The start of the block within the array.
end - The end of the block within the array.
filter - Every sequence in the block is filtered by calling MotifFilter.filterString( lines[i], filter ).
Returns:
A consensus sequence as a String, null in case of error.
See Also:
PSPMotif.createPSPMFromMotifBlock(String[], int, int, String), MotifFilter.filterString(String, String)
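The result described above (join each column into one degenerate letter, then take its consensus) can be sketched directly, although the method itself delegates to PSPMotif. `BlockConsensusSketch` is hypothetical; treating `end` as exclusive, omitting the filter, and returning null for a column of unknown characters are assumptions. Letters are encoded as 4-bit base sets (A = 1, C = 2, G = 4, T = 8).

```java
/** Hypothetical sketch: join lines[start..end) column-wise, then take the consensus letter. */
final class BlockConsensusSketch {
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    static String consensusFromBlock(String[] lines, int start, int end) {
        if (lines == null || start < 0 || end > lines.length || start >= end) return null;
        int len = Integer.MAX_VALUE;
        for (int i = start; i < end; i++) len = Math.min(len, lines[i].length()); // shortest line
        StringBuilder sb = new StringBuilder(len);
        for (int col = 0; col < len; col++) {
            int m = 0;
            for (int i = start; i < end; i++) {
                char c = Character.toUpperCase(lines[i].charAt(col));
                m |= c == 'U' ? 8 : Math.max(CODES.indexOf(c), 0); // union of the column's bases
            }
            if (m == 0) return null; // only unknown characters in this column (assumption)
            // The lowest set bit gives the first base in A, C, G, T order.
            sb.append(BASES[Integer.numberOfTrailingZeros(m)]);
        }
        return sb.toString();
    }
}
```

For the example block, the columns join to YGANYG, whose consensus is CGAACG.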

getRegularExpressionChar

public static char getRegularExpressionChar(char r,
                                            char b)
Joins two chars into one regular expression char, e.g. Y and A are joined to H. Both chars can be regular expression chars, e.g. Y and B are joined to B. All chars must be given in upper case.

Parameters:
r - The first of two chars.
b - The second character.
Returns:
The regular expression char resulting from joining the two chars. If one of the chars is unknown (e.g. E), the other char is returned.
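Joining two degenerate letters amounts to taking the union of their base sets. The sketch below assumes a 4-bit encoding (A = 1, C = 2, G = 4, T = 8) and the unknown-char rule stated above; `JoinSketch` is a hypothetical name, not the method's actual code.

```java
/** Hypothetical sketch: joining two upper-case IUB letters is the union of their base sets. */
final class JoinSketch {
    private static final String CODES = "-ACMGRSVTWYHKDBN"; // index = A1|C2|G4|T8 mask

    static char join(char r, char b) {
        int mr = CODES.indexOf(r), mb = CODES.indexOf(b);
        if (mr <= 0) return b; // unknown first char: return the other one
        if (mb <= 0) return r; // unknown second char: return the other one
        return CODES.charAt(mr | mb);
    }

    public static void main(String[] args) {
        System.out.println(join('Y', 'A')); // H
        System.out.println(join('Y', 'B')); // B
        System.out.println(join('Y', 'E')); // Y
    }
}
```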

clone

public java.lang.Object clone()
Description copied from class: Motif

Clones the motif by creating the same motif again and copying all important fields. The result is at least of class Motif. When cloning a MultiMotif, the referenced single motifs are not cloned.

The fields that are copied are: Name, Description, the Motif itself, errorNumberAllowedForMatch or Threshold, respectively, Weight and searchMode (six fields at present). The fields of the MotifSearchAdapter are not copied.

Note that matrices are not cloned. Thus, when a PSPMotif or a PSSMotif is cloned, the clone contains a reference to the matrix of the original motif. This was done to save memory.

Specified by:
clone in class Motif
Returns:
A copy of this motif.
See Also:
Object.clone()

getErrorNumberAllowedForMatch

public int getErrorNumberAllowedForMatch()
Specified by:
getErrorNumberAllowedForMatch in interface MotifSearchWithError
Returns:
The number of mismatch errors allowed in one single alignment of motif against sequence.
See Also:
MotifSearcher.search(int, int)

setErrorNumberAllowedForMatch

public void setErrorNumberAllowedForMatch(int errorNumber)
Description copied from interface: MotifSearchWithError
Sets the number of mismatch errors allowed in one single alignment of the motif against another sequence.

Specified by:
setErrorNumberAllowedForMatch in interface MotifSearchWithError
Parameters:
errorNumber - The number of mismatches allowed.
See Also:
MotifSearcher.search(int, int)

compareTo

public int compareTo(java.lang.Object o)
Description copied from class: Motif

Compares this motif with the specified object. The given object must be a Motif and must not be null; otherwise a ClassCastException is thrown. A ClassCastException is also thrown if the motif representation was not set for this motif or for the given one.

Two motifs are compared by their representations, both plus and minus depending on the search direction. Both motifs must be of the same type; otherwise the relation SequenceMotif < PSPMotif < MultiMotif is applied and its result returned without further investigation. Note that RegularExpressionMotifs count as SequenceMotifs and PSSMotifs count as PSPMotifs. When comparing two MultiMotifs, the distance information is taken into account after the comprised single motifs have been compared: first minimum vs. first minimum, then first maximum, second minimum, and so on. In the last instance, two sequence motifs are compared by their error number allowed for a match, whereas two position specific matrix motifs are compared by their threshold.

More formally, this motif is less than the specified one if its regular expression string is less than that of the specified motif.

Specified by:
compareTo in interface java.lang.Comparable
Specified by:
compareTo in class Motif
Parameters:
o - The motif to compare this motif with.
Returns:
-1, 0 or 1 as this motif is less than, equal to or greater than the specified motif.