de.unibi.techfak.jpredictor.evolution
Class MotifEvolution

java.lang.Object
  extended by de.unibi.techfak.jpredictor.evolution.MotifEvolution
Direct Known Subclasses:
MotifEvolutionES, MotifEvolutionFull

public abstract class MotifEvolution
extends java.lang.Object

Abstract class defining the framework for a motif evolution. The framework is simply a convention for the order in which several methods are called. See the start(int) method for details.

See Also:
start(int)

Field Summary
(package private)  java.io.BufferedInputStream buffIn
          Input stream used if the user wants to skip the evolution.
(package private)  ICommunicator comm
          A local instance of the communicator the class was constructed with.
(package private)  char[] generatingCharacters
          The characters the motifs are generated from and evolved into.
(package private)  double[] generatingDistribution
          The nucleotide probability distribution.
(package private)  MotifList[] gML
          The global array of motif lists.
(package private)  long motifCounter
          Motif counter to name the motifs in a unique way.
(package private)  double[] mutationProbabilities
          Stores the probabilities for the motif evolution.
(package private)  int numberOfNnucs
          Determines the number of 'N' nucleotides allowed in one sequence motif.
(package private)  IOperator op
          A local instance of the operator.
(package private)  double recombinationProbability
          Determines whether recombination occurs in the evolutionary process or not.
(package private)  int temperature
          The actual temperature, set in the start(int) method.
 
Constructor Summary
MotifEvolution(ICommunicator comm)
          Creates a MotifEvolution object but does not start the evolution.
 
Method Summary
(package private)  Motif evolveMotif(Motif m, int steps)
           Evolves a motif.
(package private) abstract  void evolveMotifLists()
          The lists of motifs are evolved into the next generation.
(package private)  Motif evolveMultiMotif(MultiMotif mm, int steps)
           Evolves a multi motif.
(package private)  Motif evolveSequenceMotif(RegularExpressionMotif m, int steps)
           Evolves a regular expression motif.
(package private)  void fillNewWeights(MotifList ml, double[] w)
          This method fills the array w with the new weights for the motifs.
(package private) abstract  int initMotifLists()
          Generates the initial populations.
(package private)  void outputResults()
          Outputs the result of the evolution.
(package private)  MotifList recombineParentSet(MotifList mlMale, MotifList mlFemale, MotifList mlChildren, int count)
           Recombines one or two parental sets of arbitrary multi motifs into a new set of multi motifs (offspring).
(package private)  void restrainMotifList(MotifList ml, int count)
           Throws away motifs from the list until the given number is reached.
(package private) abstract  void selectMotifLists()
          Goes through all motif lists and selects the motifs to be kept.
 void setMutationProbabilities(double mLen, double mDist, double mError, double mNucl)
           Sets the new mutation probabilities for the evolutionary process.
 void setRecombinationProbability(double recombinationProbability)
          Sets the new probability for recombination events occurring during evolution.
 void start(int temp)
          Starts the evolutionary process.
(package private)  boolean weightMotifLists()
          Weights the motifs in all motif lists.
(package private)  boolean weightMotifs(java.util.Vector[] posOcc, java.util.Vector[] negOcc)
          Weights motifs from the global communicator using the global operator.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

comm

ICommunicator comm
A local instance of the communicator the class was constructed with.


op

IOperator op
A local instance of the operator. Set in the constructor.


motifCounter

long motifCounter
Motif counter to name the motifs in a unique way. After every use the counter should be increased.


generatingCharacters

char[] generatingCharacters
The characters the motifs are generated from and evolved into.


generatingDistribution

double[] generatingDistribution
The nucleotide probability distribution. Set in the constructor to the background of the communicator.


gML

MotifList[] gML
The global array of motif lists. Normally, only one motif list is necessary, but for certain evolutionary applications several populations might be useful.


temperature

int temperature
The actual temperature, set in the start(int) method.


mutationProbabilities

double[] mutationProbabilities
Stores the probabilities for the motif evolution. Four probabilities are defined here, one for each kind of change to a multi motif: 1. altering the length of single motifs (p_1=0.05), 2. altering the distance between single motifs (p_2=0.05), 3. switching the number of errors allowed for a match (p_3=0.02), and 4. changing one nucleotide (p_4=1-p_1-p_2-p_3=0.88). Note that the probabilities are stored as a wheel, that is, as cumulative values, so the last entry is always one.

See Also:
setMutationProbabilities(double, double, double, double)
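
The wheel representation can be illustrated with a minimal sketch. It assumes that "wheel" means cumulative probabilities and that an event is picked by comparing a uniform random number against them; the class and method names (MutationWheelSketch, toWheel, pick) are hypothetical and not part of this API.

```java
import java.util.Random;

/** Hypothetical sketch of a cumulative probability "wheel"; not the class's actual code. */
final class MutationWheelSketch {

    /** Turns plain probabilities (non-empty array) into cumulative ones; the last entry is forced to 1.0. */
    static double[] toWheel(double[] p) {
        double[] wheel = new double[p.length];
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i];
            wheel[i] = sum;
        }
        wheel[wheel.length - 1] = 1.0; // guard against rounding error
        return wheel;
    }

    /** Returns the index of the first wheel slot whose upper bound exceeds r, with 0 <= r < 1. */
    static int pick(double[] wheel, double r) {
        for (int i = 0; i < wheel.length; i++) {
            if (r < wheel[i]) {
                return i;
            }
        }
        return wheel.length - 1;
    }

    public static void main(String[] args) {
        // Default values quoted in the field description above.
        double[] wheel = toWheel(new double[] {0.05, 0.05, 0.02, 0.88});
        int event = pick(wheel, new Random().nextDouble());
        System.out.println("mutation event index: " + event);
    }
}
```

With the default values, the wheel holds roughly 0.05, 0.10, 0.12 and 1.0, so a random number of 0.5 selects the fourth event (nucleotide change).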

recombinationProbability

double recombinationProbability
Determines whether recombination occurs in the evolutionary process or not. Valid values are therefore 1.0 and 0.0.


numberOfNnucs

int numberOfNnucs
Determines the number of 'N' nucleotides allowed in one sequence motif. When evolving MultiMotifs this number should be zero, otherwise one or two. This value should be set in the init method.


buffIn

java.io.BufferedInputStream buffIn
Input stream used if the user wants to skip the evolution.

Constructor Detail

MotifEvolution

public MotifEvolution(ICommunicator comm)
               throws java.lang.NullPointerException,
                      java.lang.IllegalArgumentException
Creates a MotifEvolution object but does not start the evolution. All settings the motif evolution uses must be made in the given communicator, e.g. initial motif set, training sets, window width, ...

Parameters:
comm - The communicator in which at least the training sets must be defined.
Throws:
java.lang.NullPointerException
java.lang.IllegalArgumentException
Method Detail

setMutationProbabilities

public void setMutationProbabilities(double mLen,
                                     double mDist,
                                     double mError,
                                     double mNucl)
                              throws java.lang.IllegalArgumentException

Sets the new mutation probabilities for the evolutionary process. Evolving a multi motif consisting of regular expression motifs means altering four mutable things: 1. the length of single motifs, 2. the distance between single motifs, 3. the number of errors allowed for a match, and 4. single nucleotides.

Two constraints are checked in this method and enforced by adapting and normalizing the four probabilities: 1. no probability may be negative (otherwise it is set to zero), 2. the sum of all four probabilities must be one. If this method is called with all four probabilities being zero, an exception is thrown.

Parameters:
mLen - The probability to alter the length of single motifs.
mDist - The probability to alter the intermotif distance.
mError - The probability to switch the error number allowed for a match for one single motif.
mNucl - The probability to alter the nucleotides of the single motifs sequences.
Throws:
java.lang.IllegalArgumentException - If all four probabilities are zero.
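
The adaptation and normalization described above might look roughly like the following sketch. The class name MutationProbabilitySketch and the method normalize are hypothetical; it is also an assumption that the zero check happens after negative values have been set to zero.

```java
/** Hypothetical sketch of the clamping and normalization rules; not the class's actual code. */
final class MutationProbabilitySketch {

    /** Returns the four probabilities with negatives set to zero and the sum scaled to one. */
    static double[] normalize(double mLen, double mDist, double mError, double mNucl) {
        double[] p = {mLen, mDist, mError, mNucl};
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] < 0.0) {
                p[i] = 0.0; // constraint 1: no negative probability
            }
            sum += p[i];
        }
        if (sum == 0.0) {
            // Assumption: checked after clamping, so all-negative input is rejected too.
            throw new IllegalArgumentException("All four mutation probabilities are zero.");
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // constraint 2: probabilities sum to one
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = normalize(1.0, 1.0, -3.0, 2.0);
        System.out.println(p[0] + " " + p[1] + " " + p[2] + " " + p[3]);
    }
}
```

In this sketch, the input (1.0, 1.0, -3.0, 2.0) becomes (0.25, 0.25, 0.0, 0.5).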

setRecombinationProbability

public void setRecombinationProbability(double recombinationProbability)
                                 throws java.lang.IllegalArgumentException
Sets the new probability for recombination events occurring during evolution. Must be either zero or one.

Parameters:
recombinationProbability - The recombinationProbability to set.
Throws:
java.lang.IllegalArgumentException - If the parameter is neither zero nor one.

start

public void start(int temp)
Starts the evolutionary process. The methods called within are
  1. Initialization (of the parental generation)
  2. Loop
    1. Evolutionary step (generate children)
    2. Evaluation step (weight motifs)
    3. Selection step (choose best motifs)
  3. Output

Parameters:
temp - The starting temperature for the simulated annealing process, which is also the number of generations to perform. If it is negative, the method returns without doing anything. If it is zero, the loop is skipped.
See Also:
initMotifLists(), evolveMotifLists(), weightMotifLists(), selectMotifLists(), outputResults()
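
The call order listed above can be sketched as a template method. This is a simplified stand-in rather than the actual implementation: the classes EvolutionLoopSketch and TracingEvolution are hypothetical, return values of the steps are ignored, and decrementing the temperature once per generation is an assumption derived from the parameter description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for MotifEvolution; only the call order is modelled. */
abstract class EvolutionLoopSketch {
    int temperature;

    abstract int initMotifLists();
    abstract void evolveMotifLists();
    abstract boolean weightMotifLists();
    abstract void selectMotifLists();
    abstract void outputResults();

    final void start(int temp) {
        if (temp < 0) {
            return;                   // negative temperature: nothing happens
        }
        temperature = temp;
        initMotifLists();             // 1. initialization (return code ignored in this sketch)
        while (temperature > 0) {     // 2. loop, skipped when temp is zero
            evolveMotifLists();       //    2.1 evolutionary step
            weightMotifLists();       //    2.2 evaluation step
            selectMotifLists();       //    2.3 selection step
            temperature--;            // assumption: one generation per temperature unit
        }
        outputResults();              // 3. output
    }

    public static void main(String[] args) {
        TracingEvolution e = new TracingEvolution();
        e.start(2);
        System.out.println(e.trace());
    }
}

/** Records the order in which the framework methods are called. */
final class TracingEvolution extends EvolutionLoopSketch {
    final List<String> calls = new ArrayList<>();

    @Override int initMotifLists() { calls.add("init"); return 0; }
    @Override void evolveMotifLists() { calls.add("evolve"); }
    @Override boolean weightMotifLists() { calls.add("weight"); return true; }
    @Override void selectMotifLists() { calls.add("select"); }
    @Override void outputResults() { calls.add("output"); }

    String trace() { return String.join(",", calls); }
}
```

Under these assumptions, start(2) produces the call sequence init, evolve, weight, select, evolve, weight, select, output, while start(0) produces only init, output.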

initMotifLists

abstract int initMotifLists()
Generates the initial populations. This method is the first in start(int) to be called.

Returns:
zero if no error occurred, one if an error occurred while weighting the motifs
See Also:
start(int)

weightMotifLists

boolean weightMotifLists()
Weights the motifs in all motif lists. Within the evolutionary loop this method is the second to be called. This method simply sets one motif list after the other in the communicator and then calls weightMotifs( null, null ). If you want another way of weighting, or if some motif lists should be left out of weighting, override this method.

See Also:
start(int), weightMotifs(Vector[], Vector[])

selectMotifLists

abstract void selectMotifLists()
Goes through all motif lists and selects the motifs to be kept. Within the evolutionary loop this method is the third (last) to be called. A good start for the selection would be to call restrainMotifList(MotifList, int).

See Also:
start(int), restrainMotifList(MotifList, int)

evolveMotifLists

abstract void evolveMotifLists()
The lists of motifs are evolved into the next generation. Evolution in this sense means any combination of cloning, recombination or mutation. Within the evolutionary loop this method is the first to be called.

See Also:
start(int)

outputResults

void outputResults()
Outputs the result of the evolution. This method goes through all motif lists in this class and prints them via a call to OptionFile#writeMotifList( comm.out(), gML[i]). If you want another way of handling the results, override this method.

See Also:
OptionFile.writeMotifList(PrintStream, MotifList), start(int)

weightMotifs

boolean weightMotifs(java.util.Vector[] posOcc,
                     java.util.Vector[] negOcc)
Weights motifs from the global communicator using the global operator. After the vectors are cleared, they are filled with information on how often the motifs were found in the positive and in the negative training set. There is one vector for each motif, containing as many Integer values as there are sequences.

Parameters:
posOcc - An array of vectors filled with the occurrences of a motif on the sequences of the positive training set. One vector for every motif.
negOcc - An array of vectors filled with the occurrences of a motif on the sequences of the negative training set. One vector for every motif.
Returns:
true if the motifs were weighted, false, otherwise.

evolveMotif

Motif evolveMotif(Motif m,
                  int steps)

Evolves a motif. If the given motif is a MultiMotif, the method evolveMultiMotif is called. If the given motif is a SequenceMotif, the method evolveSequenceMotif is called. Evolving matrix motifs is not yet supported.

Note that neutral evolution might occur. If steps equals one, however, the method guarantees that the evolutionary step performed has changed the motif. If steps is greater than one, some mutations might neutralize previous ones, e.g. a motif is elongated (a random nucleotide is attached), the attached nucleotide is changed (a neutral mutation in the sense that it is not seen in the result), and the elongated motif is shortened again (a neutral mutation in the sense that the original motif is not changed).

Parameters:
m - The motif to be evolved.
steps - Number of evolutionary steps performed on the motif.
Returns:
The new Motif, or null if m==null or if steps is less than one.
See Also:
evolveMultiMotif(MultiMotif, int)

evolveSequenceMotif

Motif evolveSequenceMotif(RegularExpressionMotif m,
                          int steps)

Evolves a regular expression motif. Three things are subject to mutation in such a motif: 1. the length of a single motif (constraint: it does not drop below five and does not exceed 10), 2. the number of errors allowed for a match (switched for one motif between zero and one), and 3. nucleotide alterations within single motifs (constraints: they occur uniformly distributed, with only as many 'N' nucleotides as allowed). The probabilities of the different mutation events can be set by calling setMutationProbabilities(double, double, double, double). Note that the probability to change distances is ignored in this method.

The evolved motifs are named after the motif global counter number, thus no name occurs twice.

Parameters:
m - The regular expression motif to be evolved.
steps - Number of evolutionary steps performed on the motif.
Returns:
The new RegularExpressionMotif or null if steps is less than one.
See Also:
setMutationProbabilities(double, double, double, double), mutationProbabilities

evolveMultiMotif

Motif evolveMultiMotif(MultiMotif mm,
                       int steps)

Evolves a multi motif. Within this method only the distances between comprised motifs are altered. The constraints for this event are: changes are normally distributed, 0≤min,max≤440, min≤max. For each mutation event a random number is drawn to decide whether the distance is mutated or whether another event occurs. In the latter case, one random motif is drawn from the given MultiMotif and the method evolveMotif(Motif, int) is called.

Note that this method is no longer limited to a certain kind of comprised motif. Since the evolution of comprised motifs is recursive, every kind of comprised motif, even a MultiMotif, is allowed.

After the evolution is finished, the name of the motif is adapted. In this process the MultiMotif's name is changed in accordance with the names of the comprised motifs, in the form: name1-(minDist,maxDist)-name2-(minDist,maxDist)-name3-....

Parameters:
mm - The MultiMotif to get evolved.
steps - Number of evolutionary steps performed on the motif.
Returns:
The new MultiMotif or null if steps is less than one.
See Also:
setMutationProbabilities(double, double, double, double), mutationProbabilities, evolveMotif(Motif, int), evolveSequenceMotif(RegularExpressionMotif, int)
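
The distance constraints (normally distributed changes, 0≤min,max≤440, min≤max) could be enforced along the lines of the following sketch. The class DistanceMutationSketch, the step width sigma, and the choice to swap min and max when they end up in the wrong order are all assumptions; the source only states the constraints, not how they are repaired.

```java
import java.util.Random;

/** Hypothetical sketch of a constrained distance mutation; not the class's actual code. */
final class DistanceMutationSketch {
    static final int MAX_DISTANCE = 440; // upper bound quoted in the description above

    /** Clamps both values to [0, 440] and orders them so that min <= max. */
    static int[] repair(int min, int max) {
        int lo = Math.max(0, Math.min(MAX_DISTANCE, min));
        int hi = Math.max(0, Math.min(MAX_DISTANCE, max));
        // Assumption: a violated ordering is fixed by swapping the two bounds.
        return lo <= hi ? new int[] {lo, hi} : new int[] {hi, lo};
    }

    /** Shifts both bounds by Gaussian steps of width sigma, then repairs the constraints. */
    static int[] mutate(int min, int max, double sigma, Random rng) {
        int newMin = min + (int) Math.round(rng.nextGaussian() * sigma);
        int newMax = max + (int) Math.round(rng.nextGaussian() * sigma);
        return repair(newMin, newMax);
    }

    public static void main(String[] args) {
        int[] d = mutate(10, 50, 5.0, new Random());
        System.out.println("(" + d[0] + "," + d[1] + ")");
    }
}
```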

restrainMotifList

void restrainMotifList(MotifList ml,
                       int count)

Throws away motifs from the list until the given number is reached. If the given motif list is null or already contains fewer motifs than count denotes, nothing is done to the list. Otherwise motifs from the list are chosen randomly and discarded.

The choosing process relies completely on the weights. Before the best-weighted motifs are chosen to be kept, the method fillNewWeights(MotifList, double[]) is called to change the weights. Within this procedure every motif can be assigned an artificial weight, which cannot be seen by the motif itself. These artificial weights are used for picking the best-weighted motifs.

Parameters:
ml - The motif list to be restrained.
count - The number of motifs allowed to stay in the list.
See Also:
fillNewWeights(MotifList, double[])
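
A simplified sketch of the selection step follows. It is deterministic and keeps the count motifs with the highest artificial weights, which is a simplification of the random, weight-driven choice described above. Plain strings and a double array stand in for MotifList, the class name RestrainSketch is hypothetical, and it is an assumption that a higher weight is better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of weight-based truncation; not the class's actual code. */
final class RestrainSketch {

    /** Stand-in for fillNewWeights: copies the real weights unchanged into w. */
    static void fillNewWeights(double[] realWeights, double[] w) {
        System.arraycopy(realWeights, 0, w, 0, realWeights.length);
    }

    /** Keeps the count entries with the highest artificial weights (assumption: higher is better). */
    static void restrain(List<String> motifs, double[] realWeights, int count) {
        if (motifs == null || motifs.size() <= count) {
            return; // nothing to discard
        }
        double[] w = new double[motifs.size()];
        fillNewWeights(realWeights, w); // the motifs themselves never see these weights

        // Sort the indices by artificial weight, best first.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < motifs.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> w[i]).reversed());

        List<String> kept = new ArrayList<>();
        for (int k = 0; k < Math.max(0, count); k++) {
            kept.add(motifs.get(order.get(k)));
        }
        motifs.clear();
        motifs.addAll(kept);
    }

    public static void main(String[] args) {
        List<String> motifs = new ArrayList<>(List.of("a", "b", "c", "d"));
        restrain(motifs, new double[] {0.1, 0.9, 0.5, 0.7}, 2);
        System.out.println(motifs);
    }
}
```

A subclass that overrides the weight-filling step, for example to penalize long motifs, would change which motifs survive without touching the stored weights.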

fillNewWeights

void fillNewWeights(MotifList ml,
                    double[] w)
This method fills the array w with the new weights for the motifs. The given motif list remains unchanged.

Parameters:
ml - The list of weighted motifs.
w - An array at least the size of the motif list to be filled with the new weights for the motifs.

recombineParentSet

MotifList recombineParentSet(MotifList mlMale,
                             MotifList mlFemale,
                             MotifList mlChildren,
                             int count)

Recombines one or two parental sets of arbitrary multi motifs into a new set of multi motifs (offspring). The process is as follows. Two multi motifs are chosen randomly, one from the male list and one from the female list. If one list is null, both multi motifs are chosen from the other parental set. To recombine the two motifs, a crossover point is drawn. Up to this point, the single motifs are taken from either the male or the female multi motif; from this point on, they are taken from the other multi motif. The distances are the mean values of both parents' distances.

The motif parts themselves are not changed. One parental motif list may be null, but not both. If given, a parental motif list must contain at least one multi motif. Otherwise nothing is done and null is returned. To avoid duplicates, motifs are added to the children's list through the addCheck(int, Motif) method. If 20 fruitless attempts were made to insert a new multi motif into the children's list, the creation of new multi motifs stops.

With this method it is possible to recombine MultiMotifs of different lengths. In this case, the length of the new multi motif is drawn randomly first, and then the crossover point is drawn. It is possible that the crossover point lies behind the end of the smaller multi motif. If this happens, the new multi motif completely contains the smaller motif.

Parameters:
mlMale - First list of multi motifs to be recombined.
mlFemale - Second list of parental multi motifs to be recombined.
mlChildren - Motif list to be filled with the motifs generated by joining motif parts from male and female. Might be null.
count - Maximum number of motifs in the children motif list.
Returns:
The list of child motifs, which is the same as mlChildren unless that was given as null.
See Also:
motifCounter, MotifList.addCheck(int, Motif)
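
The single-point crossover described above can be sketched on plain lists of part names. This is only an illustration under stated assumptions: the class RecombinationSketch is hypothetical, the child length and crossover point are passed in rather than drawn randomly, the fallback to the other parent when one parent is too short is a guess at how different lengths are handled, and integer rounding of the mean distance is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of single-point crossover; not the class's actual code. */
final class RecombinationSketch {

    /**
     * Takes parts from head before the crossover point and from tail afterwards.
     * Assumption: if the preferred parent is too short, the part comes from the other parent.
     */
    static List<String> crossover(List<String> head, List<String> tail, int childLength, int point) {
        List<String> child = new ArrayList<>();
        for (int i = 0; i < childLength; i++) {
            List<String> preferred = i < point ? head : tail;
            List<String> fallback = i < point ? tail : head;
            if (i < preferred.size()) {
                child.add(preferred.get(i));
            } else if (i < fallback.size()) {
                child.add(fallback.get(i));
            } else {
                break; // neither parent has a part at this position
            }
        }
        return child;
    }

    /** Mean of the two parental distances (assumption: integer division). */
    static int meanDistance(int a, int b) {
        return (a + b) / 2;
    }

    public static void main(String[] args) {
        List<String> male = Arrays.asList("m1", "m2", "m3", "m4");
        List<String> female = Arrays.asList("f1", "f2", "f3", "f4");
        System.out.println(crossover(male, female, 4, 2)); // prints [m1, m2, f3, f4]
    }
}
```

With a two-part parent as head, a four-part parent as tail, and a crossover point of three, the child in this sketch starts with both parts of the shorter parent, matching the case where the crossover point lies behind the end of the smaller multi motif.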