de.unibi.techfak.jpredictor.sequences
Class SequenceFilter

java.lang.Object
  extended by de.unibi.techfak.jpredictor.sequences.SequenceFilter

public class SequenceFilter
extends java.lang.Object

This class holds filter strings for filtering sequences. All characters that are not allowed are replaced by dashes (minus signs), so the indices of the remaining characters are preserved. This behaviour differs from MotifFilter.
Example: imagine two motifs with 300 Xs in between. All Xs are replaced by dashes, so that the distance information stays valid.
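
The effect on indices can be illustrated with plain String operations. This is a minimal sketch that does not call SequenceFilter; the class name, the helper methods and the two motifs (GATTACA and CCGG) are made up for the illustration. It only shows why replacing characters keeps the position of the second motif, whereas dropping them does not.

```java
// Illustration only: plain String operations, no call into SequenceFilter.
// Class name, helpers and motifs are hypothetical.
class IndexPreservationSketch {

    // Builds a run of n 'X' characters (stands in for the 300 Xs between the motifs).
    static String spacer(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('X');
        }
        return sb.toString();
    }

    // Replaces every 'X' by '-'; the string keeps its length.
    static String maskX(String sequence) {
        return sequence.replace('X', '-');
    }

    // Drops every 'X'; the string gets shorter and later indices shift.
    static String dropX(String sequence) {
        return sequence.replace("X", "");
    }

    public static void main(String[] args) {
        String sequence = "GATTACA" + spacer(300) + "CCGG";
        System.out.println(maskX(sequence).indexOf("CCGG")); // prints 307
        System.out.println(dropX(sequence).indexOf("CCGG")); // prints 7
    }
}
```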

See Also:
MotifFilter

Field Summary
static java.lang.String DNA_COMPLEMENT
          The complement characters of the degenerated code of DNA.
static java.lang.String DNA_FILTER
          Lets through only ACGT.
static java.lang.String DNA_FILTER_DEGENERATED
          Lets through the degenerated DNA one-letter code, that is ACGT, B (not A), D (not C), H (not G), V (not T), KMRSWY (each a combination of two bases).
static java.lang.String DNA_FILTER_RESTRICTED
          Lets through only ACGT.
static java.lang.String DNA_RNA_FILTER
          Lets through only ACGTU.
static java.lang.String DNA_RNA_FILTER_DEGENERATED
          Lets through the degenerated DNA or RNA one-letter code, that is ACGTU, B (not A), D (not C), H (not G), V (not T), KMRSWY (each a combination of two bases).
static java.lang.String RNA_COMPLEMENT
          The complement characters of the degenerated code of RNA.
static java.lang.String RNA_FILTER
          Lets through only ACGU.
static java.lang.String RNA_FILTER_DEGENERATED
          Lets through the degenerated RNA one-letter code, that is ACGU, B (not A), D (not C), H (not G), V (not U), KMRSWY (each a combination of two bases).
static java.lang.String RNA_FILTER_RESTRICTED
          Lets through only ACGU.
 
Constructor Summary
SequenceFilter()
           
 
Method Summary
static java.lang.String filterString(java.lang.String sequ, java.lang.String filter)
           The given filter is used to remove all disallowed characters from the given sequ.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DNA_FILTER

public static final java.lang.String DNA_FILTER
Lets through only ACGT. Changes U to T and all other chars to dash (-).

See Also:
Constant Field Values

DNA_FILTER_RESTRICTED

public static final java.lang.String DNA_FILTER_RESTRICTED
Lets through only ACGT. Changes all other chars to dash (-).

See Also:
Constant Field Values

DNA_FILTER_DEGENERATED

public static final java.lang.String DNA_FILTER_DEGENERATED
Lets through the degenerated DNA one-letter code, that is ACGT, B (not A), D (not C), H (not G), V (not T), KMRSWY (each a combination of two bases). Changes any U to a T. Replaces all other chars with dash (-).

See Also:
Constant Field Values

DNA_RNA_FILTER

public static final java.lang.String DNA_RNA_FILTER
Lets through only ACGTU. Replaces all other chars with dash (-).

See Also:
Constant Field Values

DNA_RNA_FILTER_DEGENERATED

public static final java.lang.String DNA_RNA_FILTER_DEGENERATED
Lets through the degenerated DNA or RNA one-letter code, that is ACGTU, B (not A), D (not C), H (not G), V (not T), KMRSWY (each a combination of two bases). Replaces all other chars with dash (-).

See Also:
Constant Field Values

RNA_FILTER

public static final java.lang.String RNA_FILTER
Lets through only ACGU. Changes any T to a U. Replaces all other chars with dash (-).

See Also:
Constant Field Values

RNA_FILTER_RESTRICTED

public static final java.lang.String RNA_FILTER_RESTRICTED
Lets through only ACGU. Replaces all other chars with dash (-).

See Also:
Constant Field Values

RNA_FILTER_DEGENERATED

public static final java.lang.String RNA_FILTER_DEGENERATED
Lets through the degenerated RNA one-letter code, that is ACGU, B (not A), D (not C), H (not G), V (not U), KMRSWY (each a combination of two bases). Changes any T to a U. Replaces all other chars with dash (-).

See Also:
Constant Field Values

DNA_COMPLEMENT

public static final java.lang.String DNA_COMPLEMENT
The complement characters of the degenerated code of DNA. Use this by accessing single characters with charAt(i).

See Also:
Constant Field Values
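
A charAt-based complement lookup might look like the following sketch. It rests on assumptions that this page does not confirm: that the index i is the letter's position in the alphabet (letter - 'A'), as with the filter strings, and that unmapped letters hold a dash. The table here is built locally and is hypothetical; it is not the value of DNA_COMPLEMENT. The pairings themselves follow from the one-letter code (A/T, C/G, B/V, D/H, K/M, R/Y, with S and W being their own complements).

```java
import java.util.Arrays;

// Sketch only: a locally built lookup table, NOT the library's DNA_COMPLEMENT constant.
class ComplementSketch {

    // ASSUMPTION: the table is indexed by (letter - 'A') and uses '-' for unmapped letters.
    static final String HYPOTHETICAL_DNA_COMPLEMENT = buildTable();

    static String buildTable() {
        char[] table = new char[26];
        Arrays.fill(table, '-');
        // Each adjacent pair of letters is mutually complementary.
        String pairs = "ATCGBVDHKMRY";
        for (int i = 0; i < pairs.length(); i += 2) {
            char a = pairs.charAt(i);
            char b = pairs.charAt(i + 1);
            table[a - 'A'] = b;
            table[b - 'A'] = a;
        }
        // S (C or G) and W (A or T) are their own complements.
        table['S' - 'A'] = 'S';
        table['W' - 'A'] = 'W';
        return new String(table);
    }

    // Looks up the complement of a single base with charAt; anything unmapped yields '-'.
    static char complement(char base) {
        char c = Character.toUpperCase(base);
        if (c < 'A' || c > 'Z') {
            return '-';
        }
        return HYPOTHETICAL_DNA_COMPLEMENT.charAt(c - 'A');
    }

    // Walks the sequence backwards and complements each base.
    static String reverseComplement(String sequence) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            out.append(complement(sequence.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("GATTACA")); // prints TGTAATC
    }
}
```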

RNA_COMPLEMENT

public static final java.lang.String RNA_COMPLEMENT
The complement characters of the degenerated code of RNA. Use this by accessing single characters with charAt(i).

See Also:
Constant Field Values
Constructor Detail

SequenceFilter

public SequenceFilter()
Method Detail

filterString

public static java.lang.String filterString(java.lang.String sequ,
                                            java.lang.String filter)

The given filter is used to remove all disallowed characters from the given sequ. The method works as follows: first, the sequence is converted to upper case. The filter string must begin with the replacement character for 'A', followed by the one for 'B', and so on. If the filter string is the empty string, the empty string is returned. Spaces in the filter string stand for invalid characters.

For example, take the filter string " BD". In every sequence, A's are discarded, B's are left unchanged and C's are replaced by D's. All other characters are also discarded, because the filter only defines replacements for the first three letters. Thus the sequ "ABCF" yields "BD", "CACA" yields "DD", "bbDD" yields "BB", and so on.
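
The lookup described above can be sketched as follows. This is a reimplementation written from the description on this page, not the library's source; the class and method names are hypothetical, and the null handling follows the Returns section of this method. Characters outside A to Z are assumed to be discarded like any other letter without a filter entry.

```java
import java.util.Locale;

// Sketch written from the documentation; NOT the actual SequenceFilter implementation.
class FilterSketch {

    static String filter(String sequ, String filter) {
        if (sequ == null) {
            return null;            // documented: null sequence gives null
        }
        if (filter == null) {
            return sequ;            // documented: null filter returns the sequence unchanged
        }
        if (filter.isEmpty()) {
            return "";              // documented: empty filter gives the empty string
        }
        String upper = sequ.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(upper.length());
        for (int i = 0; i < upper.length(); i++) {
            // Position of the letter in the alphabet selects the filter character.
            int index = upper.charAt(i) - 'A';
            if (index < 0 || index >= filter.length()) {
                continue;           // ASSUMPTION: no entry for this character, so it is discarded
            }
            char replacement = filter.charAt(index);
            if (replacement != ' ') {   // a space marks an invalid character
                out.append(replacement);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("ABCF", " BD")); // prints BD
        System.out.println(filter("CACA", " BD")); // prints DD
        System.out.println(filter("bbDD", " BD")); // prints BB
    }
}
```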

There are some predefined filter strings, for instance DNA_FILTER or DNA_FILTER_DEGENERATED. For example, with RNA_FILTER_RESTRICTED the String "aAB[d(TU" is filtered to "AAU"; with RNA_FILTER the result is "AAUU" (the uppercase 'T' is translated to 'U'); and with RNA_FILTER_DEGENERATED the result is "AABDUU".
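
To make these three results concrete, the sketch below rebuilds filter strings that behave as the field descriptions say and runs the same input through them. The actual constant values are not shown on this page, so the three tables are assumptions reconstructed from the descriptions, and the filter method is a local stand-in written from the documentation rather than the library's filterString.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch only: tables reconstructed from the field descriptions, NOT the real constants.
class PredefinedFilterSketch {

    // Builds a 26-character filter: from.charAt(i) maps to to.charAt(i), all else is a space.
    static String table(String from, String to) {
        char[] t = new char[26];
        Arrays.fill(t, ' ');
        for (int i = 0; i < from.length(); i++) {
            t[from.charAt(i) - 'A'] = to.charAt(i);
        }
        return new String(t);
    }

    // ASSUMPTION: stand-ins for RNA_FILTER_RESTRICTED, RNA_FILTER and RNA_FILTER_DEGENERATED.
    static final String RESTRICTED_LIKE = table("ACGU", "ACGU");
    static final String RNA_LIKE = table("ACGUT", "ACGUU");
    static final String DEGENERATED_LIKE = table("ACGUBDHVKMRSWYT", "ACGUBDHVKMRSWYU");

    // Local stand-in for filterString, following the description of the method.
    static String filter(String sequ, String filter) {
        String upper = sequ.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            int index = upper.charAt(i) - 'A';
            if (index >= 0 && index < filter.length() && filter.charAt(index) != ' ') {
                out.append(filter.charAt(index));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "aAB[d(TU";
        System.out.println(filter(input, RESTRICTED_LIKE));  // prints AAU
        System.out.println(filter(input, RNA_LIKE));         // prints AAUU
        System.out.println(filter(input, DEGENERATED_LIKE)); // prints AABDUU
    }
}
```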

Parameters:
sequ - The sequence to get filtered.
filter - The string to filter the sequence.
Returns:
The newly generated sequence. If sequ is null, null is returned; if filter is null, the unchanged sequence is returned.