de.unibi.techfak.jpredictor.sequences
Class FastaReader

java.lang.Object
  extended by java.io.Reader
      extended by de.unibi.techfak.jpredictor.sequences.SequenceReader
          extended by de.unibi.techfak.jpredictor.sequences.FastaReader
All Implemented Interfaces:
java.io.Closeable, java.lang.Readable

public class FastaReader
extends SequenceReader

Class for adding filter functionality to the SequenceReader. The underlying reader is assumed to contain FASTA-formatted data, so every character read passes through the following filter steps:
* end-of-line characters (NewLine without FormFeed) are ignored,
* if a '>' is found, the complete line is ignored, where the line is taken to begin at the '>' even if the '>' appears mid-line,
* all characters are converted to uppercase,
* the filter string is applied.
If no filter string is set, the fourth step is skipped.
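The three steps that apply when no filter string is set can be sketched as follows. This is a minimal illustration, not the jpredictor implementation: the class FastaFilterSketch and its method stripFasta are hypothetical, and '\n' is assumed to be the only end-of-line character.

```java
/**
 * Hypothetical sketch (not jpredictor code) of filter steps 1 to 3,
 * i.e. what happens to the input when no filter string is set.
 * Assumes '\n' is the only end-of-line character.
 */
final class FastaFilterSketch {

    static String stripFasta(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inHeader = false; // true from a '>' up to the end of its line
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                inHeader = false;                     // step 1: drop end-of-line; it also ends a header
            } else if (c == '>') {
                inHeader = true;                      // step 2: a '>' starts a header, even mid-line
            } else if (!inHeader) {
                out.append(Character.toUpperCase(c)); // step 3: keep the character, uppercased
            }
            // characters inside a header line fall through and are dropped (step 2)
        }
        return out.toString();
    }
}
```

For example, stripFasta(">seq1\nacgt\nAC>x y\ngg") yields "ACGTACGG": the text after each '>' is dropped, while the "AC" in front of the mid-line '>' is kept.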
While the file is read, the information about each sequence is stored in its own SequenceInformationBlock. Constructing this class creates one such block in advance, with all values set to their defaults; this guarantees that the method sequences() works properly. If a new sequence starts at the beginning of the Reader and no valid characters have appeared before it, no new block is created; instead, the first block is filled with the sequence's information. Note that this filling cannot happen until a first read has been performed. Unlike the first sequence, every subsequent sequence always leads to the creation of a new SequenceInformationBlock.
Each SequenceInformationBlock is filled with the information found in the delimiter line. The sequence name begins after the '>' and can be up to MAX_NAME_LENGTH characters long.
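The block handling described above might look roughly like this. BlockSketch and its Block class are hypothetical stand-ins for SequenceInformationBlock, with assumed name and start fields; to keep the sketch short, the whole input is split up front instead of being read lazily, and every character outside a header counts as valid.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the block handling described above. Block stands in
 * for SequenceInformationBlock; its fields are assumptions. Every character
 * outside a header line counts as a valid sequence character.
 */
final class BlockSketch {
    static final int MAX_NAME_LENGTH = 32000;

    static final class Block {
        String name;      // sequence name, "" by default
        final int start;  // number of valid characters read before this sequence

        Block(String name, int start) {
            this.name = name;
            this.start = start;
        }
    }

    static List<Block> split(String fasta) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("", 0)); // predefined block with default values
        boolean sawHeader = false;
        int validChars = 0;
        for (String line : fasta.split("\n", -1)) {
            int gt = line.indexOf('>');
            // characters before a mid-line '>' still belong to the current sequence
            validChars += (gt >= 0 ? gt : line.length());
            if (gt < 0) {
                continue;
            }
            String name = line.substring(gt + 1);
            if (name.length() > MAX_NAME_LENGTH) {
                name = name.substring(0, MAX_NAME_LENGTH); // keep at most MAX_NAME_LENGTH chars
            }
            if (!sawHeader && validChars == 0) {
                blocks.get(0).name = name;                // first sequence at the very start: reuse the block
            } else {
                blocks.add(new Block(name, validChars));  // any later sequence: new block
            }
            sawHeader = true;
        }
        return blocks;
    }
}
```

With this sketch, split(">s1\nAC\n>s2\nGT") yields two blocks ("s1" starting at 0, "s2" at 2) because the first header reuses the predefined block, whereas split("AC\n>s2\nGT") keeps the default-named first block for "AC" and adds a new block for "s2".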
The filter string works as follows: each position in the string corresponds to an uppercase letter. Thus filter.charAt(0) holds the character that replaces every uppercase A, filter.charAt(1) holds the character that replaces every B, and so on. A space in the filter string means that the corresponding letter is discarded. For example, the string "A B" lets through only A's and C's, and transforms the C's into B's.
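A sketch of the filter-string rule follows, under the assumption (consistent with the "A B" example but not stated explicitly) that non-letters and letters beyond the end of the filter string are discarded. FilterStringSketch, map and apply are hypothetical names, not jpredictor methods.

```java
/**
 * Hypothetical sketch of the filter-string rule. Assumption: non-letters and
 * letters beyond the end of the filter string are discarded.
 */
final class FilterStringSketch {

    /** Returns the replacement for c, or -1 if c is discarded. */
    static int map(char c, String filter) {
        int index = Character.toUpperCase(c) - 'A'; // 0 for A, 1 for B, ...
        if (index < 0 || index >= 26 || index >= filter.length()) {
            return -1;                              // not a letter, or no entry in the filter
        }
        char replacement = filter.charAt(index);
        return replacement == ' ' ? -1 : replacement; // a space means "discard"
    }

    static String apply(String text, String filter) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            int r = map(text.charAt(i), filter);
            if (r != -1) {
                out.append((char) r);
            }
        }
        return out.toString();
    }
}
```

Here apply("ABCD", "A B") returns "AB": A stays A, B is dropped, C becomes B, and D lies beyond the end of the filter string.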

See Also:
SequenceInformationBlock, SequenceFilter.filterString(String, String), MAX_NAME_LENGTH

Field Summary
private  java.lang.String filter
          The filter string used in the readFiltered() method.
static int MAX_NAME_LENGTH
           The maximal length of the sequence identifier in a FASTA file.
 
Fields inherited from class de.unibi.techfak.jpredictor.sequences.SequenceReader
countCharsRead, mark, sequenceBlocks
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
FastaReader(java.io.Reader reader, java.lang.String filter)
          Sets filter and reader.
 
Method Summary
static java.lang.String createSequenceFromFastaFile(java.lang.String filename, java.lang.String filter)
          Reads a file completely and returns the sequence it contains as a String.
 java.lang.String getFilter()
          Returns the filter used to filter the read sequence before returning it.
protected  int readFiltered()
          Reads characters from the stream until the next valid character occurs which is returned.
 void setFilter(java.lang.String filter)
          Sets a new filter.
 
Methods inherited from class de.unibi.techfak.jpredictor.sequences.SequenceReader
close, mark, markSupported, read, read, readLine, readUnfiltered, reset, sequences, skip
 
Methods inherited from class java.io.Reader
read, read, ready
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_NAME_LENGTH

public static final int MAX_NAME_LENGTH

The maximal length of the sequence identifier in a FASTA file.

In a header line of a FASTA file, all characters after the leading '>' form the identifier. Such a line may be arbitrarily long. To limit memory consumption, the identifier may not be longer than 32000 characters.
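One way to honour such a limit without buffering the whole line is to consume the header line character by character and store only its first characters. This is a hypothetical sketch (HeaderNameSketch and readName are not part of jpredictor); whether FastaReader truncates names in exactly this way is an assumption.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch: read a header name with bounded memory.
 * The caller is assumed to have consumed the leading '>' already.
 */
final class HeaderNameSketch {
    static final int MAX_NAME_LENGTH = 32000;

    static String readName(Reader in) throws IOException {
        return readName(in, MAX_NAME_LENGTH);
    }

    static String readName(Reader in, int maxLength) throws IOException {
        StringBuilder name = new StringBuilder();
        int c;
        // consume the whole line, but store at most maxLength characters
        while ((c = in.read()) != -1 && c != '\n') {
            if (name.length() < maxLength) {
                name.append((char) c);
            }
        }
        return name.toString();
    }
}
```

Because the remainder of the line is still consumed, the next read starts at the first character after the header line.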

See Also:
SequenceInformationBlock, Constant Field Values

filter

private java.lang.String filter
The filter string used in the readFiltered() method.

See Also:
setFilter(String), SequenceReader.readFiltered()
Constructor Detail

FastaReader

public FastaReader(java.io.Reader reader,
                   java.lang.String filter)
            throws java.lang.NullPointerException
Sets filter and reader.

Parameters:
reader - The Reader to read from.
filter - Filter for every character read.
Throws:
java.lang.NullPointerException - If reader is null.
Method Detail

getFilter

public java.lang.String getFilter()
Returns the filter used to filter the read sequence before returning it.

Returns:
The filter as String.

setFilter

public void setFilter(java.lang.String filter)
Sets a new filter. An example of a valid filter is SequenceFilter.DNA_FILTER.

Parameters:
filter - The new filter to set.
See Also:
SequenceFilter

readFiltered

protected int readFiltered()
                    throws NewSequenceBlockException
Description copied from class: SequenceReader
Reads characters from the stream until the next valid character occurs, and returns it. This method also writes the sequence information blocks used to split a file into single sequences separated by a predefined delimiter.

Specified by:
readFiltered in class SequenceReader
Returns:
The next valid character, -1 in case of EOF
Throws:
NewSequenceBlockException - An IOException thrown when a new sequence occurs within the file.
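The contract above (return the next valid character, -1 at EOF, and signal a new sequence through an IOException subclass) can be sketched like this. ReadFilteredSketch and NewSequenceSignal are hypothetical stand-ins for FastaReader and NewSequenceBlockException; the sketch omits the filter string and, as an assumption based on the class description, raises the signal only for sequences after the first.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

/** Hypothetical stand-in for NewSequenceBlockException (an IOException). */
class NewSequenceSignal extends IOException {
    private static final long serialVersionUID = 1L;
    final String name;

    NewSequenceSignal(String name) {
        super("new sequence: " + name);
        this.name = name;
    }
}

final class ReadFilteredSketch {
    private final Reader in;
    private boolean anyValid = false; // has a valid character been returned yet?
    String currentName = "";          // name of the sequence being read

    ReadFilteredSketch(Reader in) {
        this.in = Objects.requireNonNull(in); // NullPointerException if in is null
    }

    /** Returns the next valid (uppercased) character, or -1 at EOF. */
    int readFiltered() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                continue;                        // end-of-line is ignored
            }
            if (c == '>') {
                currentName = readRestOfLine();  // header line is not part of the sequence
                if (anyValid) {
                    throw new NewSequenceSignal(currentName); // a later sequence starts
                }
                continue;                        // first sequence: no signal needed
            }
            anyValid = true;
            return Character.toUpperCase((char) c);
        }
        return -1;
    }

    private String readRestOfLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

Reading ">a\nAC\n>b\nG" with this sketch yields 'A' and 'C', then a NewSequenceSignal named "b", then 'G' and finally -1.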

createSequenceFromFastaFile

public static java.lang.String createSequenceFromFastaFile(java.lang.String filename,
                                                           java.lang.String filter)
Reads a file completely and returns the sequence it contains as a String. The file should be in FASTA format, that is, lines starting with '>' act as block delimiters and are ignored. Set filter to null to keep every character from the file (except the lines starting with '>').

Parameters:
filename - The name of the file to read.
filter - Used to filter the characters read.
Returns:
The sequence as String, null in case of any error.
See Also:
SequenceFilter.filterString(String, String)
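For the filter = null case, the method's behaviour can be approximated with the standard NIO file API. This is a hypothetical sketch (FastaFileSketch.readSequence is not the library method): it assumes UTF-8 input, drops text from a '>' to the end of its line, uppercases the rest as described for the class, and returns null on any I/O error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical sketch of the filter == null case: header text is dropped,
 * everything else is concatenated and uppercased. UTF-8 input is assumed.
 */
final class FastaFileSketch {

    static String readSequence(String filename) {
        StringBuilder seq = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {                   // readLine() strips the line terminator
                int gt = line.indexOf('>');
                String part = gt >= 0 ? line.substring(0, gt) : line; // drop header text
                seq.append(part.toUpperCase(Locale.ROOT));
            }
        } catch (IOException | InvalidPathException e) {
            return null;                                               // null in case of any error
        }
        return seq.toString();
    }
}
```

Returning null mirrors the documented contract; a caller that needs the cause of a failure would have to catch the exception itself instead.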