de.unibi.techfak.jpredictor.sequences
Class FastaReader
java.lang.Object
java.io.Reader
de.unibi.techfak.jpredictor.sequences.SequenceReader
de.unibi.techfak.jpredictor.sequences.FastaReader
- All Implemented Interfaces:
- java.io.Closeable, java.lang.Readable
public class FastaReader
- extends SequenceReader
Class for adding filter functionality to the SequenceReader.
The underlying reader is assumed to be in FASTA format, so all characters
read are subject to the following filter steps:
* end-of-line characters (NewLine w/o FormFeed) are ignored,
* if a '>' is found, the rest of the line is ignored (note that the ignored
line begins at the '>' itself, even if it appears in mid-line),
* all characters are converted to uppercase,
* the filter string is applied.
If no filter string is present, the fourth step is skipped.
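The first three filter steps can be sketched on a plain string as shown here. This is only an illustration, not the FastaReader implementation: the class name FastaStepsSketch and the method strip are hypothetical, and the sketch assumes that both '\n' and '\r' count as end-of-line characters, which the description does not spell out.

```java
class FastaStepsSketch {
    /** Drops header text (from '>' to end of line), drops line breaks, uppercases the rest. */
    static String strip(String fasta) {
        StringBuilder out = new StringBuilder();
        boolean inHeader = false;
        for (int i = 0; i < fasta.length(); i++) {
            char c = fasta.charAt(i);
            if (c == '\n' || c == '\r') {
                inHeader = false;          // a line break ends a header and is itself dropped
            } else if (inHeader) {
                continue;                  // still inside an ignored header
            } else if (c == '>') {
                inHeader = true;           // everything from '>' to the end of the line is ignored
            } else {
                out.append(Character.toUpperCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Characters before a mid-line '>' are kept; the rest of that line is ignored.
        System.out.println(strip(">seq1 demo\nacgt\nAC>mid-line\ngt\n")); // prints ACGTACGT
    }
}
```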
While the file is read, the information on each sequence is stored in its
own SequenceInformationBlock. When this class is constructed, one such
block is predefined, with all values set to default. This guarantees that
the method sequences() works properly. If a new sequence starts at the
beginning of the Reader and no valid characters appeared before it, no new
block is created; instead, the first block is filled with the proper
information. Note that this filling cannot be done until a first read is
performed. In contrast to the behavior of the first
SequenceInformationBlock, every newly found sequence after the first
always leads to the creation of a new SequenceInformationBlock.
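The block bookkeeping described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: Block is a hypothetical stand-in for SequenceInformationBlock that holds only a name and a length, and header lines are assumed to start with '>' at the beginning of a line.

```java
import java.util.ArrayList;
import java.util.List;

class BlockBookkeepingSketch {
    /** Hypothetical stand-in for SequenceInformationBlock. */
    static final class Block {
        String name = "";   // default until a header line supplies a name
        int length = 0;     // number of sequence characters seen for this block
    }

    static List<Block> scan(String fasta) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block());              // one block exists from the start
        boolean firstBlockUsed = false;       // true once a header or sequence data was seen
        for (String line : fasta.split("\n")) {
            if (line.startsWith(">")) {
                if (firstBlockUsed) {
                    blocks.add(new Block());  // every later header opens a new block
                }
                firstBlockUsed = true;
                blocks.get(blocks.size() - 1).name = line.substring(1);
            } else {
                if (!line.isEmpty()) {
                    firstBlockUsed = true;
                }
                blocks.get(blocks.size() - 1).length += line.length();
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A header at the very start fills the predefined block instead of adding one.
        System.out.println(scan(">a\nAC\n>b\nG").size());
        // Sequence data before the first header stays in the unnamed default block.
        System.out.println(scan("AC\n>b\nG").size());
    }
}
```

Both calls in main yield two blocks; the difference is that the first block carries the name "a" in the first case and keeps its default name in the second.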
Every SequenceInformationBlock is filled with information
found in the delimiter line. The name of the sequence begins after the
'>' and can be up to MAX_NAME_LENGTH characters long.
The filter string works as follows: the positions of the characters in the
string correspond to the uppercase letters. Thus filter.charAt(0)
holds the character that replaces every uppercase A,
filter.charAt(1) holds the replacement for every B, and so on.
A space in the filter string means that the corresponding letter is
discarded; e.g. the string "A B" lets through only A's and C's, and C's are
transformed to B's.
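The positional mapping can be illustrated with a short sketch. FilterStringSketch and apply are hypothetical names, the input is assumed to be uppercased already, and dropping letters that have no position in the filter string is an assumption chosen to agree with the "A B" example.

```java
class FilterStringSketch {
    /** Applies a positional filter: filter.charAt(0) replaces 'A', charAt(1) replaces 'B', and so on. */
    static String apply(String sequence, String filter) {
        if (filter == null) {
            return sequence;              // no filter: the fourth step is skipped
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sequence.length(); i++) {
            int index = sequence.charAt(i) - 'A';
            if (index < 0 || index >= filter.length()) {
                continue;                 // assumption: characters without a filter position are dropped
            }
            char replacement = filter.charAt(index);
            if (replacement != ' ') {     // a space means "discard this letter"
                out.append(replacement);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A is kept, B is discarded, C becomes B, D has no filter position.
        System.out.println(apply("ABCABD", "A B")); // prints ABA
    }
}
```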
- See Also:
SequenceInformationBlock,
SequenceFilter.filterString(String, String),
MAX_NAME_LENGTH
|
Field Summary |
private java.lang.String |
filter
The file filter used in the readFiltered()-method. |
static int |
MAX_NAME_LENGTH
The maximal length of the sequence identifier in a FASTA file. |
| Fields inherited from class java.io.Reader |
lock |
|
Constructor Summary |
FastaReader(java.io.Reader reader,
java.lang.String filter)
Sets filter and reader. |
|
Method Summary |
static java.lang.String |
createSequenceFromFastaFile(java.lang.String filename,
java.lang.String filter)
Reads a file completely and returns the sequence it contains as a String. |
java.lang.String |
getFilter()
Returns the filter used to filter the read sequence before
returning it. |
protected int |
readFiltered()
Reads characters from the stream until the next valid character occurs,
which is then returned. |
void |
setFilter(java.lang.String filter)
Sets a new filter. |
| Methods inherited from class java.io.Reader |
read, read, ready |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
MAX_NAME_LENGTH
public static final int MAX_NAME_LENGTH
The maximal length of the sequence identifier in a FASTA file.
In a line of a FASTA file, all characters after the leading '>' are
taken to be the identifier. Such a line may be arbitrarily long.
To limit memory consumption, the identifier may not be longer
than 32000 characters.
- See Also:
SequenceInformationBlock,
Constant Field Values
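One way such a length limit could be enforced is sketched here. The names NameLengthSketch and nameOf are hypothetical, and truncating an overlong identifier (rather than rejecting it) is an assumption; the description only states the limit itself.

```java
class NameLengthSketch {
    static final int MAX_NAME_LENGTH = 32000; // value taken from the field description

    /** Returns the identifier after the first '>' in a line, capped at MAX_NAME_LENGTH characters. */
    static String nameOf(String line) {
        int start = line.indexOf('>');
        if (start < 0) {
            return null;                  // not a delimiter line
        }
        String name = line.substring(start + 1);
        // Assumption: an overlong identifier is cut off instead of causing an error.
        return name.length() > MAX_NAME_LENGTH ? name.substring(0, MAX_NAME_LENGTH) : name;
    }

    public static void main(String[] args) {
        System.out.println(nameOf(">seq1 sample"));  // prints seq1 sample
    }
}
```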
filter
private java.lang.String filter
- The file filter used in the
readFiltered()-method.
- See Also:
setFilter(String),
SequenceReader.readFiltered()
FastaReader
public FastaReader(java.io.Reader reader,
java.lang.String filter)
throws java.lang.NullPointerException
- Sets filter and reader.
- Parameters:
reader - The Reader to read from.
filter - Filter for every character read.
- Throws:
java.lang.NullPointerException - If reader is null.
getFilter
public java.lang.String getFilter()
- Returns the filter used to filter the read sequence before
returning it.
- Returns:
- The filter as String.
setFilter
public void setFilter(java.lang.String filter)
- Sets a new filter. Valid filters are e.g.
SequenceFilter.DNA_FILTER.
- Parameters:
filter - The new filter to set.
- See Also:
SequenceFilter
readFiltered
protected int readFiltered()
throws NewSequenceBlockException
- Description copied from class:
SequenceReader
- Reads characters from the stream until the next valid character occurs,
which is then returned. This method also writes the sequence information
block, cutting a file into single sequences separated by a predefined
delimiter.
- Specified by:
readFiltered in class SequenceReader
- Returns:
- The next valid character, -1 in case of EOF
- Throws:
NewSequenceBlockException - An IOException
thrown when a new sequence occurs within the file.
createSequenceFromFastaFile
public static java.lang.String createSequenceFromFastaFile(java.lang.String filename,
java.lang.String filter)
- Reads a file completely and returns the sequence it contains as a String.
The sequence information should be in FASTA format, that is,
lines starting with '>' act as block delimiters and are ignored.
Set filter = null if you want every character from
the file (except the lines starting with '>').
- Parameters:
filename - The name of the file to read.
filter - Used to filter the read character.
- Returns:
- The sequence as String, null in case of any error.
- See Also:
SequenceFilter.filterString(String, String)
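The contract of this method (whole sequence as a String, null on any error) can be mimicked without the library, as in the following sketch for the filter = null case. It is not the actual implementation: ReadWholeFileSketch, readFrom, and read are hypothetical names, the UTF-8 charset is an assumption, and uppercasing follows the class description rather than anything stated for this method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

class ReadWholeFileSketch {
    /** Concatenates all non-header text from the source, uppercased; null on an I/O error. */
    static String readFrom(Reader source) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int header = line.indexOf('>');
                // Keep only the part of the line before a '>', if there is one.
                String kept = header < 0 ? line : line.substring(0, header);
                out.append(kept.toUpperCase(Locale.ROOT));
            }
            return out.toString();
        } catch (IOException e) {
            return null;
        }
    }

    /** Opens the named file and delegates to readFrom; null if the file cannot be read. */
    static String read(String filename) {
        try {
            return readFrom(Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8));
        } catch (IOException | RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFrom(new StringReader(">seq1\nacgt\nacgt\n"))); // prints ACGTACGT
    }
}
```

Returning null instead of throwing keeps the call site simple, but the caller has to check for null before using the result.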