de.unibi.techfak.jpredictor.sequences
Class MappedCharSequence

java.lang.Object
  extended by de.unibi.techfak.jpredictor.sequences.MappedCharSequence
All Implemented Interfaces:
java.lang.CharSequence
Direct Known Subclasses:
Sequence

public class MappedCharSequence
extends java.lang.Object
implements java.lang.CharSequence

Class for mapping a file or a stream to a character sequence. The file or stream is read through a Reader object. Note that the Reader has to be resettable in any case.
There are some things to keep in mind when working with this class. First, the return value of length() may not be valid: until the end of the underlying Reader has been reached, length() returns the maximum possible sequence length, Integer.MAX_VALUE. The actual length is determined only once EOF is reached on the underlying Reader or the file cannot be read any further. It is therefore strongly recommended to read the file until an IndexOutOfBoundsException is thrown (e.g. by calling charAt(Integer.MAX_VALUE)) and then obtain the correct length of the underlying reader.
It is also recommended to create a MappedCharSequence object through the constructors that take a filename. This guarantees that reset() works correctly, in the worst case by reopening the file. If you construct this class from a Reader object, make sure reset() and mark(int) are supported, e.g. by using the ResetableReader class.
Parts of the Reader are buffered. The internal buffer has a size of 64k by default. The size should be increased if large read jumps within the file have to be covered by the buffer. Note that the buffer has a fixed size. Also, its size may not match the setting given at construction. This is because a buffer is needed in any case, so its creation is forced by decreasing the requested size until no OutOfMemoryError is thrown anymore. The minimum buffer size is 1024 bytes.

The following describes how the internal buffer works. The buffer is designed to be cyclic. After a number of consecutive reads, the last character read is at the end of the buffer. When jumping forward (by calling charAt()), the Reader is read until the requested character is in the focus of the buffer (again at the end). When jumping backward while staying within the focus of the buffer, the character is returned and the buffer is unchanged. If the backward jump leaves the buffer's focus, the Reader is reset and read from the beginning to fill the buffer once more. The Reader has to be read from the beginning because a Reader can only be read sequentially, and the readAheadLimit of supported mark operations is often smaller than this buffer's size. For efficiency, you can pass a BufferedReader to this class.
An example may make the buffer's behaviour clear. Suppose the file has size 1000 and the internal buffer has size 100. After, say, 300 calls to read(), the current character is the one at the end of the buffer. The buffer's focus covers the characters from position 200 to 299; the first 200 characters read had to be discarded due to the buffer's limited size. If you now jump forward by calling charAt(349), the next 50 characters are read into the buffer and the 350th character is returned; the buffer's focus now covers positions 250 to 349. If you jump backward by calling charAt(300), you do not leave the focus of the buffer, so the character is returned and the buffer remains unchanged. If you jump backward and leave the buffer's focus, say by calling charAt(200), the file is read from the beginning until the requested character is in the buffer's focus.
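The walkthrough above can be reproduced with a small stand-alone sketch. It is not the implementation of MappedCharSequence: the class SlidingWindowSketch, its Supplier-based reopening, and the helpers windowStart(), windowEnd() and rewinds are hypothetical names chosen for illustration, and the sketch always starts over by reopening the input rather than trying reset() first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical stand-in illustrating the cyclic buffer; not the library class. */
class SlidingWindowSketch {
    private final Supplier<Reader> source; // assumption: reopens the input from the start
    private final char[] window;           // fixed-size cyclic buffer
    private Reader reader;
    private long readCount;                // characters consumed from the current reader
    int rewinds;                           // how often the input was re-read from the start

    SlidingWindowSketch(Supplier<Reader> source, int windowSize) {
        this.source = source;
        this.window = new char[windowSize];
        this.reader = source.get();
    }

    /** First position still covered by the buffer. */
    long windowStart() { return Math.max(0, readCount - window.length); }

    /** Last position covered by the buffer, or -1 if nothing has been read. */
    long windowEnd() { return readCount - 1; }

    char charAt(long position) {
        if (position < 0) throw new IndexOutOfBoundsException("negative position");
        try {
            if (position < windowStart()) {
                // Backward jump that leaves the focus: read again from the beginning.
                reader.close();
                reader = source.get();
                readCount = 0;
                rewinds++;
            }
            // Forward jump (or refill): read until the position is in focus.
            while (readCount <= position) {
                int c = reader.read();
                if (c < 0) throw new IndexOutOfBoundsException("EOF at " + readCount);
                window[(int) (readCount % window.length)] = (char) c;
                readCount++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Positions inside the focus map to unique slots of the cyclic buffer.
        return window[(int) (position % window.length)];
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 1000; i++) data.append((char) ('a' + i % 26));
        String text = data.toString();
        SlidingWindowSketch seq = new SlidingWindowSketch(() -> new StringReader(text), 100);

        seq.charAt(299);
        System.out.println(seq.windowStart() + ".." + seq.windowEnd()); // 200..299
        seq.charAt(349);
        System.out.println(seq.windowStart() + ".." + seq.windowEnd()); // 250..349
        seq.charAt(300);
        System.out.println(seq.windowStart() + ".." + seq.windowEnd()); // 250..349
        seq.charAt(200);
        System.out.println(seq.windowStart() + ".." + seq.windowEnd()); // 101..200
        System.out.println("rewinds: " + seq.rewinds);                  // rewinds: 1
    }
}
```

Only the last call leaves the focus, so the input is re-read from the beginning once. The focus after that call ends at the requested position, as in the forward case.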

See Also:
ResetableReader

Field Summary
private  char[] buffer
          The internal buffer
private  int cyclicEnd
          The end position of the buffer.
private  int cyclicStart
          The start position of the buffer.
private  java.lang.String filename
          The filename used to reopen the file.
private  int filled
          The number of characters in the buffer
protected  long length
          The sequence's length.
private  long overAllBase
          The base position (number of characters in the previous buffers).
private  java.io.Reader reader
          The wrapper class of the open file
 
Constructor Summary
MappedCharSequence(java.io.Reader reader)
          Sets the Reader.
MappedCharSequence(java.io.Reader reader, int bufferSize)
          Sets the Reader and the size of the internal buffer.
MappedCharSequence(java.lang.String filename)
          Sets the filename and opens the file.
MappedCharSequence(java.lang.String filename, int bufferSize)
          Sets the filename and opens the file.
 
Method Summary
 char charAt(int position)
          Returns a character at a given position within the file's valid characters.
private  void clearBuffer()
          Initializes all internal variables regarding the buffer.
 int getBufferSize()
           
 java.io.Reader getReader()
          Returns the underlying Reader-Object.
 int length()
          Returns the length of the sequence or Integer.MAX_VALUE.
private  void reset()
          If reset on the underlying Reader fails and this object was constructed via a filename, an attempt is made to reopen the file.
private  int setBufferSize(int bufferSize)
          Initializes the internal buffer.
 java.lang.CharSequence subSequence(int startInReader, int endInReader)
          No functionality.
 java.lang.String toString()
          Returns a string over the internal buffer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

filename

private java.lang.String filename
The filename used to reopen the file.


reader

private java.io.Reader reader
The wrapper class of the open file


buffer

private char[] buffer
The internal buffer


cyclicEnd

private int cyclicEnd
The end position of the buffer. If the buffer is completely filled, cyclicEnd is equal to cyclicStart and filled is buffer.length


cyclicStart

private int cyclicStart
The start position of the buffer. If the buffer is completely filled, cyclicStart is equal to cyclicEnd and filled is buffer.length


filled

private int filled
The number of characters in the buffer


overAllBase

private long overAllBase
The base position (number of characters in the previous buffers).


length

protected long length
The sequence's length. Determined after the first IndexOutOfBoundsException thrown by a call to charAt.

Constructor Detail

MappedCharSequence

public MappedCharSequence(java.io.Reader reader)
                   throws java.lang.NullPointerException
Sets the Reader. The buffer's size is set to the default (64k).

Parameters:
reader - The Reader to get the data from.
Throws:
java.lang.NullPointerException - If reader is null

MappedCharSequence

public MappedCharSequence(java.lang.String filename)
                   throws java.io.FileNotFoundException,
                          java.lang.NullPointerException
Sets the filename and opens the file. The buffer's size is set to the default (64k).

Parameters:
filename - The name of the file to read from.
Throws:
java.io.FileNotFoundException - If the file could not be found.
java.lang.NullPointerException - If filename is null

MappedCharSequence

public MappedCharSequence(java.lang.String filename,
                          int bufferSize)
                   throws java.io.FileNotFoundException,
                          java.lang.NullPointerException
Sets the filename and opens the file. The buffer's size is set to bufferSize.

Parameters:
filename - The name of the file to read from.
bufferSize - The initial size of the buffer.
Throws:
java.io.FileNotFoundException - If the file could not be found.
java.lang.NullPointerException - If filename is null

MappedCharSequence

public MappedCharSequence(java.io.Reader reader,
                          int bufferSize)
                   throws java.lang.NullPointerException
Sets the Reader and the size of the internal buffer.

Parameters:
reader - The reader to get the data from.
bufferSize - The initial size of the buffer.
Throws:
java.lang.NullPointerException - If reader is null
See Also:
setBufferSize(int)
Method Detail

setBufferSize

private int setBufferSize(int bufferSize)
Initializes the internal buffer.
If the buffer has not been created yet, the method tries to allocate memory for it. In case of an OutOfMemoryError, bufferSize is divided by 2 and the allocation is tried again. The minimum permitted buffer size is 1024 bytes.
If the buffer was already created, it is left untouched and its size is returned.

Parameters:
bufferSize - The size of the buffer to create. Ignored, if the buffer was already created.
Returns:
The buffer size, -1 in case of not enough memory
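The halving strategy described above might look roughly like the following sketch. It is an illustration under assumptions, not the private method itself: the class BufferAllocationSketch, the injectable allocator parameter (added so that allocation failures can be simulated), and raising requests below the minimum up to 1024 are all hypothetical. The sketch returns null where the documented method returns -1.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch of "halve the size until allocation succeeds". */
class BufferAllocationSketch {
    static final int MIN_SIZE = 1024; // minimum size stated in the documentation

    /**
     * Tries to allocate a buffer of the requested size, halving the size after
     * every OutOfMemoryError. Returns null if even MIN_SIZE cannot be allocated.
     */
    static char[] allocate(int requested, IntFunction<char[]> allocator) {
        int size = Math.max(requested, MIN_SIZE); // assumption: never go below the minimum
        while (true) {
            try {
                return allocator.apply(size);
            } catch (OutOfMemoryError e) {
                if (size == MIN_SIZE) {
                    return null; // not enough memory even for the minimum
                }
                size = Math.max(size / 2, MIN_SIZE);
            }
        }
    }

    public static void main(String[] args) {
        // Normal case: the default size of 64k is allocated directly.
        char[] buffer = allocate(64 * 1024, char[]::new);
        System.out.println(buffer.length); // 65536
    }
}
```

Because the resulting size can be smaller than requested, callers should query the actual size (getBufferSize() in the documented class) rather than rely on the value they passed in.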

getBufferSize

public int getBufferSize()
Returns:
The size of the internal buffer or -1, if the buffer does not exist.

clearBuffer

private void clearBuffer()
Initializes all internal variables regarding the buffer.


getReader

public java.io.Reader getReader()
Returns the underlying Reader-Object.

Returns:
The underlying reader this object was constructed with.

charAt

public char charAt(int position)
            throws java.lang.IndexOutOfBoundsException
Returns a character at a given position within the file's valid characters. The underlying Reader-Object is only read through calls to read().

Specified by:
charAt in interface java.lang.CharSequence
Parameters:
position - The character's position in the file.
Returns:
The char at that position.
Throws:
java.lang.IndexOutOfBoundsException - Can be thrown for various reasons: 1. if position is greater than or equal to the length of the underlying Reader, 2. if a reset was necessary but not supported, or 3. if no more chars can be read from the Reader. Note that the length is determined only in the last case.

length

public int length()
Returns the length of the sequence or Integer.MAX_VALUE. The correct length of the underlying reader is determined only once the reader has been fully read. Note that a construct like
 for (i=0; i
 is bound to fail unless the intention was to print the first
 character in the stream about Integer.MAX_VALUE times.

Specified by:
length in interface java.lang.CharSequence
Returns:
The correct length only if the underlying Reader was fully read, otherwise Integer.MAX_VALUE.
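The recommended way to obtain a valid length, reading until an IndexOutOfBoundsException is thrown, can be sketched as follows. LazyLengthSketch is a hypothetical stand-in that only mimics the documented length() contract (it keeps every character in memory, which the real class does not), and resolveLength is an illustrative helper name, not part of the library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in mimicking the documented length() behaviour. */
class LazyLengthSketch implements CharSequence {
    private final Reader reader;
    private final StringBuilder seen = new StringBuilder(); // simplification: no bounded buffer
    private int length = Integer.MAX_VALUE;                 // unknown until EOF is reached

    LazyLengthSketch(Reader reader) { this.reader = reader; }

    @Override public int length() { return length; }

    @Override public char charAt(int position) {
        if (position < 0) throw new IndexOutOfBoundsException("negative position");
        try {
            while (seen.length() <= position) {
                int c = reader.read();
                if (c < 0) {
                    length = seen.length(); // EOF: the real length is now known
                    throw new IndexOutOfBoundsException("position " + position + " is past EOF");
                }
                seen.append((char) c);
            }
        } catch (IOException e) {
            throw new IndexOutOfBoundsException("read failed: " + e.getMessage());
        }
        return seen.charAt(position);
    }

    @Override public CharSequence subSequence(int start, int end) { return this; }

    @Override public String toString() { return seen.toString(); }

    /** Forces the length to be determined by probing the largest possible index. */
    static int resolveLength(CharSequence seq) {
        try {
            seq.charAt(Integer.MAX_VALUE);
        } catch (IndexOutOfBoundsException expected) {
            // EOF was reached; length() now reports the real value.
        }
        return seq.length();
    }

    public static void main(String[] args) {
        LazyLengthSketch seq = new LazyLengthSketch(new StringReader("ACGT"));
        System.out.println(seq.length() == Integer.MAX_VALUE); // true
        System.out.println(resolveLength(seq));                // 4
    }
}
```

A loop should therefore either resolve the length first, or iterate with charAt and stop on IndexOutOfBoundsException instead of comparing against length().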

subSequence

public java.lang.CharSequence subSequence(int startInReader,
                                          int endInReader)
No functionality.

Specified by:
subSequence in interface java.lang.CharSequence
Parameters:
startInReader -
endInReader -
Returns:
Returns this.

reset

private void reset()
            throws ResetNotSupportedException
If reset on the underlying Reader fails and this object was constructed via a filename, an attempt is made to reopen the file.

Throws:
ResetNotSupportedException - If this class was not constructed via a filename and the underlying reader does not support reset()
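The fallback described here, first try reset() and reopen the file only if that fails, might be sketched like this. The names ResetFallbackSketch, rewind and the nested ResetNotSupported exception are hypothetical; in particular, modelling the exception as unchecked is an assumption, since the documentation does not say how ResetNotSupportedException is declared.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch of "reset, or reopen by filename as a last resort". */
class ResetFallbackSketch {

    /** Stand-in for the library's ResetNotSupportedException (assumed unchecked here). */
    static class ResetNotSupported extends RuntimeException {
        ResetNotSupported(String message, Throwable cause) { super(message, cause); }
    }

    /**
     * Returns a reader positioned at the start of the input: the same reader
     * if reset() works, otherwise a freshly opened one if a filename is known.
     */
    static Reader rewind(Reader reader, String filename) {
        try {
            reader.reset();
            return reader;
        } catch (IOException resetFailed) {
            if (filename == null) {
                throw new ResetNotSupported("reset() failed and there is no filename to reopen", resetFailed);
            }
            try {
                reader.close();
                return new FileReader(filename); // worst case: reopen the file
            } catch (IOException reopenFailed) {
                throw new ResetNotSupported("could not reopen " + filename, reopenFailed);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A StringReader that was never marked resets to its beginning.
        Reader in = new StringReader("ACGT");
        in.read();
        in.read();
        Reader rewound = rewind(in, null);
        System.out.println((char) rewound.read()); // A
    }
}
```

This is also why the class description recommends the filename constructors: a plain FileReader does not support reset(), so without a filename the only remaining outcome is the exception.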

toString

public java.lang.String toString()
Returns a string over the internal buffer. There are two possible ways in which the returned string is obtained:
- Only the characters currently held in the filled buffer are returned. Characters already discarded due to the limited buffer size are not part of that string.
- If the buffer contains free space, the underlying Reader is read until the buffer is filled or no more characters are available. Then the buffer's content is returned.

Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object
Returns:
A string containing the chars in the buffer.