/*! \mainpage libfid documentation

\section preamble Preamble

This is the API documentation for the <b>F</b>ull-text <b>I</b>ndex
<b>D</b>ata structure library (\em libfid).

The library \em libfid is published under the terms of the GNU Lesser
General Public License Version 2.1 (or any later version), GNU LGPL
for short. See file \c COPYING, shipped as part of the library sources,
for the exact terms of licensing. Also included are a few programs,
all published under the terms of the GNU General
Public License Version 2 (or any later version), GNU GPL for
short. See file \c COPYING, shipped as part of the library sources in
directory \c tools, for the exact terms of licensing of these programs.
See http://www.gnu.org/copyleft/ to learn more about these licenses.


\section intro Introduction

Searching for patterns in textual data is a task frequently assigned to
computers, and many algorithms have been developed for solving this task for
all kinds of patterns. There are many classic pattern matching algorithms for
searching efficiently for exact occurrences of string patterns like simple
strings or regular expressions in a text, or for approximate occurrences
thereof under some model of similarity. String patterns, as well as other
pattern kinds, are also of interest in the field of computational biology.
This field, however, is facing the problem of an exponential increase of
amount of sequence data that needs to be searched. Thus, the time spent for
searches is also increasing dramatically since the running times of the
search algorithms employed usually depend at least linearly on the size of
the search space. Sometimes big cluster systems are the only practical way to
tackle this problem.

With the decreasing costs of computer memory, be it RAM or harddisk,
and the broad availability of 64 bit CPUs, a feasible alternative, or
accompanying, solution for searching in large data sets is the use of
full-text index data structures such as the enhanced suffix array.
Search algorithms operating on enhanced suffix arrays can often achieve
sublinear running times with respect to the database size, at the cost of
preprocessing the sequence data and storing an index for it on harddisk.
For an introduction to suffix arrays in general take a look at [1],
enhanced suffix arrays are described in detail in [2].

This software library provides data structures for representing enhanced
suffix arrays, and implements many operations frequently performed on these.
Sequence data is generally transformed into binary representation using freely
definable alphabets. The library expects enhanced suffix arrays being stored
in a format as generated by \c mkvtree from the \em Vmatch package written
by Stefan Kurtz. The \c slowbuildesa program that comes with \em libfid can
also be used to construct enhanced suffix arrays, but be warned that
\c slowbuildesa implements a naive suffix sorter based on quicksort
and simple string comparisons, and thus is much slower than \c mkvtree.
(The prime use of \c slowbuildesa is for running tests via
<code>make check</code>, which is also the reason why it is doesn't get
installed by <code>make install</code>.)

Please consider using our advanced tool \c mkesa [3] for enhanced suffix array
construction, a vastly improved version of \c slowbuildesa based on a
multithreaded Deep-Shallow [4] implementation. It is typically faster
than \c mkvtree and also more space conserving. \c mkesa is available in
source code under the terms of the GNU General Public License Version 2
(or any later version), and distributed as a separate package on
<a href="http://bibiserv.techfak.uni-bielefeld.de/mkesa/">
http://bibiserv.techfak.uni-bielefeld.de/mkesa/</a>.


\section install Installation

Simple installation:

<code>./configure [options] && make && make install</code>

Optionally, run <code>make check</code> before installation to run the test
suite. All tests should pass with no error, except for the first one which
will be skipped unless you have downloaded the extra test suite data
(<code>libfid-testdata.tar.gz</code>), and extracted it below the source path
of \em libfid.

By default, <code>make install</code> will install the library below
<code>/usr/local/</code>. This can be changed by passing the appropriate
options to <code>configure</code>. Use <code>./configure --help</code> to
learn about the configuration options.

<em>Hint:</em> create a <code>config.site</code> file and let the environment
variable \c CONFIG_SITE point to it. You can put your default compiler flags
(optimization, paths, etc.) in there and avoid retyping them over again each
time you need to invoke \c configure. All \c configure scripts generated by
GNU Autoconf 2.x honor the \c CONFIG_SITE variable, so its use is not limited
to \em libfid.


\section usage How to use libfid in your programs

Incorporating the functionalities implemented in \em libfid in other
programs is easy as described next.

- Include \c libfid.h in your program when programming in plain C, or
\c libfidxx.h for C++ (support for C++ is, however, somewhat experimental).
- When building a debug version of your program, define preprocessor symbol
\c DEBUG for compilation and link against the debug version of the library.
(A debug version of the library can be built by configuring with
<code>--enable-debug</code>.)
- When building a non-debug version of your program, make sure that \c DEBUG
is not defined and link against the non-debug version of the library.

Optionally: 
- If you only need the 32 (or 64) bit interfaces in your C program, you may
want to include \c libfid.32 (or \c libfid.64) to get all interfaces with the
\c _32 (or \c _64) suffix removed. Please note that using the 32 bit
interface simply translates to "working with enhanced suffix arrays
represented by 32 bit integers", and has \e nothing to do with compiling a
32 or 64 bit executable.
\see wordsize
- For C++ you may want to use \c fid_traits<32> and \c fid_traits<64> from
\c libfidxx.h to access 32 (or 64) bit interfaces. Do not include 
\c libfid.32 or \c libfid.64 in this case. Note that support for C++ is
experimental, so please don't hesitate contacting the author if you think
there is something wrong or missing.
- You can turn on memory profiling by passing \c --enable-memprof to the
\c configure script. Note that this implies that your programs must also be
compiled to use the profiling facility then.
\see misc
- If you happen to use GNU Autoconf, then you may find the
\c AM_PATH_LIBFID and \c FID_MEMPROFILING macros handy (see file libfid.m4).

For a minimal source code example, see the code of program exactsearch.c
from the test suite (though its command line handling is ugly).

Note that it is important to link against the correct library version since
some data structures are augmented by extra data fields in debug mode.
Hence, linking a program compiled without defining symbol \c DEBUG against
the debug version of \em libfid will produce a program that is likely to
crash, or to behave somewhat "funny" otherwise.


\section runtime Runtime behavior

Environment variable \c FID_OPTIONS (see #FID_OPTIONS_VARNAME and
#fid_options_parse()) can be defined to take some influence on how the library
behaves in certain situations.


\section references References

<b>[1]</b> U. Manber and E.W. Myers. Suffix Arrays: A New Method for On-Line
String Searches. <em>SIAM Journal on Computing</em>, 22(5):935-948, 1993.

<b>[2]</b> M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing Suffix Trees
with Enhanced Suffix Arrays. <em>Journal of Discrete Algorithms</em>, 2:53-86,
2004.

<b>[3]</b> R. Homann, D. Fleer, R. Giegerich, M. Rehmsmeier. mkESA: enhanced
suffix array construction tool. <em>Bioinformatics</em>, 25(8):1084-1085, 2009.

<b>[4]</b> G. Manzini, P. Ferragina. Engineering a Lightweight Suffix Array
Construction Algorithm. <em>Algorithmica</em>, 40(1):33-50, 2004.
*/
