Integer types

In general, use standard C integer types wherever appropriate. In particular,
make use of size_t and do not hesitate to use it for array indices.

The return and/or argument types of many standard functions, e.g., strlen() or
malloc(), are size_t. Assigning a size_t to any type other than size_t invites
serious problems once values become large. We are doing suffix arrays here, so
values WILL become large.
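As a minimal sketch of this rule, the following hypothetical helper (the name count_char is made up for illustration) keeps the result of strlen() in a size_t and uses a size_t loop index, so no narrowing takes place no matter how long the string is.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of c in the NUL-terminated
 * string s. Length, index, and result are all size_t, matching the type
 * that strlen() returns. */
static size_t count_char(const char *s, char c)
{
  size_t len = strlen(s);   /* strlen() returns size_t: keep it that way */
  size_t count = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    if (s[i] == c) {
      count++;
    }
  }
  return count;
}
```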

Suffix arrays can be stored using either 32 bit or 64 bit unsigned integers.
Both sizes are supported on all platforms, regardless of their native word
sizes. Data values in any enhanced suffix array table must be of one of the
following types:

  (1) unsigned char,
  (2) fid_Symbol,
  (3) fid_Uint32,
  (4) fid_Uint64.

No other types are allowed. When operating on enhanced suffix arrays, lengths
and indices must be of type fid_Uint32 or fid_Uint64, depending on the word
size used for the suffixes. Repeat: no other types are allowed.
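The following sketch illustrates the rule for a 32 bit suffix array: table entries, the table length, and every index are fid_Uint32. It computes the inverse suffix array as an example of a simple table operation. Since libfidinttypes.h is not shown here, fid_Uint32 is replaced by a local stand-in typedef based on uint32_t; that typedef and the function name esa32_inverse are assumptions for illustration. A 64 bit variant would look the same with fid_Uint64 throughout.

```c
#include <stdint.h>

/* Stand-in for libfid's fid_Uint32 (assumed to be a 32 bit unsigned type);
 * real code would include libfid.h instead. */
typedef uint32_t fid_Uint32;

/* Hypothetical helper: given suftab[0..n-1], a permutation of 0..n-1,
 * fill inv so that inv[suftab[i]] == i. Length and indices are fid_Uint32,
 * the same type as the suffix values themselves. */
static void esa32_inverse(const fid_Uint32 *suftab, fid_Uint32 *inv,
                          fid_Uint32 n)
{
  fid_Uint32 i;

  for (i = 0; i < n; i++) {
    inv[suftab[i]] = i;
  }
}
```

For the text "banana$", the suffix array is {6, 5, 3, 1, 0, 4, 2}, and esa32_inverse() yields {4, 3, 6, 2, 5, 1, 0}.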

On a lower level, i.e., when dealing with the details of memory mapped files
and concrete files stored on disk, the size_t type is used throughout. The
off_t type is not used, because mmap() operates with sizes of type size_t and
mmap() is used for reading and writing all files. On 32 bit platforms, size_t
is usually 32 bits wide while off_t may be 64 bits wide, so very large files
could only be mapped in "small" chunks using sliding windows, which are always
a major pain. Therefore, the maximum file size is strictly limited to the
maximum value that a size_t can represent. This model will not be changed in
the foreseeable future. Period.
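A minimal POSIX sketch of this model: the file size reported by fstat() (an off_t) is checked against SIZE_MAX before the whole file is mapped in one piece, and files that do not fit are rejected rather than handled with sliding windows. The helper names file_size_fits and map_whole_file are hypothetical and not part of libfid.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a file of size sz can be mapped in one piece, i.e., if sz is
 * non-negative and does not exceed the largest value a size_t can hold. */
static int file_size_fits(off_t sz)
{
  return sz >= 0 && (uintmax_t)sz <= (uintmax_t)SIZE_MAX;
}

/* Hypothetical helper: maps the whole file read-only and stores its length
 * in *len. Returns NULL on error, including files too large for size_t and
 * empty files (mmap() rejects a length of zero). */
static void *map_whole_file(const char *path, size_t *len)
{
  struct stat st;
  void *p;
  int fd = open(path, O_RDONLY);

  if (fd == -1) {
    return NULL;
  }
  if (fstat(fd, &st) == -1 || !file_size_fits(st.st_size) ||
      st.st_size == 0) {
    (void)close(fd);
    return NULL;
  }
  *len = (size_t)st.st_size;
  p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
  (void)close(fd);   /* the mapping stays valid after closing fd */
  return p == MAP_FAILED ? NULL : p;
}
```

The caller releases the mapping with munmap(p, len). On a 32 bit platform with a 64 bit off_t, file_size_fits() is what enforces the size limit described above; on typical 64 bit platforms it accepts every non-negative size.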

When a transition from size_t (or other large types) to fid_Uint32 has to be
performed (e.g., in file handling or for memory chunks), perform range checks
and error out if the value is too large to be represented in a 32 bit
integer. Macros are good for this and play well in template files.
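A sketch of such a range-checking macro follows. The name SIZE_TO_U32_OR_GOTO, the goto-based error path, and the stand-in typedef for fid_Uint32 are assumptions for illustration, not libfid's actual macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for libfid's fid_Uint32 (assumed 32 bit unsigned). */
typedef uint32_t fid_Uint32;

/* Hypothetical range-checking macro: assigns src (a size_t) to dst (a
 * fid_Uint32) if it fits, otherwise jumps to errlabel. */
#define SIZE_TO_U32_OR_GOTO(dst, src, errlabel)              \
  do {                                                        \
    if ((size_t)(src) > (size_t)UINT32_MAX) {                 \
      goto errlabel;                                          \
    }                                                         \
    (dst) = (fid_Uint32)(src);                                \
  } while (0)

/* Example user: returns 0 on success, -1 if len needs more than 32 bits. */
static int store_length(size_t len, fid_Uint32 *out)
{
  fid_Uint32 tmp;

  SIZE_TO_U32_OR_GOTO(tmp, len, too_large);
  *out = tmp;
  return 0;

too_large:
  fprintf(stderr, "length %lu does not fit into 32 bits\n",
          (unsigned long)len);
  return -1;
}
```

Jumping to a label keeps the macro usable in functions that must release resources before returning; a macro that returns directly would be shorter but would skip such cleanup. Where size_t is itself 32 bits wide, the check can never fail and compilers may warn about an always-false comparison.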


Array indices

As mentioned above, use size_t where appropriate. Using int might be OK if the
array can be guaranteed to be small (in particular, if it has fewer than 2^31
entries). Never use int for suffix arrays, sequence data, and the like, since
these may well grow beyond 2 GB.
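One pitfall when switching from int to size_t is counting down, because a size_t can never become negative and a condition like "i >= 0" is always true. A minimal sketch of a safe descending loop follows; the function name last_index_of and the convention of returning n for "not found" are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the last occurrence of c in
 * s[0..n-1], or n if c does not occur. The loop tests i > 0 and reads
 * s[i - 1], so the unsigned index never wraps around below zero. */
static size_t last_index_of(const char *s, size_t n, char c)
{
  size_t i;

  for (i = n; i > 0; i--) {
    if (s[i - 1] == c) {
      return i - 1;
    }
  }
  return n;
}
```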

If a printable character, i.e., a char, must be used as an array index (e.g.,
for alphabets), use the macro fid_CHAR_AS_INDEX() to cast the char value safely.
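The definition of fid_CHAR_AS_INDEX() is not reproduced here. The sketch below uses a hypothetical stand-in, CHAR_AS_INDEX(), to show the usual reason such a macro exists: plain char may be signed, so a character above 127 would otherwise turn into a negative index. Converting through unsigned char first yields a value in 0..UCHAR_MAX. The real macro may be defined differently.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for fid_CHAR_AS_INDEX(): go through unsigned char
 * so that negative char values map to 128..255 instead of negative indices. */
#define CHAR_AS_INDEX(c) ((size_t)(unsigned char)(c))

/* Counts every character of the NUL-terminated string s into
 * counts[0..UCHAR_MAX], which the caller must zero first. */
static void char_histogram(const char *s, size_t counts[UCHAR_MAX + 1])
{
  for (; *s != '\0'; s++) {
    counts[CHAR_AS_INDEX(*s)]++;
  }
}
```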

For fid_Symbol values, it is a good idea to cast them to size_t when using them
as an index. Splint reports warnings otherwise, and in many instances for good
reasons. As an alternative, consider using fid_Uint16 or just plain int,
especially when coding loops.
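A sketch combining both suggestions, assuming for illustration that fid_Symbol is an unsigned 8 bit type (its actual definition in libfidinttypes.h is not shown here) and using a hypothetical four-letter alphabet: symbols are cast to size_t when indexing, the potentially huge sequence is traversed with a size_t index, and the loops over the tiny alphabet use plain int.

```c
#include <stddef.h>

/* Stand-in for fid_Symbol; the real type may be different. */
typedef unsigned char fid_Symbol;

#define ALPHABET_SIZE 4   /* hypothetical alphabet, e.g., DNA */

/* Counts symbol frequencies in seq[0..n-1] and returns the number of
 * distinct symbols that occur. Every seq[i] must be < ALPHABET_SIZE. */
static int symbol_counts(const fid_Symbol *seq, size_t n,
                         size_t counts[ALPHABET_SIZE])
{
  size_t i;       /* sequences may be huge: size_t */
  int a;          /* the alphabet is tiny: plain int is fine */
  int distinct = 0;

  for (a = 0; a < ALPHABET_SIZE; a++) {
    counts[a] = 0;
  }
  for (i = 0; i < n; i++) {
    counts[(size_t)seq[i]]++;   /* explicit cast to size_t, as recommended */
  }
  for (a = 0; a < ALPHABET_SIZE; a++) {
    if (counts[a] > 0) {
      distinct++;
    }
  }
  return distinct;
}
```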


Format strings

For the fixed-width integer types defined in libfid.h (i.e., libfidinttypes.h),
use the appropriate format string when printing them. Some types must be cast
to a printable type and printed using the corresponding format string. See the
table below.

type          | format string | notes
--------------+---------------+------------------------------------
fid_Uint16    | %hu           |
fid_Sint16    | %hd           |
fid_Uint32    | fid_U32FMT    |
fid_Sint32    | fid_S32FMT    |
fid_Uint64    | fid_U64FMT    |
fid_Sint64    | fid_S64FMT    |
fid_Symbol    | fid_SYMFMT    |
size_t        | %lu           | cast to unsigned long
__LINE__      | %d            |
unsigned char | %hhu          | to print the numeric value
signed char   | %hhd          | to print the numeric value

Use standard format strings for printing all other C types. Do not cast unless
a cast is needed to avoid problems (as with size_t).
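The snippet below sketches the table in use. Since fid_U32FMT and fid_U64FMT are not reproduced here, it defines hypothetical stand-ins U32FMT and U64FMT from the C99 <inttypes.h> macros PRIu32 and PRIu64, plus stand-in typedefs for fid_Uint32 and fid_Uint64. Whether the real libfid macros include the leading % is not stated above, so check libfidinttypes.h before copying the pattern. The size_t value is cast to unsigned long and printed with %lu, as the table prescribes.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for fid_Uint32/fid_Uint64 and their format macros. */
typedef uint32_t fid_Uint32;
typedef uint64_t fid_Uint64;
#define U32FMT "%" PRIu32
#define U64FMT "%" PRIu64

/* Writes a one-line summary into buf and returns snprintf()'s result,
 * i.e., the length of the full (untruncated) output. */
static int format_summary(char *buf, size_t bufsize, fid_Uint32 n,
                          size_t bytes, fid_Uint64 total)
{
  return snprintf(buf, bufsize, "n=" U32FMT " bytes=%lu total=" U64FMT,
                  n, (unsigned long)bytes, total);
}
```

For example, format_summary(buf, sizeof buf, 7, 42, 9000000000ULL) writes "n=7 bytes=42 total=9000000000" into a sufficiently large buf.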
