# Usage

Run "mmfind" or "mmfind -h" to display a description of the program and
the options that modify its behaviour:

  mmfind [options] <multiple_fasta_file>

  Evaluates an alignment in multiple FASTA format for mismatches.
  Quality scores are considered if a file with scores is supplied
  (the file should have the same base name but the extension "qual";
  example: test.fa and test.qual). Scores should be in FASTA format
  and will be mapped onto the aligned sequences. mmfind is a
  command-line tool written in Python. It has been tested with
  Python 2.6, 2.7 and 3.1.

     FILTERING:

     -a <integer>
            (aligned:) minimal number of aligned sequences (default:2).
            The alignment will not be processed if it contains
            fewer sequences (error code: -602).

     -L <integer>
            (length:) minimal alignment length (default:200, error code: -605).

     -p <integer>
            (polymorphism cutoff:) ignore alignments whose percentage of mismatches
            per alignment length exceeds the given value (default:3, error code: -656).

     -l <integer>
            (length:) maximal length of mismatches to be reported (default:3).

     -b <integer>
            (border distance:) minimal distance of a mismatch from the alignment
            ends for it to be reported (default:80).

     -s <integer>
            (score:) minimal average score of the bases of a mismatch (default:20).

     -n <integer>
            (neighborhood scores:) minimal average score of the 10 neighboring bases
            of a mismatch (5 upstream, 5 downstream) (default:15).

     -A
            (all mismatches:) disable filtering and report all mismatches (default:
            apply the filters described above).

     OUTPUT OPTIONS:

     -o <basename_of_outfiles>
            (outfile:) files to which the reports should be appended (default:
            <basename_of_infile>.alignments.csv and <basename_of_infile>.mismatches.csv).

     -d
            (description:) write a descriptive headline to the report files (default:
            no headline).
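
The naming convention for the score file (test.fa and test.qual) can be illustrated with a short Python sketch. The helper names `qual_path_for` and `has_scores` are hypothetical and not part of mmfind. The sketch assumes that only the extension is swapped for "qual" and that the score file sits next to the FASTA file.

```python
import os


def qual_path_for(fasta_path):
    """Return the expected path of the companion score file.

    Assumption: the score file shares the base name of the FASTA file
    and only the extension differs (test.fa -> test.qual).
    """
    base, _ext = os.path.splitext(fasta_path)
    return base + ".qual"


def has_scores(fasta_path):
    """Return True if a companion score file exists on disk."""
    return os.path.isfile(qual_path_for(fasta_path))
```

For example, `qual_path_for("test1/example.mfa")` gives `"test1/example.qual"`.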

# Output files

# - columns in the "alignments" file

ID	Alignment ID = the first part of the name of the multiple fasta file.
ALN_LEN	Length of the aligned sequences including the gaps.
MISM	All differences between the aligned sequences.
SNVS	Single Nucleotide Variations = single-base SNPs.
SNV_B	Single Nucleotide Variation bases.
MNVS	Multiple Nucleotide Variations = multi-base SNPs.
MNV_B	Multiple Nucleotide Variation bases.
S_IND	Single-base InDels.
S_IND_B	Single-base InDel bases.
M_IND	Multi-base InDels.
M_IND_B	Multi-base InDel bases.
P_PERC	Percentage of polymorphic bases per aligned bases.
STATUS	If the alignment is OK the status is 1, otherwise 0.
ERROR	Error code:
	-42	sequences of the multiple fasta file are not equal in length.
	-605	alignment too short.
	-602	not enough sequences in the alignment.
	-656	fraction of mismatches too high.
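
When post-processing the "alignments" file, the STATUS and ERROR columns can be translated into readable messages. The following is a minimal sketch. The function name `describe_alignment` is hypothetical, and the sketch assumes the two values have already been read from a report row, either as strings or as integers. What the ERROR column contains for an alignment with status 1 is not specified here, so the sketch does not look at it in that case.

```python
# Error codes as listed in the column description above.
ERROR_MESSAGES = {
    -42: "sequences of the multiple fasta file are not equal in length",
    -605: "alignment too short",
    -602: "not enough sequences in the alignment",
    -656: "fraction of mismatches too high",
}


def describe_alignment(status, error):
    """Translate the STATUS and ERROR values of one report row."""
    if int(status) == 1:
        # The alignment is OK; the error value is not inspected.
        return "OK"
    return ERROR_MESSAGES.get(int(error), "unknown error code %s" % error)
```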

# - columns in the "mismatches" file

ID	Alignment ID.
TYPE	Mismatch type (SNV, MNV, InDel, Mixed).
ALN_S	Position of the first base of a polymorphism in the alignment.
PS_CONS	Consensus sequence of the polymorphism represented in ambiguity code.
PS_LEN	Length of the polymorphism.
CONS	Consensus sequence of 100 bp upstream and 100 bp downstream of the polymorphism.
MINBORD	Minimal distance of the polymorphism to the start or the end of the alignment.
PSAVGSC	Average score of the polymorphic site.
NGAVGSC	Average score of the neighboring 2x5 bases.
N_COUNT	Number of N's in the consensus sequence.
STATUS	Result of 3 tests, combined as a bit flag:
	test 1: length of the polymorphism <= maximal polymorphism length (= binary 1 = decimal 1),
	test 2: distance to the start or the end of the alignment > minimal border
		distance (= binary 10 = decimal 2),
	test 3: average score of the neighboring 2x5 bases >= cutoff and average score
		of the polymorphic site >= cutoff (= binary 100 = decimal 4).
	If all 3 tests pass, the status is binary 111 = decimal 7.
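
The STATUS value of a mismatch is a bit flag, so it can be built and decoded with bitwise operations. The sketch below is an illustration, not the code used by mmfind. The function names are hypothetical. The default cutoffs are taken from the -l, -b, -s and -n options, and the assignment of -s to the polymorphic site and -n to the neighboring bases is an assumption based on the option descriptions.

```python
TEST_LENGTH = 1  # binary 001: polymorphism is short enough
TEST_BORDER = 2  # binary 010: polymorphism is far enough from the ends
TEST_SCORE = 4   # binary 100: site and neighborhood scores are high enough


def mismatch_status(ps_len, minbord, psavgsc, ngavgsc,
                    max_len=3, min_border=80,
                    min_score=20, min_neigh_score=15):
    """Combine the three tests into one status value (0 to 7)."""
    status = 0
    if ps_len <= max_len:
        status |= TEST_LENGTH
    if minbord > min_border:
        status |= TEST_BORDER
    if psavgsc >= min_score and ngavgsc >= min_neigh_score:
        status |= TEST_SCORE
    return status


def passed_tests(status):
    """Decode a status value into the result of each test."""
    return {
        "length": bool(status & TEST_LENGTH),
        "border": bool(status & TEST_BORDER),
        "score": bool(status & TEST_SCORE),
    }
```

A status of 6, for instance, decodes to a mismatch that passed the border and score tests but is longer than the maximal polymorphism length.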


# Examples

Processing one multiple alignment (e.g. "example.mfa" and "example.qual" in the 
directory "test1" in the downloaded archive):

mmfind -d test1/example.mfa

or 

cd test1
mmfind -d example.mfa

In both cases two files ("example.alignments.csv" and "example.mismatches.csv") are
written to the directory from which the script is called. The -d option is
optional; it adds a descriptive header line to the output files.

Processing multiple alignments (e.g. the files in the directory "test2" in the
downloaded archive):

mmfind -d -o test2/summary test2

With the -o option the result of the evaluation is not written to separate files for
each alignment, but summarized in the two files "summary.alignments.csv" and
"summary.mismatches.csv". Only FASTA files with one of the following extensions are
evaluated: fa, fasta, fas or mfa.
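
To check in advance which files of a directory would be picked up, the extension rule can be reproduced in a few lines of Python. This is a sketch with hypothetical helper names. It assumes that the extensions are matched case-sensitively and that subdirectories are not searched, neither of which is stated above.

```python
import os

# Extensions that are evaluated, as listed above.
FASTA_EXTENSIONS = ("fa", "fasta", "fas", "mfa")


def is_alignment_file(filename):
    """Return True if the file name ends with an accepted extension."""
    ext = os.path.splitext(filename)[1].lstrip(".")
    return ext in FASTA_EXTENSIONS


def alignment_files(directory):
    """List the accepted files of a directory in sorted order."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if is_alignment_file(name)
    )
```

Calling `alignment_files("test2")` would then return the paths of the candidate alignments, while score files such as "example.qual" are skipped.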

For a more user-friendly alignment format, use the -c option. For each multiple FASTA
alignment, a corresponding Clustal-like version will be written.

