Suppose you want to align three strains auf S. aureus. You follow instructions a) in Section 3 and build an index named 3staph:
mkvtree -dna -lcp -suf -tis -db NC_002745.fna NC_002758.fna MRSA.dbs\ -indexname 3staph
You are now able to run mga and create your alignments. You decide to turn on verbose output and to always recurse into gaps. Using these two options is always a good idea for multiple sequences, though in this example, the second one does not make a difference. The output will be written to files whose names start with example.
mga -v -l 1000 20 -always -o example 3staph
The file example.summary contains the command line parameters of mga. Also, the description of the sequences as present in their FASTA headers is given. Then, some simple statistics follow: more than 90% of the sequences are aligned.
The program call arguments were
-v -l 1000 20 -always -o example 3staph
Sequence description:
Seq 1: gi|15925705|ref|NC_002745.1| Staphylococcus aureus subsp. aureus N315, \
complete genome
Seq 2: gi|15922990|ref|NC_002758.1| Staphylococcus aureus strain Mu50, \
complete genome
Seq 3: Staphylococcus aureus (EMRSA-16) chromosome 2,902,619 bp
Number of matches / aligned / unaligned gaps is 23054 / 22960 / 93
length / all al. = matches + aligned / unaligned gaps
Seq 1: 2813641 / 2654636 = 2348504 + 306132 / 159005
Seq 2: 2878040 / 2652323 = 2348504 + 303819 / 225717
Seq 3: 2902619 / 2653076 = 2348504 + 304572 / 249543
Avg. : 2864766 / 2653345 = 2348504 + 304841 / 211421
Coverage in percent
Seq 1: 2813641 / 94.3 = 83.5 + 10.9 / 5.7
Seq 2: 2878040 / 92.2 = 81.6 + 10.6 / 7.8
Seq 3: 2902619 / 91.4 = 80.9 + 10.5 / 8.6
Avg. : 2864766 / 92.6 = 82.0 + 10.6 / 7.4
Please cite the following paper:
Michael Hoehl, Stefan Kurtz, Enno Ohlebusch
Efficient Multiple Genome Alignment
Bioinformatics, Vol. 18 (S1):S312-S320, 2002
If mga is run as above the file example.align contains only the structure of the alignment, not the (aligned) sequence data itself. You can see a multiMEM in the first line, starting at position zero in all three sequences, then a gap which is five bases short, etc.:
61 0 0 0 5:61-65 5:61-65 5:61-65 85 66 66 66 72:151-222 71:151-221 72:151-222 131 223 222 223
When mga is run with the additional parameter -match it outputs the sequence data of matches. Recall that a match consists of bases that are identical in all sequences. Therefore, the bases are shown only once. The last line from the previous example is displayed as follows:
131 223 222 223 Exact: attagaaattacacacaaagttatactatttttagcaacatattcacaggtatttgacat 60 Exact: atagagaactgaaaaagtataattgtgtggataagtcgtccaactcatgattttataagg 120 Exact: atttatttatt 131
When your are only interested in, say, 20 bases next to a gap, you can get this context information by supplying mga with parameter -bl 20 instead:
131 223 222 223
Exact: attagaaattacacacaaag 20
...skipping bases...
Exact: tttataaggatttatttatt 20
You might want to know why the match starts one base earlier in sequence two. The preceding gap is one base shorter in that sequence. You can view the aligned sequence data by calling mga with parameter -clustalw. This outputs the ClustalW alignment of short gaps:
72:151-222 71:151-221 72:151-222 Seq 1: tcactaacagatattctatagaaggaaaagttatccacttatgcacatttatagttttca 60 Seq 2: ...............................atg.....agca..-g............. Seq 3: a............c.................................c.....c....t. Seq 1: gaattgtggata 72 Seq 2: ..........ct Seq 3: ............