CARMA is a software pipeline for characterising the species composition and the genetic potential of microbial samples using short reads. In contrast to the traditional 16S-rRNA approach to taxonomic classification, CARMA uses reads that encode known proteins. By assigning a taxonomic origin to each read, a profile is constructed which characterises the taxonomic composition of the corresponding community.
Further details can be found in the publications under Citation.
Usage of WebCARMA
Enter your e-mail address to obtain an upload link. Use the upload link to upload your metagenomic sequences in FASTA format.
After your data has been processed, you will receive a notification e-mail with a link which you can use to download your results.
To avoid overloading our compute cluster, we restrict uploads of FASTA files to a total of 30 megabytes within any 4-week period.
Confidentiality: WebCARMA runs in a UNIX environment. Your e-mail address, your metagenomic data and the results of the analysis with CARMA are stored locally and can be accessed only by us and our system administrators. We will not share the data with others. All your data will be deleted after 6 weeks.
Description of the Input
Please make sure that:
your uploaded file is a valid FASTA-formatted file or an archive (zip, gzip or tgz) containing such a file.
your uploaded file is NOT in .doc format!
your FASTA descriptions consist of unique names.
you upload (protein-encoding) DNA. Protein sequences and (16S) RNA sequences are not supported! Eukaryotic sequences consist mainly of non-coding DNA and are likely to be wrongly classified.
WebCARMA does not yet support the taxonomic classification of protein sequences, although this functionality is now implemented in CARMA3. We are currently working on a new version of WebCARMA which will support the upload and taxonomic classification of amino acid sequences.
Please also note that CARMA performs no quality check on the sequences, e.g. duplicates are not removed. If you upload assembled metagenomic DNA (contigs), please be aware that the assembly might have produced chimeric sequences, which can also lead to wrong taxonomic classifications.
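The input requirements above can be checked locally before uploading. The following Python sketch is not part of WebCARMA; the function name check_fasta, the use of the first whitespace-delimited token of each description line as the sequence name, and the IUPAC nucleotide alphabet as the test for "DNA" are all assumptions made for illustration.

```python
import io

# IUPAC nucleotide codes for DNA (assumption: ambiguity codes are acceptable).
DNA_ALPHABET = set("ACGTRYSWKMBDHVN")


def check_fasta(handle):
    """Return a list of human-readable problems found in a FASTA stream."""
    problems = []
    seen = set()
    name = None  # None until the first '>' header has been read
    for lineno, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if not name:
                problems.append(f"line {lineno}: empty sequence name")
            elif name in seen:
                problems.append(f"line {lineno}: duplicate name {name!r}")
            seen.add(name)
        elif name is None:
            problems.append(f"line {lineno}: sequence data before first header")
        else:
            bad = set(line.upper()) - DNA_ALPHABET
            if bad:
                problems.append(f"line {lineno}: non-DNA characters {sorted(bad)}")
    return problems


if __name__ == "__main__":
    example = io.StringIO(">read1\nACGTTGCA\n>read1\nMKLEF\n")
    for problem in check_fasta(example):
        print(problem)
```

An empty list means that no problem was detected by these simple checks. It does not guarantee that the sequences are protein-encoding or free of chimeras.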
Description of the Output
The output of WebCARMA is an archive that contains the following files:
Environmental gene tags (EGTs) are (translated) reads with matches to Pfam protein families.
This file (in FASTA format) contains the EGTs together with the matching Pfam family, the read description plus reading-frame information, the hmmpfam E-value, the list of GO-Ids (Gene Ontology) and the translated read (EGT) itself.
All identified EGTs are phylogenetically classified. Each line in the output is a tab-separated list of values, giving the read name (including the reading frame, e.g. "_1_3"), Pfam family, list of GO-Ids, ncbi_id, taxon and E-value.
The output format of blastx_result.tax differs slightly from the description above: reading-frame information is not given, and the field for the Pfam family contains a string with internal information about the classification instead of the Pfam family.
For further processing of this data we recommend using the ncbi_ids instead of the pretty-printed taxon names.
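As an illustration of the six tab-separated fields described above, the following Python sketch splits one classification line into named fields. The names Classification and parse_classification_line are hypothetical, and the example line is invented. The GO-Id list is kept as a raw string because its internal separator is not specified here, and the ncbi_id is kept as a string so that no assumption about its form is needed.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    read_name: str    # includes the reading-frame suffix where present
    pfam_family: str  # internal classification info in blastx_result.tax
    go_ids: str       # raw field; list separator not assumed
    ncbi_id: str
    taxon: str
    e_value: float


def parse_classification_line(line):
    """Split one tab-separated classification line into its six fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    read_name, pfam_family, go_ids, ncbi_id, taxon, e_value = fields
    return Classification(read_name, pfam_family, go_ids, ncbi_id, taxon,
                          float(e_value))


if __name__ == "__main__":
    # Invented example line, for illustration only.
    line = "read42_1_3\tPF00001\tGO:0051287\t1239\tFirmicutes\t1e-12\n"
    record = parse_classification_line(line)
    print(record.ncbi_id, record.e_value)
```

Grouping or joining on the ncbi_id field rather than on the taxon field follows the recommendation above.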
Functional Profile - functional_profile.tsv
This file contains one line of tab-separated values for each GO-Id. The values are GO-Id, GO-term, GO-category and the number of EGTs that support this GO-Id.
GO:0051287 "NAD or NADH binding (molecular_function)" molecular_function 105
GO:0005737 "cytoplasm (cellular_component)" cellular_component 103
GO:0046168 "glycerol-3-phosphate catabolic process (biological_process)" biological_process 101
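A file with this layout can be read with any tool that handles tab-separated values. The Python sketch below is one possibility, assuming that the fields are separated by single tabs and that the GO-term is enclosed in double quotes as in the lines above; the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_functional_profile(handle):
    """Return a list of (go_id, term, category, count) tuples."""
    rows = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        go_id, term, category, count = row
        rows.append((go_id, term, category, int(count)))
    return rows


def support_per_category(rows):
    """Sum the supporting-EGT counts of all GO-Ids in each GO-category."""
    totals = Counter()
    for _go_id, _term, category, count in rows:
        totals[category] += count
    return totals


if __name__ == "__main__":
    sample = io.StringIO(
        'GO:0051287\t"NAD or NADH binding (molecular_function)"\tmolecular_function\t105\n'
        'GO:0005737\t"cytoplasm (cellular_component)"\tcellular_component\t103\n'
    )
    rows = read_functional_profile(sample)
    print(support_per_category(rows))
```

Because an EGT can carry a list of GO-Ids, the per-category sums count EGT-to-GO assignments and may exceed the number of distinct EGTs.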
Taxonomic Profile - taxonomic_profile.tsv
This file contains, for each taxonomic rank and each taxon, the number of EGTs that have been assigned to that taxon.
order "Poales" 165
order "Clostridiales" 39
class "Liliopsida" 180
class "Clostridia" 40
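To compare taxa within one rank, the counts can be converted to fractions. The following Python sketch assumes three tab-separated columns (rank, quoted taxon, count) as in the lines above; the function name relative_abundance is hypothetical.

```python
import csv
import io
from collections import defaultdict


def relative_abundance(handle):
    """Return {rank: {taxon: fraction of that rank's EGTs}}."""
    counts = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rank, taxon, count = row
        counts[rank][taxon] = int(count)
    fractions = {}
    for rank, taxa in counts.items():
        total = sum(taxa.values())
        if total:  # avoid division by zero for empty ranks
            fractions[rank] = {t: n / total for t, n in taxa.items()}
    return fractions


if __name__ == "__main__":
    sample = io.StringIO(
        'order\t"Poales"\t165\n'
        'order\t"Clostridiales"\t39\n'
        'class\t"Liliopsida"\t180\n'
        'class\t"Clostridia"\t40\n'
    )
    for rank, taxa in relative_abundance(sample).items():
        for taxon, fraction in taxa.items():
            print(rank, taxon, round(fraction, 3))
```

The fractions are relative to the EGTs listed for that rank in the file, not to all reads of the sample.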
Tools
CARMA does not directly create profiles; it only reports for each EGT which gene it encodes and from which species (taxon, to be precise) it most likely originates.
To get a better overview of the metagenomic content of a sample, one needs a profile which tells, for each species or function, how many supporting EGTs have been found in the sample. The Perl scripts below create the profiles using "result.egt" and/or "result.taxa". They are also able to produce histograms (requires gnuplot).
WebCARMA provides such profiles by default, but they have been created using certain parameters which you might want to change. The PDFs, for example, have been created with gnuplot, and we have limited the graphics to show at most 40 taxa per taxonomic rank. For the functional profile, for example, it is possible to use only EGTs which have an E-value below a certain threshold.
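The two parameters mentioned above, an E-value threshold and a limit on the number of taxa shown, can be illustrated with a short Python sketch. It does not reproduce the Perl scripts; the function names and the in-memory record layout (a list of GO-Ids plus an E-value per EGT) are assumptions, and the example values are invented.

```python
from collections import Counter


def functional_profile(egts, max_e_value):
    """Count supporting EGTs per GO-Id, using only EGTs below the threshold.

    egts is an iterable of (go_ids, e_value) pairs (hypothetical layout).
    """
    profile = Counter()
    for go_ids, e_value in egts:
        if e_value < max_e_value:
            profile.update(set(go_ids))  # each EGT supports a GO-Id once
    return profile


def top_taxa(taxon_counts, limit=40):
    """Return the `limit` most frequent taxa, ties broken by name."""
    ranked = sorted(taxon_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


if __name__ == "__main__":
    egts = [
        (["GO:0051287", "GO:0005737"], 1e-20),
        (["GO:0051287"], 1e-3),
        (["GO:0005737"], 0.5),  # discarded by the threshold below
    ]
    print(functional_profile(egts, max_e_value=1e-2))
    print(top_taxa({"Poales": 165, "Clostridiales": 39}, limit=1))
```

A stricter threshold yields fewer but more reliable supporting EGTs per GO-Id.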
If you have two different metagenomes that have both been analyzed with CARMA, you can create a comparative taxonomic profile. We also provide a Perl script for this here.
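The idea of a comparative profile can be sketched in a few lines of Python. This is only an illustration under assumed inputs (one taxon-to-count mapping per sample, for a single taxonomic rank), not the provided Perl script; the function name compare_profiles and the counts of the second sample are invented.

```python
def compare_profiles(sample_a, sample_b):
    """Return rows of (taxon, count_a, count_b, fraction_a, fraction_b).

    Fractions are relative to each sample's own total, so that samples
    with different numbers of EGTs can be compared.
    """
    total_a = sum(sample_a.values()) or 1  # avoid division by zero
    total_b = sum(sample_b.values()) or 1
    rows = []
    for taxon in sorted(set(sample_a) | set(sample_b)):
        n_a = sample_a.get(taxon, 0)
        n_b = sample_b.get(taxon, 0)
        rows.append((taxon, n_a, n_b, n_a / total_a, n_b / total_b))
    return rows


if __name__ == "__main__":
    first = {"Liliopsida": 180, "Clostridia": 40}
    second = {"Clostridia": 90, "Bacilli": 10}  # invented counts
    for row in compare_profiles(first, second):
        print("\t".join(str(value) for value in row))
```

Taxa that occur in only one of the two samples appear with a count of zero for the other sample.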
Call these Perl scripts without parameters to see the available options. The scripts use the raw output of
CARMA to produce functional and taxonomic profiles in an output format (tab-separated values) which is
easy to import into other software such as spreadsheet programs (e.g. OpenOffice, Excel) or gnuplot. Both
scripts furthermore provide the option to visualize the profiles in PostScript format.
It is important to know that getFunctionalProfile.pl needs the CARMA output files that contain the EGTs
(with GO-Ids AND E-value), not the taxonomic classification. The script getTaxonomicProfile.pl needs the
taxonomic classification, not the EGTs.
These scripts are independent of the CARMA package and can be run on their own on a laptop or desktop computer. They only need some NCBI taxonomy files; further explanation can be found in the header of these scripts.
(Perl (http://www.perl.org/), of course, has to be installed too.)
CARMA3 Evaluation Data Sets
List of the 25 bacteria of the simulated metagenomes in MetaSim format: profile.mprf
Here you can find the results of the analysis of a microbial community from an agricultural biogas reactor:
CARMA3: biogas, human gut
CARMA2.1: biogas
Citation
If you use WebCARMA please cite both publications:
W. Gerlach and J. Stoye
Taxonomic classification of metagenomic shotgun sequences with CARMA3
Nucleic Acids Research 2011, 39(14):e91 [paper][bibtex]
W. Gerlach, S. Jünemann, F. Tille, A. Goesmann and J. Stoye
WebCARMA: A Web Application for the Functional and Taxonomic Classification of Unassembled Metagenomic Reads
BMC Bioinformatics 2009, 10:430 [paper][bibtex]
CARMA Source Code
CARMA is licensed under the GNU GPL.
CARMA3
The source files of CARMA3 can be downloaded here.
CARMA2
The source files of CARMA2.1 can be downloaded here.