WebCarma 1.0 : Manual

Manual

Overview



What is CARMA?

CARMA is a software pipeline for characterising the species composition and the genetic potential of microbial samples using short reads. In contrast to the traditional 16S-rRNA approach to taxonomic classification, CARMA uses reads that encode known proteins. By assigning a taxonomic origin to each read, a profile is constructed that characterises the taxonomic composition of the corresponding community.

Further details can be found in the publications under Citation.

Usage of WebCARMA

Enter your e-mail address to obtain an upload link. Use this link to upload your metagenomic sequences in FASTA format.
After your data has been processed, you will receive a notification e-mail with a link that you can use to download your results.

To avoid overloading our compute cluster, we restrict the upload of FASTA files to a total of 30 megabytes within 4 weeks.

Confidentiality: WebCARMA runs in a UNIX environment. Your e-mail address, your metagenomic data and the results of the analysis with CARMA are stored locally and can be accessed only by us and our system administrators. We will not share the data with others. All your data will be deleted after 6 weeks.


Description of the Input

Please make sure that,

WebCARMA does not yet support the taxonomic classification of protein sequences, although this functionality is now implemented in CARMA3. We are currently working on a new version of WebCARMA that will support the upload and taxonomic classification of amino acid sequences.

Please also note that CARMA performs no quality check on the sequences, e.g. duplicates are not removed. If you upload assembled metagenomic DNA (contigs), please be aware that the assembly may have produced chimeric sequences, which can also lead to incorrect taxonomic classifications.


Description of the Output

The output of WebCARMA is an archive that contains the following files:
result.egt
blastx_result.tax
hmm_result.tax
functional_profile.tsv
taxonomic_profile.tsv
functional_profile.pdf
superkingdom.pdf, phylum.pdf, class.pdf, order.pdf, family.pdf, genus.pdf, species.pdf

EGTs - result.egt

Environmental gene tags (EGTs) are (translated) reads with matches to Pfam protein families. This file (in FASTA format) contains the EGTs. Each header line gives the matching Pfam family, the read description plus reading frame information, the hmmpfam E-value and the list of GO-Ids (Gene Ontology); it is followed by the translated read (EGT) itself:
>PF04961.4=+=HWI-EAS217_1_2013P:1:1:383:736/1_1_3=+=3.2e-07=+={GO:0044237,GO:0003824}
LPKKTDEEKAARKAAI
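
As an illustration of the header layout described above, the following minimal Python sketch splits one result.egt header into its four fields. It assumes that "=+=" never occurs inside a field and that the GO-Ids are always wrapped in curly braces, as in the example; the names EgtHeader and parse_egt_header are hypothetical and not part of CARMA.

```python
from typing import List, NamedTuple


class EgtHeader(NamedTuple):
    pfam_family: str
    read_description: str  # read name plus reading frame information
    evalue: float          # hmmpfam E-value
    go_ids: List[str]


def parse_egt_header(line: str) -> EgtHeader:
    """Split a result.egt header line on the '=+=' separator (assumed layout)."""
    fields = line.strip().lstrip(">").split("=+=")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    pfam, read_desc, evalue, go_field = fields
    # The GO-Ids are given as a comma-separated list in curly braces.
    go_ids = [g for g in go_field.strip("{}").split(",") if g]
    return EgtHeader(pfam, read_desc, float(evalue), go_ids)


header = parse_egt_header(
    ">PF04961.4=+=HWI-EAS217_1_2013P:1:1:383:736/1_1_3=+=3.2e-07=+={GO:0044237,GO:0003824}"
)
print(header.pfam_family, header.evalue, header.go_ids)
```

The E-value is converted to a float so that records can later be filtered by threshold.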

Taxonomic Classifications - blastx_result.tax / hmm_result.tax

All identified EGTs are phylogenetically classified. Each line in the output is a tab-separated list of values: read name (including the reading frame, in this example "_1_3"), Pfam family, list of GO-Ids, ncbi_id, taxon and E-value.
072343_1987_0335_1_3	PF01312	{GO:0016020,GO:0009306}	68295	Bacteria(superkingdom)!Firmicutes(phylum)!Clostridia...	7e-30
The output format of blastx_result.tax differs slightly from the description above: reading frame information is not given, and the Pfam family field contains a string with internal information about the classification instead of the Pfam family.
For further processing of this data we recommend using the ncbi_ids instead of the pretty-printed taxa.
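
The line format described above can be read with a few lines of Python. This is only a sketch under the assumption that every line has exactly six tab-separated fields and that the taxon field is a "!"-separated lineage of Name(rank) entries, as in the example; TaxRecord and parse_tax_line are hypothetical names. The example line is used exactly as printed above, including its truncated lineage, so the last entry is kept without a rank.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class TaxRecord(NamedTuple):
    read_name: str
    pfam_family: str  # in blastx_result.tax this field holds internal information
    go_ids: List[str]
    ncbi_id: int
    lineage: List[Tuple[str, Optional[str]]]  # (taxon name, rank or None)
    evalue: float


# Matches entries such as "Firmicutes(phylum)".
_TAXON = re.compile(r"^(?P<name>.*)\((?P<rank>[^()]*)\)$")


def parse_tax_line(line: str) -> TaxRecord:
    """Parse one tab-separated line of a .tax file (assumed six fields)."""
    read_name, pfam, go_field, ncbi_id, taxon, evalue = line.rstrip("\n").split("\t")
    go_ids = [g for g in go_field.strip("{}").split(",") if g]
    lineage = []
    for entry in taxon.split("!"):
        match = _TAXON.match(entry)
        # Entries without a "(rank)" suffix are kept with rank None.
        lineage.append((match["name"], match["rank"]) if match else (entry, None))
    return TaxRecord(read_name, pfam, go_ids, int(ncbi_id), lineage, float(evalue))


record = parse_tax_line(
    "072343_1987_0335_1_3\tPF01312\t{GO:0016020,GO:0009306}\t68295\t"
    "Bacteria(superkingdom)!Firmicutes(phylum)!Clostridia...\t7e-30"
)
print(record.ncbi_id, record.lineage[:2])
```

Keeping ncbi_id as an integer makes it straightforward to join the records against NCBI Taxonomy rather than relying on the taxon strings.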

Functional Profile - functional_profile.tsv

This file contains one line of tab-separated values for each GO-Id. The values are GO-Id, GO-term, GO-category and the number of EGTs that support this GO-Id.
GO:0051287 "NAD or NADH binding (molecular_function)" molecular_function 105
GO:0005737 "cytoplasm (cellular_component)" cellular_component 103
GO:0046168 "glycerol-3-phosphate catabolic process (biological_process)" biological_process 101
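
A file in this format can be loaded with Python's standard csv module. The sketch below assumes that the columns are separated by tabs and that the GO-term is enclosed in double quotes, as in the lines above; read_functional_profile is a hypothetical helper name, and the sample text simply repeats the three example lines.

```python
import csv
import io

sample = (
    'GO:0051287\t"NAD or NADH binding (molecular_function)"\tmolecular_function\t105\n'
    'GO:0005737\t"cytoplasm (cellular_component)"\tcellular_component\t103\n'
    'GO:0046168\t"glycerol-3-phosphate catabolic process (biological_process)"\tbiological_process\t101\n'
)


def read_functional_profile(handle):
    """Return a list of (GO-Id, GO-term, GO-category, EGT count) tuples."""
    rows = []
    for row in csv.reader(handle, delimiter="\t", quotechar='"'):
        if not row:  # skip empty lines
            continue
        go_id, term, category, count = row
        rows.append((go_id, term, category, int(count)))
    return rows


functional_rows = read_functional_profile(io.StringIO(sample))
# Pick the GO-Id supported by the largest number of EGTs.
best = max(functional_rows, key=lambda r: r[3])
print(best[0], best[3])
```

For a real file, replace io.StringIO(sample) with open("functional_profile.tsv", newline="").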

Taxonomic Profile - taxonomic_profile.tsv

This file contains, for each taxonomic rank and each taxon, the number of EGTs that have been assigned to that taxon.
order   "Poales"        165
order   "Clostridiales" 39
class   "Liliopsida"    180
class   "Clostridia"    40
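
To work with this file programmatically, the lines can be grouped by rank. The following sketch assumes tab-separated columns with the taxon name in double quotes; read_taxonomic_profile and top_taxa are hypothetical helper names, and the sample repeats the four example lines above.

```python
import csv
import io
from collections import defaultdict

sample = (
    'order\t"Poales"\t165\n'
    'order\t"Clostridiales"\t39\n'
    'class\t"Liliopsida"\t180\n'
    'class\t"Clostridia"\t40\n'
)


def read_taxonomic_profile(handle):
    """Return {rank: {taxon: EGT count}} from a taxonomic profile."""
    profile = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t", quotechar='"'):
        if not row:  # skip empty lines
            continue
        rank, taxon, count = row
        profile[rank][taxon] = int(count)
    return dict(profile)


def top_taxa(profile, rank, n):
    """Return the n taxa with the most EGTs at the given rank."""
    counts = profile.get(rank, {})
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


profile = read_taxonomic_profile(io.StringIO(sample))
print(top_taxa(profile, "order", 1))
```

A function like top_taxa is one way to limit a plot to the most abundant taxa per rank.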


Tools

CARMA does not create profiles directly; for each EGT it only reports which gene the EGT encodes and from which species (taxon, to be precise) it most likely originates.
To get a better overview of the metagenomic content of a sample, one needs a profile that states, for each species or function, how many supporting EGTs have been found in the sample. The Perl scripts below create such profiles from "result.egt" and/or "result.taxa". They can also produce histograms (requires gnuplot).

WebCARMA provides such profiles by default, but they have been created with certain parameters which you might want to change. The PDFs, for example, have been created with gnuplot, and we have limited the graphics to show at most 40 taxa per taxonomic rank. For the functional profile it is possible, for example, to use only EGTs which have an E-value below a certain threshold.
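
The idea of an E-value threshold for the functional profile can be sketched in a few lines of Python. This is not the implementation of the Perl scripts; it only assumes the result.egt header layout shown in the output description (fields separated by "=+=", E-value in the third field, GO-Ids in the fourth). The function name count_go_ids and the second sample record are hypothetical and serve only as illustration.

```python
from collections import Counter


def count_go_ids(lines, max_evalue):
    """Count supporting EGTs per GO-Id, using only EGTs with E-value < max_evalue."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].strip().split("=+=")
        evalue = float(fields[2])
        if evalue < max_evalue:
            go_ids = [g for g in fields[3].strip("{}").split(",") if g]
            counts.update(go_ids)
    return counts


egt_lines = [
    ">PF04961.4=+=HWI-EAS217_1_2013P:1:1:383:736/1_1_3=+=3.2e-07=+={GO:0044237,GO:0003824}",
    "LPKKTDEEKAARKAAI",
    # Hypothetical record with a weak hit, made up for this example.
    ">PF00000.0=+=hypothetical_read_1_2=+=0.5=+={GO:0003824}",
    "MKV",
]

go_counts = count_go_ids(egt_lines, max_evalue=1e-5)
print(dict(go_counts))
```

With the threshold of 1e-5 chosen here, only the first record contributes to the counts.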

If you have two different metagenomes that have both been analyzed with CARMA, you can create a comparative taxonomic profile. We also provide a Perl script for this here.

getFunctionalProfile.pl
In addition requires: Gene Ontology

getTaxonomicProfile.pl
In addition requires: NCBI Taxonomy

getComparativeTaxonomicProfile.pl
In addition requires: NCBI Taxonomy

Call these Perl scripts without parameters to see the available options. The scripts use the raw output of CARMA to produce functional and taxonomic profiles in an output format (tab-separated values) that is easy to import into other software such as spreadsheet programs (e.g. Open Office, Excel) or gnuplot. Both scripts furthermore provide the option to visualize the profiles in PostScript format.

It is important to know that getFunctionalProfile.pl needs the CARMA output files that contain the EGTs (with GO-Ids AND E-value), not the taxonomic classification. The script getTaxonomicProfile.pl needs the taxonomic classification, not the EGTs.

These scripts are independent of the CARMA package and can be run on their own on a laptop or desktop. They only need some NCBI Taxonomy files; further explanation can be found in the header of each script.
(Perl (http://www.perl.org/), of course, has to be installed too.)
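
To illustrate what a comparative taxonomic profile amounts to, the following Python sketch places the counts of two taxonomic profiles side by side. It is not the implementation of getComparativeTaxonomicProfile.pl; it assumes both profiles are already available as mappings from (rank, taxon) to EGT count. The name compare_profiles is hypothetical, and the counts of the second sample are made up for the example.

```python
def compare_profiles(profile_a, profile_b):
    """Return (rank, taxon, count_a, count_b) rows for all taxa found in either profile."""
    keys = sorted(set(profile_a) | set(profile_b))
    return [
        (rank, taxon, profile_a.get((rank, taxon), 0), profile_b.get((rank, taxon), 0))
        for rank, taxon in keys
    ]


# Counts taken from the taxonomic profile example in this manual.
sample_a = {("order", "Poales"): 165, ("order", "Clostridiales"): 39}
# Hypothetical second metagenome, invented for illustration only.
sample_b = {("order", "Clostridiales"): 12, ("order", "Bacteroidales"): 7}

comparison = compare_profiles(sample_a, sample_b)
for row in comparison:
    print("\t".join(str(value) for value in row))
```

Taxa missing from one sample are reported with a count of 0, so that every row has the same number of columns and the table can be imported into a spreadsheet program or gnuplot.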


CARMA3 Evaluation Data Sets

List of the 25 bacteria of the simulated metagenomes in MetaSim format: profile.mprf

The simulated metagenomes of the CARMA3 evaluation:
simulated_metagenome_454_400bp.fna
simulated_metagenome_454_265bp.fna
simulated_metagenome_454_80bp.fna
simulated_metagenome_Illumina_80bp.fna

Example Results

Here you can find the results of the analysis of a microbial community from an agricultural biogas reactor:
CARMA3: biogas, human gut
CARMA2.1: biogas

Citation


If you use WebCARMA please cite both publications:

W. Gerlach and J. Stoye
Taxonomic classification of metagenomic shotgun sequences with CARMA3
Nucleic Acids Research 2011, 39(14):e91 [paper] [bibtex]

W. Gerlach, S. Jünemann, F. Tille, A. Goesmann and J. Stoye
WebCARMA: A Web Application for the Functional and Taxonomic Classification of Unassembled Metagenomic Reads
BMC Bioinformatics 2009, 10:430 [paper] [bibtex]


CARMA Source Code

CARMA is licensed under the GNU GPL.

CARMA3

The source files of CARMA3 can be downloaded here.

CARMA2

The source files of CARMA2.1 can be downloaded here.

CARMA1

The original version CARMA1.2 (L. Krause) can be found under http://www.cebitec.uni-bielefeld.de/brf/carma/carma.html.

Contact

If you have any questions, please contact webcarmaAcebitec.uni-bielefeld.de. (Important, please replace the "A" with "@"!)