Re: Parameter of '-t ANALYSIS TYPE'

Nicola Segata

Feb 19, 2013, 2:15:46 AM

to metaphl...@googlegroups.com

Hi Ning,

that's a good question. The two analysis types (clade_profiles and marker_ab_table) are quite similar and differ mostly in their output format.

Both report the number of reads mapped to each marker, divided by the marker length (in nt) and multiplied by 1,000. As mentioned, the differences lie in the output format, as detailed below.
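
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not MetaPhlAn's own code; the function name and the example read count and marker length are hypothetical.

```python
def normalized_marker_abundance(read_count: int, marker_length_nt: int) -> float:
    """Reads mapped to a marker, divided by marker length (nt), times 1,000.

    Hypothetical helper for illustration; not part of MetaPhlAn.
    """
    if marker_length_nt <= 0:
        raise ValueError("marker length must be positive")
    return read_count / marker_length_nt * 1000


# 150 reads on a 1,200-nt marker: 150 / 1200 * 1000
print(normalized_marker_abundance(150, 1200))  # 125.0
```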

In the output obtained with "clade_profiles":

* the markers are binned by clade

* values of 0 are explicitly reported for markers in clades having at least one marker with non-null abundance

* it makes it easy to check the consistency of markers within a clade

In the output obtained with "marker_ab_table":

* only markers with non-null abundance are reported

* clade information is not included

* abundances from multiple samples can be merged with the "utils/merge_metaphlan_tables.py" script.

* the marker abundances can be normalized by the total number of reads in the metagenome (specified with --nreads) to make abundances more comparable across samples
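
To make the two formats concrete, here is a rough Python sketch. It assumes a hypothetical in-memory input (each marker ID mapped to its clade and normalized abundance); the marker and clade names are made up, and the real MetaPhlAn output files are laid out differently. The optional --nreads normalization is not modeled here.

```python
from collections import defaultdict

# Hypothetical input: marker ID -> (clade, normalized abundance)
markers = {
    "m1": ("s__A", 12.5),
    "m2": ("s__A", 0.0),
    "m3": ("s__B", 0.0),
    "m4": ("s__C", 3.0),
}


def clade_profiles(markers):
    """Group markers by clade; keep every marker (zeros included)
    of any clade that has at least one non-zero marker."""
    by_clade = defaultdict(dict)
    for marker, (clade, ab) in markers.items():
        by_clade[clade][marker] = ab
    return {c: ms for c, ms in by_clade.items() if any(v > 0 for v in ms.values())}


def marker_ab_table(markers):
    """Flat marker -> abundance table: only non-zero markers, no clade info."""
    return {m: ab for m, (_, ab) in markers.items() if ab > 0}


print(clade_profiles(markers))   # {'s__A': {'m1': 12.5, 'm2': 0.0}, 's__C': {'m4': 3.0}}
print(marker_ab_table(markers))  # {'m1': 12.5, 'm4': 3.0}
```

Clade s__B is dropped from the clade-level view because all of its markers are zero, while the zero-valued m2 is kept because its clade s__A has a non-zero marker.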

We recently added "marker_ab_table" because of the last two advantages listed above (i.e. comparing marker abundances across samples), and we kept "clade_profiles" mostly for backward compatibility.
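
Because marker_ab_table omits markers with zero abundance, merging samples amounts to taking the union of markers and filling the gaps with 0. The sketch below illustrates that idea only; it is not the implementation of utils/merge_metaphlan_tables.py, and the function name and sample data are hypothetical.

```python
def merge_marker_tables(tables):
    """Merge per-sample {marker: abundance} dicts into a marker x sample table.

    Markers missing from a sample (not reported because their abundance was 0)
    are filled with 0.0. Hypothetical helper for illustration.
    """
    samples = list(tables)
    all_markers = sorted({m for t in tables.values() for m in t})
    rows = {m: [tables[s].get(m, 0.0) for s in samples] for m in all_markers}
    return samples, rows


samples, merged = merge_marker_tables({
    "sample1": {"m1": 12.5, "m4": 3.0},
    "sample2": {"m1": 4.0, "m7": 9.1},
})
print(samples)  # ['sample1', 'sample2']
for marker, row in merged.items():
    print(marker, row)
# m1 [12.5, 4.0]
# m4 [3.0, 0.0]
# m7 [0.0, 9.1]
```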

let me know if this answers your questions

many thanks

Nicola

On Monday, February 18, 2013 10:41:01 PM UTC+1, Ning Li wrote:

Hi,
I am currently using MetaPhlAn; it is a nice tool for taxonomic classification. I understand the relative abundances reported with '-t rel_ab' [default], but I am confused by two other values of '-t ANALYSIS TYPE'.
They are:
clade_profiles: normalized marker counts for clades with at least a non-null marker
marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
My questions are
1. What is the difference between these two 'normalized marker counts'?
2. What do the values mean in the output file if I choose them? How do you calculate them?
Thank you so much.
Best
Ning

Ning Li

Feb 19, 2013, 12:08:48 PM

to metaphl...@googlegroups.com

Hi Nicola,

Thank you so much. It is quite helpful.

Ning

bluep...@gmail.com

Mar 4, 2013, 7:07:40 AM

to metaphl...@googlegroups.com

Dear Nicola,

Thanks for adding the -t marker_ab_table option, which also enables merging multiple samples and normalizing by metagenome size (in number of reads)!

We did some clustering based on the default output "rel_ab" (relative abundance of clades) and are wondering: do we need to normalize these rel_ab profiles by metagenome size as well, in a similar fashion to the normalization in the marker abundance table? Some metagenomes had a sequencing depth of around 40% of the others (lower coverage).

Thank you very much for your time.
Maria

Nicola Segata

Mar 5, 2013, 4:29:54 AM

to metaphl...@googlegroups.com

Hi Maria,

thanks for your question.

No, the taxonomic relative abundances obtained with MetaPhlAn do not need to be normalized by metagenome size. Some caution is needed when comparing samples with very different coverage, but a maximum deviation of 40% is definitely not a problem!
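
A simplified way to see why: relative abundances are ratios, so scaling every count by the same factor (as a shallower but otherwise equivalent sequencing run would, in expectation) leaves them unchanged. The sketch below assumes made-up per-clade counts and does not reproduce how MetaPhlAn actually estimates clade abundances from markers; it also ignores sampling noise, which is why very different coverages still call for some caution.

```python
def relative_abundance(clade_counts):
    """Percent of the total for each clade (simplified illustration only)."""
    total = sum(clade_counts.values())
    return {clade: 100 * count / total for clade, count in clade_counts.items()}


# Hypothetical counts: the shallow sample has 40% of the deep sample's depth
deep = {"s__A": 600, "s__B": 300, "s__C": 100}
shallow = {"s__A": 240, "s__B": 120, "s__C": 40}

print(relative_abundance(deep))     # {'s__A': 60.0, 's__B': 30.0, 's__C': 10.0}
print(relative_abundance(shallow))  # {'s__A': 60.0, 's__B': 30.0, 's__C': 10.0}
```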

many thanks

Nicola

EthanK Gough

Dec 10, 2013, 5:32:32 PM

to metaphl...@googlegroups.com

Hi Nicola

Is there a script available to obtain relative clade abundances from the marker_ab_table output?

Thank you!

Ethan

Nicola Segata

Dec 12, 2013, 8:22:26 AM

to metaphl...@googlegroups.com

Hi Ethan,

the main MetaPhlAn script does that internally. The marker_ab_table output is meant as input for other marker-based analyses, not as an intermediate output of the main MetaPhlAn pipeline.

I hope this helps,

thanks

Nicola
