One question of the output result from MetaPhlAn


Ning Li

Feb 28, 2013, 10:47:12 PM
to metaphl...@googlegroups.com
Hi Nicola,

I have one question about the output from MetaPhlAn.

I was able to run MetaPhlAn on my data. In one of your earlier answers you wrote the following:

These clade-specific abundance values are then normalized with respect to all clades at the same taxonomic level (e.g. species sum up to 100) obtaining relative abundance of genome counts (rather than relative abundance of DNA concentrations). Unclassified sub-clades are added when the abundance of a clade is larger than the sum of the abundances of the direct children.

However, when I add up the species-level abundances in one of my samples, the total is only about 80%, and the same happens in my other samples. The phylum, class, order, family, and genus levels are fine: each adds up to 100%. Only the species level looks strange. Why does this happen?

Also, I am not sure I understand the sentence 'Unclassified sub-clades are added when the abundance of a clade is larger than the sum of the abundances of the direct children.' in your answer. Can you give an example?

Thank you so much in advance. I really appreciate your help.

Best
Ning

Nicola Segata

Mar 1, 2013, 4:10:53 AM
to metaphl...@googlegroups.com
Hi Ning,
 thanks for the inquiry. 

Let me give an example about the unclassified clades. Suppose your sample contains an unknown species (say the fictitious Staphylococcus italianensis :) which is unknown either because no genomes are available for that clade or because the clade is not even taxonomically known. MetaPhlAn would detect the presence of markers for the genus Staphylococcus, but no markers for any of the known Staph species. As a consequence, MetaPhlAn adds a species-level unclassified clade called s__Staphylococcus_unclassified.
More specifically, suppose that MetaPhlAn detects the following abundances (restricted to Staph only):
g__Staphylococcus 20%
s__Staphylococcus_aureus 5%
In this situation, MetaPhlAn adds the following clade:
s__Staphylococcus_unclassified 15%
Here s__Staphylococcus_unclassified accounts for the remaining 15% (20% - 5%), i.e. the non-directly detectable "Staphylococcus italianensis".
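
A minimal Python sketch of the arithmetic in this example. This is an illustration only, not MetaPhlAn's actual code; the function name add_unclassified and its arguments are made up for the purpose.

```python
def add_unclassified(parent, parent_abundance, children, child_prefix="s__"):
    """Return the children plus an '_unclassified' entry when the parent
    abundance exceeds the sum of its direct children.

    Hypothetical helper for illustration; not part of MetaPhlAn.
    """
    remainder = parent_abundance - sum(children.values())
    result = dict(children)
    if remainder > 1e-9:  # small tolerance for floating-point noise
        # "g__Staphylococcus" -> "Staphylococcus"
        name = parent.split("__", 1)[1]
        result[f"{child_prefix}{name}_unclassified"] = remainder
    return result


# Values taken from the example above: genus at 20%, one known species at 5%.
staph = add_unclassified(
    "g__Staphylococcus", 20.0, {"s__Staphylococcus_aureus": 5.0}
)
print(staph)
```

With the numbers above, the extra entry s__Staphylococcus_unclassified gets 15.0. If the known species already account for the whole genus, no unclassified entry is added.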

The reason why you don't see all species summing up to 100% may be higher-level unclassified clades, which are also leaves of the taxonomic tree. In other words, if you have an unclassified clade directly below a family (e.g. g__Staphylococcaceae_unclassified), it acts as a species-level clade, because g__Staphylococcaceae_unclassified has no children clades. So, when you sum up the clades for species, you should include all "s__*" and all "*_unclassified" clades in the sum.

Does this answer your questions?
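
A small Python sketch of that sum. The abundances are toy values invented for illustration, not real MetaPhlAn output, and the profile is assumed to be a dict keyed by "|"-separated lineage strings; the names profile, leaf_total and species_only are hypothetical.

```python
# Toy profile restricted to one family (made-up numbers).
profile = {
    "f__Staphylococcaceae": 100.0,
    "f__Staphylococcaceae|g__Staphylococcus": 80.0,
    "f__Staphylococcaceae|g__Staphylococcaceae_unclassified": 20.0,
    "f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus_aureus": 65.0,
    "f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus_unclassified": 15.0,
}


def leaf_total(profile):
    """Sum every 's__*' clade plus every '*_unclassified' clade."""
    total = 0.0
    for lineage, abundance in profile.items():
        clade = lineage.split("|")[-1]  # last element of the lineage
        # 'or' counts s__X_unclassified once, even though it matches both tests
        if clade.startswith("s__") or clade.endswith("_unclassified"):
            total += abundance
    return total


# Summing only the species rows misses the family-level unclassified clade.
species_only = sum(
    abundance
    for lineage, abundance in profile.items()
    if lineage.split("|")[-1].startswith("s__")
)

print(species_only)         # 80.0
print(leaf_total(profile))  # 100.0
```

In this toy case the "s__*" rows alone give 80, and adding the higher-level "*_unclassified" clade brings the total to 100.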

many thanks
Nicola

Ning Li

Mar 1, 2013, 10:54:30 AM
to metaphl...@googlegroups.com
Hi Nicola,

Thank you for your help. It is quite clear now. I did the calculation as you said: I summed all 's__' clades and all higher-level clades marked 'unclassified', and the total is 100%.

Just one more thought. I did some 16S analysis for taxonomy before, and there I saw cases like 'g__Sphingobacteriaceae', 's__Sphingobacteriaceae_unclassified'. Although we can't identify the species within Sphingobacteriaceae, we do know it belongs to Sphingobacteriaceae ('g__Sphingobacteriaceae'), and at the species level we say it is unclassified. That way, Sphingobacteriaceae still shows up at the species level, just as unclassified. In MetaPhlAn, Sphingobacteriaceae only shows up at the genus level, as 'g__Sphingobacteriaceae_unclassified', and does not show up at the species level. Is that right? Can you explain why MetaPhlAn does it differently? Is there a particular reason?

Thanks again. It really helps me get a better understanding of MetaPhlAn.

Best
Ning

Nicola Segata

Mar 3, 2013, 5:32:14 PM
to metaphl...@googlegroups.com
Hi Ning,
 yes, your interpretation is correct.

With 16S it is possible to bin reads into OTUs regardless of the taxonomy, and OTUs are roughly considered "species-level clades". Thus, in a 16S dataset all leaf nodes are species-level clades and something like s__Sphingobacteriaceae_unclassified actually makes sense. With shotgun metagenomics there are no direct OTU-like concepts. If reads cannot be assigned more precisely than at family level (e.g. f__Sphingobacteriaceae), we cannot directly know whether they come from one specific species or possibly from more than one species in, say, two genera. So we bin the whole "unknown" directly below the family, which means we estimate the abundance of g__Sphingobacteriaceae_unclassified.

There might be ways of refining these aspects, though... (any suggestion?) 

many thanks
Nicola

Ning Li

Mar 3, 2013, 5:41:34 PM
to metaphl...@googlegroups.com
Hi Nicola,

Thank you so much. It makes sense. I am new to metagenomics, and it is hard for me to see the differences between similar concepts.

Thanks again.

Ning