I have a question about the output of MetaPhlAn.
I was able to run MetaPhlAn on my data. In one of your posted answers, you wrote:
These clade-specific abundance values are then normalized with respect to all clades at the same taxonomic level (e.g. species sum up to 100) obtaining relative abundance of genome counts (rather than relative abundance of DNA concentrations). Unclassified sub-clades are added when the abundance of a clade is larger than the sum of the abundances of the direct children.
However, when I add up the species-level abundances in one sample, they total only about 80%, and the same happens in my other samples. The phylum, class, order, family, and genus levels are fine; each of them adds up to 100%. Only the species level looks strange. I don't understand why this happens.
Also, I am not sure I understand 'Unclassified sub-clades are added when the abundance of a clade is larger than the sum of the abundances of the direct children.' in your answer. Can you give an example?
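Is it something like the minimal Python sketch below? The function name and the numbers are made up by me for illustration; this is only my guess at the rule, not MetaPhlAn's code.

```python
# Guess at the quoted rule: if a clade's abundance is larger than the sum of
# its direct children, the difference becomes an '<clade>_unclassified' child.
# HYPOTHETICAL helper and made-up abundances (%); not MetaPhlAn's implementation.

def add_unclassified(parent_name, parent_abundance, children):
    """Return the direct children, plus an '_unclassified' entry for any remainder."""
    result = dict(children)
    remainder = parent_abundance - sum(children.values())
    if remainder > 1e-9:  # ignore tiny floating-point differences
        result[parent_name + "_unclassified"] = round(remainder, 6)
    return result


# Made-up example: family Sphingobacteriaceae at 10%, but the only genus
# detected under it (Pedobacter) accounts for just 6%.
genera = add_unclassified("Sphingobacteriaceae", 10.0, {"Pedobacter": 6.0})
print(genera)
# {'Pedobacter': 6.0, 'Sphingobacteriaceae_unclassified': 4.0}
```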
Thank you so much in advance. I really appreciate your help.
Best
Ning
Thank you for your help. It is quite clear now. As you suggested, I did the calculation: I summed all 's__' entries together with all higher-level '_unclassified' entries, and the total is 100%.
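Here is a minimal Python sketch of that calculation on a made-up profile. The taxon names (P1, F1, G1, ...) and all numbers are hypothetical, not my real data; each row is a lineage string and its relative abundance in percent.

```python
# Sketch: species-level entries alone can sum to less than 100%, while the
# species entries plus the childless higher-level entries (such as
# 'g__..._unclassified') sum to 100%. The profile below is HYPOTHETICAL.

BASE = "k__Bacteria|p__P1|c__C1|o__O1"
ROWS = [
    ("k__Bacteria", 100.0),
    ("k__Bacteria|p__P1", 100.0),
    ("k__Bacteria|p__P1|c__C1", 100.0),
    (BASE, 100.0),
    (BASE + "|f__F1", 70.0),
    (BASE + "|f__F1|g__G1", 70.0),
    (BASE + "|f__F1|g__G1|s__G1_sp1", 50.0),
    (BASE + "|f__F1|g__G1|s__G1_sp2", 20.0),
    (BASE + "|f__F2", 30.0),
    (BASE + "|f__F2|g__G2", 10.0),
    (BASE + "|f__F2|g__G2|s__G2_sp1", 10.0),
    (BASE + "|f__F2|g__F2_unclassified", 20.0),  # no species below it
]


def rank_of(lineage):
    """Rank prefix of the deepest clade in a lineage, e.g. 's__' or 'g__'."""
    return lineage.split("|")[-1][:3]


def is_leaf(lineage, all_lineages):
    """A clade is a leaf if no other entry has it as an ancestor."""
    return not any(other.startswith(lineage + "|") for other in all_lineages)


lineages = [name for name, _ in ROWS]
species_total = sum(a for name, a in ROWS if rank_of(name) == "s__")
genus_total = sum(a for name, a in ROWS if rank_of(name) == "g__")
leaf_total = sum(a for name, a in ROWS if is_leaf(name, lineages))

print(f"species: {species_total}%  genus: {genus_total}%  leaves: {leaf_total}%")
# species: 80.0%  genus: 100.0%  leaves: 100.0%
```

In this made-up profile the missing 20% at the species level is exactly the 'g__F2_unclassified' entry, which has no species below it.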
Just one more thought. I have done some 16S taxonomic analysis before, and there I saw cases like 'g__Sphingobacteriaceae' followed by 's__Sphingobacteriaceae_unclassified'. Although we can't identify the species within Sphingobacteriaceae, we do know it belongs to Sphingobacteriaceae (hence 'g__Sphingobacteriaceae'), and at the species level we report it as unclassified. In this way, Sphingobacteriaceae still appears at the species level, just as unclassified. In MetaPhlAn, however, Sphingobacteriaceae appears only at the genus level, as 'g__Sphingobacteriaceae_unclassified', and does not appear at the species level at all. Is that right? Could you explain why MetaPhlAn handles this differently? Is there a particular reason?
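To make the 16S-style naming concrete, here is a minimal Python sketch that rewrites a lineage ending above the species level into that style. The function name and the fill-down rule (repeat the deepest known name at intermediate ranks, add '_unclassified' only at the species level) are my own assumptions based on the 16S example above, not an official convention of any tool; the inputs are truncated lineages for brevity.

```python
# HYPOTHETICAL conversion of a MetaPhlAn-style lineage that stops above species
# into the 16S-style naming described above. Not part of MetaPhlAn.

RANKS = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]
SUFFIX = "_unclassified"


def fill_down_to_species(lineage):
    """Extend a lineage down to species, 16S-style (assumed convention)."""
    parts = lineage.split("|")
    rank, name = parts[-1][:3], parts[-1][3:]
    if rank == "s__":
        return lineage  # already at species level, nothing to do
    # Drop MetaPhlAn's '_unclassified' suffix to recover the deepest known name.
    base = name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
    parts[-1] = rank + base  # e.g. 'g__Sphingobacteriaceae'
    # Repeat the known name at any ranks between this one and species.
    for lower in RANKS[RANKS.index(rank) + 1 : -1]:
        parts.append(lower + base)
    parts.append("s__" + base + SUFFIX)  # species level marked unclassified
    return "|".join(parts)


print(fill_down_to_species("f__Sphingobacteriaceae|g__Sphingobacteriaceae_unclassified"))
# f__Sphingobacteriaceae|g__Sphingobacteriaceae|s__Sphingobacteriaceae_unclassified
```

If this were applied to every childless entry, each one would get a species-level name with its abundance unchanged, so the species-level rows would add up to the same total as the childless entries.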
Thanks again. It really helps me get a better understanding of MetaPhlAn.
Best
Ning