I'm new to MetaPhlAn and I have a few questions about the results.
I noticed that the options -t (type of analysis) and -stat (normalization) are described as calculating and normalizing the relative abundance of microorganisms. I would like to know how to obtain the total abundance, or read counts, of each taxon without normalization.
I'm grateful for the help!!
Cesar
Thanks for the response.
I have an additional question about the parameter --min_cu_len, whose default is 2000. Does this mean that the markers of a clade must be at least 2000 bp for it to be considered a clade? If I choose a smaller value, e.g. 1000, will it report twice as many clades? What should I take into account when setting this parameter?
Kind regards,
Cesar
One last question.
If the relative abundance of a taxon is close to 0, does MetaPhlAn still report it? I am asking about the least abundant taxa, whose relative abundances are almost 0.
I'm facing this problem as well. With amplicon data, it is simple enough to model the observed reads per amplicon sequence variant as multinomial (conditional on the total number of reads). It seems to me that an equivalent model for MetaPhlAn's species abundance estimates would be as follows. If x_i is the frequency of species i and l_i is the total length of the markers for species i, then x_i*l_i gives the relative probability of a read mapping to species i. We could then model the number of reads per species as multinomial, with probabilities x_i*l_i / (sum_j x_j*l_j) and a total read count equal to the sum of the reads mapped to all species' markers. Given the observed read counts per species and the marker lengths l_i, we could then estimate the uncertainty in the estimated abundances of low-frequency species. I'm not sure what effect the 2000 nt threshold for calling a species would have on this model, though.
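To make the idea concrete, here is a minimal sketch of that model in plain Python. It is only an illustration under the assumptions stated above, not how MetaPhlAn itself computes abundances. The function names, the example counts and the marker lengths are all made up, and the 2000 nt threshold is not modelled. The uncertainty is approximated with a parametric bootstrap: resample read counts from the fitted multinomial and recompute the abundances each time.

```python
import random


def read_probabilities(abundances, marker_lengths):
    """p_i = x_i * l_i / sum_j(x_j * l_j): chance that a mapped read hits species i."""
    weights = [x * l for x, l in zip(abundances, marker_lengths)]
    total = sum(weights)
    return [w / total for w in weights]


def abundances_from_counts(counts, marker_lengths):
    """Invert the model: x_i is proportional to n_i / l_i, normalised to sum to 1."""
    coverage = [n / l for n, l in zip(counts, marker_lengths)]
    total = sum(coverage)
    return [c / total for c in coverage]


def bootstrap_intervals(counts, marker_lengths, n_boot=200, seed=0):
    """Approximate 95% percentile intervals for each x_i by parametric bootstrap."""
    rng = random.Random(seed)
    n_species = len(counts)
    n_reads = sum(counts)
    x_hat = abundances_from_counts(counts, marker_lengths)
    probs = read_probabilities(x_hat, marker_lengths)
    samples = [[] for _ in range(n_species)]
    for _ in range(n_boot):
        # Draw one multinomial sample of n_reads reads.
        draws = rng.choices(range(n_species), weights=probs, k=n_reads)
        boot_counts = [0] * n_species
        for i in draws:
            boot_counts[i] += 1
        boot_x = abundances_from_counts(boot_counts, marker_lengths)
        for i, x in enumerate(boot_x):
            samples[i].append(x)
    intervals = []
    for s in samples:
        s.sort()
        low = s[int(0.025 * (n_boot - 1))]
        high = s[int(0.975 * (n_boot - 1))]
        intervals.append((low, high))
    return x_hat, intervals


if __name__ == "__main__":
    # Hypothetical reads mapped to three species and their total marker lengths (nt).
    counts = [1800, 190, 10]
    marker_lengths = [60000, 45000, 30000]
    x_hat, intervals = bootstrap_intervals(counts, marker_lengths)
    for x, (low, high) in zip(x_hat, intervals):
        print(f"estimate={x:.4f}  95% interval=({low:.4f}, {high:.4f})")
```

With only a handful of reads for the rarest species, its interval should come out wide relative to the estimate, which is the uncertainty I am trying to capture.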