How to understand "--stat tavg_g"


Yunlong JIA

Sep 28, 2016, 11:03:00 PM
to MetaPhlAn-users
According to the guide:

$ "--stat tavg_g" is:
<truncated clade global average at "--stat_q" quantile>

## Here "--stat_q" is 0.1 by default.

$ As I understand it, the elements in the same clade are sorted from largest to smallest by their relative abundance.

If --stat_q = 0.1, the 10th percentile is used as a threshold value, so tavg_g would be "clade global average minus the 10th percentile".

$ But one strange thing is that when I change --stat_q to 0, my taxonomic abundances also change, as shown below:
# --stat_q 0.1 as default:
k__Viruses 81.98395
k__Bacteria 17.87901
k__Eukaryota 0.13703
# --stat_q 0 :
k__Archaea 78.33023
k__Viruses 14.73921
k__Bacteria 4.42914
k__Eukaryota 2.50142
$ As shown above, the results are very different. Why is the proportion of Archaea so high? I don't understand.

$ Any comment is appreciated. Please tell me what you think, and point out any obvious mistake I have made. Thanks!

Duy Tin Truong

Oct 7, 2016, 6:41:35 AM
to Yunlong JIA, MetaPhlAn-users
Hi Yunlong,

The abundance of a clade is computed as the average of its marker abundances, sorted in ascending order, taken from the (stat_q * 100)th percentile to the (100 - stat_q * 100)th percentile. If you set stat_q to 0, every marker is included, which is why you saw very different results.
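For illustration, the trimmed average described above can be sketched in Python. This is a minimal sketch and not MetaPhlAn's actual code: the function name `truncated_mean`, the floor rounding of the trim count, and the marker values are all assumptions made for the example.

```python
def truncated_mean(marker_abundances, stat_q=0.1):
    """Average the markers that remain after dropping the lowest and
    highest stat_q fraction (hypothetical helper, not MetaPhlAn's API)."""
    values = sorted(marker_abundances)      # ascending order
    n = len(values)
    k = int(stat_q * n)                     # markers trimmed from each end (assumed: floor)
    kept = values[k:n - k]
    return sum(kept) / len(kept) if kept else 0.0


# Invented clade with 10 markers, only one of which was hit.
markers = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0]

trimmed = truncated_mean(markers, stat_q=0.1)    # drops one marker at each end
untrimmed = truncated_mean(markers, stat_q=0.0)  # keeps every marker

print(trimmed)    # 0.0
print(untrimmed)  # 0.5
```

With stat_q = 0.1 the single hit falls in the trimmed upper tail, so the clade averages to zero; with stat_q = 0 the same hit gives the clade a nonzero abundance.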

Cheers,
Tin


Yunlong JIA

Oct 11, 2016, 11:51:44 PM
to MetaPhlAn-users, jerome.yu...@gmail.com
On Friday, October 7, 2016 at 7:41:35 PM UTC+9, Duy Tin Truong wrote:
Thanks for your reply. Still, only the 10th percentile changes, so how does that make the k__Archaea proportion so different, going from 0 to about 78? Perhaps some other normalization method is used?

Duy Tin Truong

Oct 13, 2016, 4:26:02 AM
to Yunlong JIA, MetaPhlAn-users
Hi Yunlong,

That can happen because many k__Archaea species have a very low rate of marker hits (<10%). When you change the threshold to 0, MetaPhlAn therefore considers all of them to be present. We use the 10% threshold to make the results more robust, that is, to avoid many false positives.
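A toy example may help show how a clade can jump from 0 to a large share once the profile is renormalized to 100%. Everything here is an assumption for illustration: the marker values are invented, the helper names are hypothetical, and the rule "a clade is reported only if its trimmed average is above zero" is a simplification of whatever MetaPhlAn does internally.

```python
def truncated_mean(values, stat_q):
    """Mean after trimming the stat_q fraction from each end (assumed: floor)."""
    ordered = sorted(values)
    k = int(stat_q * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept) if kept else 0.0


def relative_abundances(clade_markers, stat_q):
    """Percent share of each clade whose trimmed average is above zero."""
    raw = {clade: truncated_mean(m, stat_q) for clade, m in clade_markers.items()}
    present = {clade: v for clade, v in raw.items() if v > 0}
    total = sum(present.values())
    return {clade: 100.0 * v / total for clade, v in present.items()}


# Invented data: Archaea has a 5% marker hit rate (1 of 20 markers),
# while Viruses has consistent coverage on all 10 markers.
toy = {
    "k__Archaea": [0.0] * 19 + [40.0],
    "k__Viruses": [1.0] * 10,
}

default_profile = relative_abundances(toy, stat_q=0.1)
untrimmed_profile = relative_abundances(toy, stat_q=0.0)

print(default_profile)    # only k__Viruses is reported, at 100%
print(untrimmed_profile)  # k__Archaea now takes about two thirds of the profile
```

In this sketch a single strong hit on one Archaea marker is discarded by the 10% trim but dominates the untrimmed average, and the renormalization then shrinks every other clade's share.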

Cheers,
Tin