Stratified bug abundances don't add up to the pathway abundance?


Wennie

Mar 14, 2016, 2:33:21 PM
to HUMAnN Users
Dear humann2 developers and users,

While investigating the output of humann2, I noticed that the abundances of the stratified bugs listed under an individual pathway do not sum to the abundance of that pathway. The sum is always larger than the pathway abundance. Is this expected? If so, why?

I have pasted an example here:
GO: calcium receptor        0.25   0.3
GO: calcium receptor|bugA   0.2    0.18
GO: calcium receptor|bugB   0.06   0.15


Thanks for your advice,

Wennie

Eric Franzosa

Mar 14, 2016, 2:43:02 PM
to humann...@googlegroups.com
Hi Wennie,

For structures like MetaCyc pathways, this is indeed expected. There is some further explanation provided here:


Your example showed GO terms, however. May I ask how you're quantifying these? If you're using HUMAnN2's built-in regroup function for GO terms, the strata SHOULD sum to the community total (since regrouping doesn't feature any of the more complex "complete module copies" math that leads to non-linearity in the pathway sums).
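
For anyone who wants to run this check on their own table, here is a minimal Python sketch. It is not part of HUMAnN2; the function name `strata_gaps` and the tolerance are assumptions made for illustration. It sums the `|`-delimited strata under each feature and reports the features whose strata do not match the community total. The sample rows are the first value column of the example posted above, where 0.2 + 0.06 = 0.26 against a community total of 0.25.

```python
from collections import defaultdict


def strata_gaps(rows, tol=1e-9):
    """Compare each community total with the sum of its strata.

    rows: iterable of (feature_name, value) pairs, where stratified
    rows are named "feature|bug" (hypothetical in-memory layout).
    Returns {feature: (community_total, strata_sum)} for every feature
    whose strata sum differs from its total by more than tol.
    """
    totals = {}
    strata = defaultdict(float)
    for name, value in rows:
        if "|" in name:
            # Stratified row: credit the value to its parent feature.
            feature = name.split("|", 1)[0]
            strata[feature] += value
        else:
            totals[name] = value
    return {
        feature: (total, strata[feature])
        for feature, total in totals.items()
        if feature in strata and abs(total - strata[feature]) > tol
    }


# First value column of the example from the original post.
example = [
    ("GO: calcium receptor", 0.25),
    ("GO: calcium receptor|bugA", 0.2),
    ("GO: calcium receptor|bugB", 0.06),
]
gaps = strata_gaps(example)
for feature, (total, strata_sum) in gaps.items():
    print(f"{feature}: community={total} strata_sum={strata_sum}")
```

An empty result would mean the strata add up to the community totals, which is what regrouped gene families are expected to do.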

Thanks,
Eric


Wennie

Mar 14, 2016, 4:52:56 PM
to HUMAnN Users
Hi Eric,

Thanks for your prompt advice. It is very helpful to know details of the "complete module copies" math model.

For MetaCyc pathways, it seems that the "complete module copies" math should make the community pathway abundance greater than or equal to the sum of its stratified abundance values. However, that is NOT what I saw. My observation is the other way around: the community pathway abundance is smaller than the sum of its stratified abundance values. Do you think this is expected as well?

For GO terms, I first run humann2_regroup_table with the uniref50_go option, and then humann2_renorm_table with the -n relab flag. The resulting output shows the discrepancy I mentioned. Do you see anything that I might be doing incorrectly?

Thanks for your advice,

Wennie

Eric Franzosa

Mar 15, 2016, 12:32:30 PM
to humann...@googlegroups.com
Hi Wennie,

Both of these observations are surprising to me. Can you share your genefamilies.tsv file with me (feel free to send it to me individually rather than posting to the group)? Also, can you clarify which version of HUMAnN2 you are running?

Thanks,
Eric


wen...@stanford.edu

Mar 15, 2016, 4:10:30 PM
to HUMAnN Users
Hi Eric,

I'd love to get your help sorting out these surprising results. I wanted to share the genefamilies.tsv file, but it is rather large (~150MB). I don't want to trim the file down arbitrarily, as that would affect its behavior when it is merged into GO/KO terms downstream. Could you please tell me how I should send such a bulky file to you for further inspection?

Thanks!

Wennie

Eric Franzosa

Mar 15, 2016, 6:39:09 PM
to humann...@googlegroups.com
Hi Wennie,

I see what is happening here. As of HUMAnN2 v0.6, we report the number of unmapped reads in the gene families file as an unstratified feature ("UNMAPPED"). When you renorm'ed your table, the unmapped count was included when normalizing the community totals but not the individual stratifications, which breaks the "stratifications sum up to community totals" relationship that normally holds for gene families. As a short-term fix, you can either 1) remove the "UNMAPPED" feature from your gene families file before renorm'ing or 2) add another feature below UNMAPPED called "UNMAPPED|unclassified" with the same value. For option #2, your file would look like this:

# Gene Family   6902701_Abundance
UNMAPPED        123305.0000000000
UNMAPPED|unclassified   123305.0000000000
UniRef50_A0A008BVE2     0.9003113322
UniRef50_A0A008BVE2|unclassified        0.9003113322
UniRef50_A0A008BXX0: ThiW protein       0
UniRef50_A0A008BXX0: ThiW protein|unclassified  0
...
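
The two workarounds can be sketched in Python roughly as follows. This is an illustration only, not HUMAnN2 code: the helper names are hypothetical, the table is assumed to be tab-delimited with a single sample column, and `relab_by_level` is just a toy model of the behavior described above (community rows and stratified rows each divided by their own column sum), not the actual humann2_renorm_table implementation. The toy table and its values are made up.

```python
def drop_unmapped(lines):
    """Option 1: remove the unstratified UNMAPPED row."""
    return [ln for ln in lines if ln.split("\t", 1)[0] != "UNMAPPED"]


def add_unmapped_stratum(lines):
    """Option 2: add UNMAPPED|unclassified right below UNMAPPED."""
    if any(ln.split("\t", 1)[0] == "UNMAPPED|unclassified" for ln in lines):
        return list(lines)  # already patched, nothing to do
    out = []
    for ln in lines:
        out.append(ln)
        fields = ln.split("\t")
        if fields[0] == "UNMAPPED":
            # Same values as UNMAPPED, under a stratified name.
            out.append("\t".join(["UNMAPPED|unclassified"] + fields[1:]))
    return out


def relab_by_level(lines):
    """Toy renormalization (assumption): each level uses its own sum."""
    rows = [
        (fields[0], float(fields[1]))
        for fields in (ln.split("\t") for ln in lines)
        if not fields[0].startswith("#")  # skip the header line
    ]
    community_sum = sum(v for name, v in rows if "|" not in name)
    strata_sum = sum(v for name, v in rows if "|" in name)
    return {
        name: v / (strata_sum if "|" in name else community_sum)
        for name, v in rows
    }


# Made-up table: 50 unmapped reads plus two gene families.
table = [
    "# Gene Family\tsample_Abundance",
    "UNMAPPED\t50",
    "GeneA\t30",
    "GeneA|bugA\t20",
    "GeneA|bugB\t10",
    "GeneB\t20",
    "GeneB|bugA\t20",
]

raw = relab_by_level(table)                          # strata exceed totals
fixed = relab_by_level(add_unmapped_stratum(table))  # option 2
dropped = relab_by_level(drop_unmapped(table))       # option 1
```

In the unpatched toy table, the community rows are divided by 100 while the strata are divided by 50, so the strata of GeneA sum to twice its community value. With either workaround applied, both levels share the same denominator and the strata sum back to the community total.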

Thanks for bringing this to our attention, as it's an issue others are likely to encounter with the latest version of HUMAnN2. In the next update we'll change either 1) the UNMAPPED reporting or 2) the regroup_table script itself so this is handled automatically!

Thanks,
Eric