Pathcoverage community vs individual taxa

gavinm...@gmail.com

Nov 13, 2017, 11:58:12 AM
to HUMAnN Users
Hello,

I've noticed many instances in my HUMAnN2 output of community-level pathway coverages equal to 0 despite non-zero pathway coverage in individual taxa.

For example:

ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis 0.0000000000
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_cellulosilyticus 0.4459688834
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_ovatus 0.3808090502
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_caccae 0.7575354811
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|unclassified 0.0000000000
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_fragilis 0.3116254632

I would have expected the community-level coverage to be at least equal to the highest taxon's coverage. Is there something I'm missing here?

This output was generated with humann2 v0.11.1.
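
For anyone wanting to check how often this happens in their own results, here is a minimal sketch that flags pathways whose community-level coverage is 0 while at least one stratified row is non-zero. It assumes rows of the form `pathway[|taxon]` followed by a numeric value, separated by whitespace or a tab, with header lines starting with `#`. The function name `find_zero_community_pathways` is made up for illustration and is not part of HUMAnN2.

```python
def find_zero_community_pathways(lines):
    """Return {pathway: {taxon: coverage}} for pathways whose community-level
    coverage is 0 but at least one per-taxon coverage is non-zero."""
    community = {}
    stratified = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        # The value is the last whitespace-separated field on the row.
        feature, value = line.rsplit(None, 1)
        coverage = float(value)
        if "|" in feature:
            pathway, taxon = feature.split("|", 1)
            stratified.setdefault(pathway, {})[taxon] = coverage
        else:
            community[feature] = coverage

    flagged = {}
    for pathway, total in community.items():
        nonzero = {t: c for t, c in stratified.get(pathway, {}).items() if c > 0}
        if total == 0 and nonzero:
            flagged[pathway] = nonzero
    return flagged


# The rows quoted in the post above, used as example input.
example = """\
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis 0.0000000000
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_cellulosilyticus 0.4459688834
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_ovatus 0.3808090502
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_caccae 0.7575354811
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|unclassified 0.0000000000
ARGININE-SYN4-PWY: L-ornithine de novo biosynthesis|g__Bacteroides.s__Bacteroides_fragilis 0.3116254632
"""

flagged = find_zero_community_pathways(example.splitlines())
for pathway, taxa in flagged.items():
    print(pathway, "->", sorted(taxa))
```

On a real file, the same function could be called with an open file handle in place of `example.splitlines()`.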

Thanks in advance,

Gavin

Eric Franzosa

Nov 13, 2017, 2:45:55 PM
to humann...@googlegroups.com
Hi Gavin,

The result is consistent with HUMAnN's algorithms, but I agree that it is not particularly intuitive. Let me expand...

HUMAnN2's algorithm for pathway coverage estimation was carried over from HUMAnN1. The idea of the algorithm is to divide the distribution of enzyme abundances into confident detection events versus potential false positives, and then ask if each pathway can be explained in the confident part of the distribution. Your finding of 0 community-level coverage for this pathway indicates that its enzymes are rare (low abundance) compared to other enzyme totals in the community.

As with other aspects of HUMAnN2 computation, we apply the coverage algorithm to each stratified set of abundances (individual species + unclassified) in addition to the community totals. The coverage of a pathway within a species then indicates if the pathway's enzymes within that species were confidently detected relative to other enzymes from that species. Your finding of intermediate coverage for this pathway within (e.g.) B. ovatus means that, compared to the distribution of enzyme abundances in B. ovatus, this pathway was reasonably well covered. Conversely, if most B. ovatus enzymes exceeded 5x coverage, but key enzymes from this pathway had 0.5x coverage, then its coverage score would tank.

So coverage scores are always relative to a background, which differs at the whole-community vs. per-species levels. Practically speaking, if a pathway is actually assigned by HUMAnN2 to one or more species, I would be fairly confident that it is present. The community-level pathway coverages matter more for pathways that are only assigned to "unclassified" (similar to HUMAnN1 output) where there is less external support for the pathway's presence.
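
To make the "relative to a background" point concrete, here is a toy sketch. It is not HUMAnN2's actual coverage algorithm. It assumes a deliberately simplified rule in which an enzyme counts as confidently detected when its abundance is at least the median of the background it is scored against, and a pathway's score is the fraction of its enzymes that pass. The function `toy_coverage`, the enzyme names, and all abundances are invented for illustration.

```python
from statistics import median


def toy_coverage(pathway_enzymes, abundances):
    """Fraction of pathway enzymes whose abundance reaches the background median.

    Assumption for illustration only: 'confident detection' means
    abundance >= median of all enzyme abundances in the same background.
    """
    background = median(abundances.values())
    detected = [e for e in pathway_enzymes if abundances.get(e, 0.0) >= background]
    return len(detected) / len(pathway_enzymes)


pathway = ["E1", "E2", "E3"]

# A low-abundance species that carries the pathway (made-up numbers).
rare_species = {"E1": 2.0, "E2": 3.0, "E3": 1.0, "X1": 2.0, "X2": 2.0}
# A dominant species that does not carry the pathway (made-up numbers).
dominant_species = {"Y1": 50.0, "Y2": 60.0, "Y3": 40.0, "Y4": 55.0, "Y5": 45.0}

# Community totals: sum each enzyme's abundance over all species.
community = {}
for species in (rare_species, dominant_species):
    for enzyme, abundance in species.items():
        community[enzyme] = community.get(enzyme, 0.0) + abundance

species_cov = toy_coverage(pathway, rare_species)
community_cov = toy_coverage(pathway, community)
print(f"within rare species: {species_cov:.3f}")
print(f"community level:     {community_cov:.3f}")
```

With these made-up numbers the pathway scores 2/3 within the rare species (background median 2) but 0 at the community level (background median 21.5), because the dominant species' enzymes raise the background that the same three enzymes are compared against.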

Thanks,
Eric

Gavin Douglas

Nov 13, 2017, 2:59:05 PM
to humann...@googlegroups.com
That makes sense now - thanks for the detailed response! 


Gavin