HUMAnN2 and microbiome diversity

Sherman Jia

May 10, 2016, 1:35:08 PM

to HUMAnN Users

Hi all,

Thanks for making this software available. I'm currently using HUMAnN2 v0.7.

I'm using the standard HUMAnN2 pipeline to study gut microbiome bacterial abundance, genes represented, and pathways in a few dozen mice. Despite starting with a large sequence dataset, I've noticed that the number of bacterial genera and species detected is limited (about 40 species per sample, with a 40-70% sequence alignment rate via bowtie2). I suspect that this is due to the taxonomic profiling performed by MetaPhlAn2 in the prescreening step. My questions are:

1. If I'm looking for more accurate bacterial abundance estimates, is it best to use HUMAnN2 with the --bypass-prescreen option, or to use another tool?

2. In your experience, does using --bypass-prescreen significantly change downstream gene / pathway abundance results? If yes, how so? If not, why use the prescreen?

3. Is there a way to extract bacterial abundance from translated alignments (diamond) for bugs not identified in pre-screening?

Best,

Sherman Jia, MD MEng
UCSF MS Center

Eric Franzosa

May 10, 2016, 3:52:29 PM

to humann...@googlegroups.com

Hi Sherman,

The numbers you're seeing for alignment to detected species' pangenomes are right in line with what we see for human gut metagenomes (50-60% alignment). Notably, when unaligned reads are then subjected to comprehensive translated search, we tend to explain about 15% more reads. Thus, for well-characterized microbial communities, we are explaining the majority of what _can_ be explained in the pangenome stage.

In answer to your specific questions:

Re: 1 & 2. The prescreen with MetaPhlAn2 helps us to identify the species present in a community and thus focus downstream search on those species' pangenomes. In addition to being faster than a comprehensive pangenome search, this approach will be less prone to spurious mapping to pangenomes of non-present species. Conversely, bypassing the prescreen forces HUMAnN2 to search against the comprehensive pangenome database. This approach will be slower and more prone to spurious mapping. It is included as an option for researchers who strongly suspect that a species is present in their samples despite not being detected by MetaPhlAn2. Most users will not need this option.
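For readers comparing the two modes, here is a minimal sketch of how the invocations differ, written as a Python helper that only builds the argument list and does not run anything. The `--input`, `--output`, and `--bypass-prescreen` options are HUMAnN2 flags; the helper name `build_humann2_command` and the file and directory names are made up for illustration.

```python
def build_humann2_command(input_path, output_dir, bypass_prescreen=False):
    """Return the argument list for a HUMAnN2 run (hypothetical helper)."""
    command = ["humann2", "--input", input_path, "--output", output_dir]
    if bypass_prescreen:
        # Skip the MetaPhlAn2 prescreen and search the comprehensive
        # pangenome database instead (slower, more spurious mapping).
        command.append("--bypass-prescreen")
    return command


# Placeholder file and directory names, not from the thread.
default_run = build_humann2_command("sample.fastq", "humann2_out")
bypass_run = build_humann2_command(
    "sample.fastq", "humann2_out_bypass", bypass_prescreen=True
)

print(" ".join(default_run))
print(" ".join(bypass_run))
```

Running both on the same sample and comparing the gene and pathway tables is one way to see how much the prescreen changes downstream results for a given dataset.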

Re: 3. In theory it _is_ possible to make taxonomic inferences about the translated mapping results (each UniRef centroid is associated with the LCA of the species contributing to the corresponding protein family). This isn't currently supported in HUMAnN2, but it's something we're thinking about. Notably, this inference would probably need to occur at a lower level of taxonomic resolution than species (perhaps family?).
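To illustrate why an LCA-based inference tends to land above species level, here is a minimal sketch of a lowest-common-ancestor computation over taxonomic lineages. It is not HUMAnN2 code; the function name and the example lineages are illustrative assumptions and are not taken from the UniRef or HUMAnN2 databases.

```python
def lowest_common_ancestor(lineages):
    """Return the longest taxonomic prefix shared by every lineage.

    Each lineage is a list ordered from the broadest rank to the most
    specific one, e.g. [family, genus, species].
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Illustrative lineages (family, genus, species) for the species that
# contribute to one hypothetical protein family.
contributing_species = [
    ["Bacteroidaceae", "Bacteroides", "Bacteroides fragilis"],
    ["Bacteroidaceae", "Bacteroides", "Bacteroides ovatus"],
    ["Bacteroidaceae", "Phocaeicola", "Phocaeicola vulgatus"],
]

lca = lowest_common_ancestor(contributing_species)
print(lca[-1] if lca else "root")
```

Because a protein family is often shared across several genera, the shared prefix stops at the family here, which is the loss of resolution the reply describes.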

Thanks,

Eric

Sherman Jia

May 10, 2016, 5:39:46 PM

to HUMAnN Users

Hi Eric,

Thanks for your thoughtful and prompt reply!

What is the average number of unique genera and species you see in your human gut metagenomes? Each sample in our mouse gut metagenome dataset has about 60 million reads of 100 bp each, and we detect only about 25 unique bacterial genera and 40 unique species (which seems quite low). As in your human gut metagenomes, 50-70% of reads align to the pre-identified bugs, and this goes up to 80-85% after translated alignment.

Is there another tool or set of tools you recommend for obtaining the bacterial composition from this dataset with higher sensitivity?

Best,

Sherman

Eric Franzosa

May 10, 2016, 9:06:14 PM

to humann...@googlegroups.com

We detect about 50 species in a typical healthy, adult, human gut, so not far off from what you're getting. Personally I would not classify that as a small number for a microbiome. We are of course limited to species with 1) characterized genomes and 2) reasonable coverage in the metagenome (for HUMAnN2 the default is 1 part in 10,000).
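As a rough illustration of the coverage cutoff, the sketch below applies a "1 part in 10,000" threshold (a fraction of 0.0001, or 0.01%) to a relative-abundance profile. The species names and abundances are invented, and the function is a hypothetical stand-in rather than the logic HUMAnN2 itself runs.

```python
PRESCREEN_FRACTION = 1 / 10_000  # "1 part in 10,000", i.e. 0.01%


def species_passing_threshold(relative_abundance, threshold=PRESCREEN_FRACTION):
    """Return the species whose relative abundance meets the threshold."""
    return sorted(
        name
        for name, fraction in relative_abundance.items()
        if fraction >= threshold
    )


# Invented profile: fractions of the community, summing to 1.
profile = {
    "species_A": 0.62,
    "species_B": 0.3794,
    "species_C": 0.0005,
    "species_D": 0.00008,
    "species_E": 0.00002,
}

kept = species_passing_threshold(profile)
print(kept)
```

In this made-up profile the two rarest species fall below the cutoff, so they would be absent from the species list even though some reads from them exist in the sample.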