Including genus/family-level classifications

vrmar...@gmail.com

Dec 5, 2017, 5:16:02 PM
to HUMAnN Users

Hi there!

HUMAnN2 has been really helpful, thanks for doing this!

Is there a way of including bugs that were only classified at the genus/family level during the prescreen in the next steps of the analysis?

Here is my problem: the majority of the taxa detected during the prescreen are only classified at the genus/family level. Because HUMAnN2 only picks the bugs with species-level classification to construct the custom ChocoPhlAn database, the contribution of species to genes/pathways is attributed entirely to the few bugs that received a species-level taxonomic assignment, which I think is misleading. (One example log is attached.)

One way of circumventing this issue would be to use --bypass-prescreen, but that takes far too long to run. I never managed to finish even one sample: the analysis seemed to be stuck in bowtie2-build for over 10 hours, with 20 threads and 100 GB of memory.

Can you recommend any other way of dealing with this?


Thanks a lot!

Best regards,
Vanessa

Bird10_mRNA.log

Eric Franzosa

Dec 8, 2017, 11:04:09 AM
to humann...@googlegroups.com
Hi Vanessa,

HUMAnN2 is not currently designed to automatically use Genus_unclassified abundances from the prescreen in a special way (only known species are included in the pangenome search, and any reads that fail to map there are passed on to the translated search). I can think of a few options that might help you:

1) You can use the infer_taxonomy script to add additional taxonomic information to the "unclassified" gene families. For example, if you run this script at genus-level resolution, it will collapse your detected species' abundances to genus-level, and also infer genera for (e.g.) UniRef90/50s based on their LCA annotations. This is not as precise as the taxonomic profiling from the prescreen, but it works reasonably well in practice for getting an idea of what's hiding in "unclassified."

2) You can give HUMAnN2 a taxonomic profile to work with (via --taxonomic-profile). For example, if you want to profile all known species pangenomes from a given genus, you could make an input taxonomic profile listing all of those species. You could then collapse the results to the genus level. This would be analogous to profiling against a genus-level pangenome (as opposed to HUMAnN2's default species-level pangenomes). It would also be a more focused alternative to mapping against _all_ of ChocoPhlAn (which is what --bypass-prescreen does).
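
As a rough illustration of option 2, a custom profile could be generated with a few lines of Python. This is only a sketch and not part of HUMAnN2: the helper name `build_genus_profile` is hypothetical, the species list is a placeholder that would need to be replaced with the species actually available in ChocoPhlAn for the genus of interest, and both the two-column `g__Genus|s__Genus_species<TAB>abundance` layout and the equal abundance values are assumptions to check against the HUMAnN2 manual.

```python
def build_genus_profile(genus, species_epithets):
    """Return tab-delimited profile lines, one per species of a single genus.

    Assumed layout (verify against the HUMAnN2 manual):
        g__Genus|s__Genus_species<TAB>percent_abundance
    """
    if not species_epithets:
        raise ValueError("need at least one species")
    # Placeholder abundance: 100% split equally across the listed species.
    # This is an assumption for illustration, not measured data.
    share = 100.0 / len(species_epithets)
    lines = []
    for epithet in species_epithets:
        clade = f"g__{genus}|s__{genus}_{epithet}"
        lines.append(f"{clade}\t{share:.5f}")
    return lines


if __name__ == "__main__":
    # Placeholder genus and species; replace with the ones relevant to your samples.
    for line in build_genus_profile("Bacteroides", ["vulgatus", "ovatus"]):
        print(line)
```

The printed lines would be saved to a file and passed to HUMAnN2 with --taxonomic-profile, for example `humann2 --input sample.fastq --output out_dir --taxonomic-profile custom_profile.tsv` (file and directory names here are hypothetical).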

Both of those options are described in more detail in the HUMAnN2 manual.
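
Both options end with a genus-level view of the stratified output, and the collapsing step itself can be sketched in Python. This is not the infer_taxonomy script and performs no LCA-based inference; it only sums stratified rows, assuming feature names of the form `GENE|g__Genus.s__Genus_species` as they appear in HUMAnN2's stratified tables. The function name `collapse_to_genus` and the example IDs and values are hypothetical.

```python
from collections import defaultdict


def collapse_to_genus(rows):
    """Sum species-stratified abundances to genus level.

    rows: iterable of (feature, abundance) pairs, where a stratified feature
    is assumed to look like "GENE|g__Genus.s__Genus_species".
    Unstratified totals ("GENE") and "GENE|unclassified" are kept as they are.
    """
    totals = defaultdict(float)
    for feature, abundance in rows:
        if "|" in feature:
            gene, taxon = feature.split("|", 1)
            # Keep only the genus part, i.e. everything before ".s__".
            genus = taxon.split(".", 1)[0]
            feature = f"{gene}|{genus}"
        totals[feature] += abundance
    return dict(totals)


if __name__ == "__main__":
    # Made-up example rows for illustration only.
    example = [
        ("UniRef90_A", 3.5),
        ("UniRef90_A|g__Bacteroides.s__Bacteroides_vulgatus", 1.0),
        ("UniRef90_A|g__Bacteroides.s__Bacteroides_ovatus", 2.0),
        ("UniRef90_A|unclassified", 0.5),
    ]
    for name, value in collapse_to_genus(example).items():
        print(f"{name}\t{value}")
```

With option 2, the two species rows for the same gene family end up in a single `g__Bacteroides` row, which is the genus-level pangenome view described above.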

Thanks,
Eric

vrmar...@gmail.com

Dec 10, 2017, 10:50:21 PM
to HUMAnN Users
Hi Eric,

Thanks a lot, very helpful suggestions!

Best wishes,
Vanessa