HUMAnN2: Choice of UniRef50 versus UniRef90 database

Nick243

Jun 20, 2016, 10:23:59 AM

to HUMAnN Users

Hello,

I am new to HUMAnN2 and was hoping I might ask: what is the rationale for using the UniRef50 versus the UniRef90 database?

Part of my confusion perhaps stems from the recommendation to use the UniRef90 database on the Huttenhower Lab page (http://huttenhower.sph.harvard.edu/humann2), while the user manual (https://bitbucket.org/biobakery/humann2/src/tip/doc/UserManual.md?fileviewer=file-view-default) and the wiki/tutorial pages (https://bitbucket.org/biobakery/biobakery/wiki/humann2) use the UniRef50 database.

Are there clear advantages or drawbacks to using sequences clustered at 90% versus 50% sequence identity?

Thanks in advance for any assistance, and thank you for developing this fantastic program and its tutorials.

Eric Franzosa

Jun 21, 2016, 12:41:12 PM

to humann...@googlegroups.com

Greetings!

Apologies for the confusion - you've caught us in the middle of transitioning from UniRef50 as a default to UniRef90. :-) We'll make sure to get all of the docs in agreement. I also have it on my to-do list to write up a section for the manual with guidance on which protein families / protein database to pick for different applications.

In brief, UniRef90 is a good default choice since it is comprehensive, non-redundant, and more likely to contain isofunctional clusters. UniRef50 clusters can be very broad, so there's a risk that the cluster representative might not reflect the function of the homologous sequence(s) found in your dataset. One situation where UniRef50 might be preferable is when dealing with very poorly characterized microbiomes. In that case, requiring reads to map at 90% identity to UniRef90 might be too stringent, and so mapping at 50% identity to UniRef50 could explain a larger portion of sample reads (albeit at reduced resolution).
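To make the stringency trade-off concrete, here is a hypothetical sketch (not HUMAnN2 code, and not how HUMAnN2 actually scores alignments). It assumes each read already has a best protein hit with a known percent identity, and the read IDs and identity values are made up. It also simplifies by applying both thresholds to the same hits, whereas UniRef50 and UniRef90 contain different cluster representatives, so it only illustrates how raising the identity cutoff shrinks the fraction of reads explained.

```python
# Hypothetical sketch, NOT HUMAnN2's implementation.
# Each tuple is (read_id, percent identity of that read's best protein hit);
# all values below are invented for illustration.

def fraction_explained(hits, total_reads, min_identity):
    """Return the fraction of reads whose best hit meets min_identity (in %)."""
    mapped = {read for read, identity in hits if identity >= min_identity}
    return len(mapped) / total_reads

# Made-up best-hit identities for reads from a poorly characterized sample.
example_hits = [("r1", 95.0), ("r2", 72.5), ("r3", 58.0), ("r4", 91.2), ("r5", 49.0)]

strict = fraction_explained(example_hits, total_reads=5, min_identity=90.0)
relaxed = fraction_explained(example_hits, total_reads=5, min_identity=50.0)
print(f"90% identity cutoff: {strict:.0%} of reads mapped")   # 40%
print(f"50% identity cutoff: {relaxed:.0%} of reads mapped")  # 80%
```

With these invented numbers, the 90% cutoff keeps only r1 and r4, while the 50% cutoff also keeps r2 and r3. More reads are explained, but each assignment says less about the read's precise function.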

Thanks,

Eric

Nick243

Jun 22, 2016, 9:42:02 AM

to HUMAnN Users

Hi Eric,

Thanks so much for the quick and detailed response. This makes a lot of sense and really clarifies the issue for us. Much appreciated.