Choosing between the KEGG database and Uniref90 Conversion, and how to implement the KEGG database.

justinw...@gmail.com

Feb 22, 2018, 6:49:50 PM

to HUMAnN Users

Hello HUMAnN users,

I have a quick methods-based question I wanted to share with the group.

Currently, I use the standard HUMAnN2 workflow for metagenome and metatranscriptome processing. To briefly summarize our pipeline: we quality-filter and pair reads with Trimmomatic, remove host reads with KneadData, annotate taxonomy with MetaPhlAn, and run UniRef90 annotation within HUMAnN2 (using DIAMOND). We then convert the identified UniRef90 terms into KEGG Orthology terms (KOs), which we summarize into stratified L1-L3 pathways.

I suppose I have two questions. If we were to obtain a KEGG database license and use it (thus bypassing the UniRef90-to-KO conversion):

1) How can HUMAnN2 be directed to the licensed database?

2) What would a licensed database gain me? I noticed that many of the KOs currently obtained after UniRef90 conversion are labeled "No_Name" but actually do have names when searched online. Would a KEGG database license likely only provide those names, or would it also increase the number of successful annotations of my filtered sequence data?

I hope this makes sense and is an okay question to ask the forum, just looking for some guidance here.

Thank you,
Justin Wright

Eric Franzosa

Feb 23, 2018, 1:51:54 PM

to humann...@googlegroups.com

Hi Justin,

There are instructions for working with HUMAnN1's legacy KEGG databases here:

https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-legacy-databases

I think you could probably adapt these to work with an updated KEGG. The advantages of doing that would be more up-to-date KEGG-based coding sequences, KO definitions, and pathway/module definitions. The disadvantage would be not taking full advantage of the tiered (species-stratified) search in HUMAnN2 (the legacy operations described above emulate HUMAnN1 within HUMAnN2).

An intermediate option would be to use an updated KEGG to get better KO descriptions and new pathway/module definitions, but rely on HUMAnN2's species-stratified KO abundances as the basis for pathway/module quantification. (You can run HUMAnN2 with KO abundance as input and supply a KEGG-formatted pathway/module definition to --pathways-database.)
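As a rough illustration of that intermediate option, here is a minimal Python sketch that only assembles the two command lines involved and runs nothing. The only flag taken from this thread is --pathways-database. The humann2_regroup_table step, its uniref90_ko group name, and the remaining flags are my reading of the HUMAnN2 utilities, so please check them against each tool's --help output. All file names are hypothetical placeholders.

```python
import shlex


def build_commands(genefamilies, ko_table, kegg_pathways, out_dir):
    """Return two argv lists for the intermediate option; nothing is executed.

    Assumptions (verify against your installed HUMAnN2 version):
      * humann2_regroup_table accepts --input/--groups/--output and a
        'uniref90_ko' grouping (requires the utility mapping files).
      * humann2 accepts a KO abundance table as --input.
    All paths passed in are hypothetical placeholders.
    """
    # Step 1: collapse species-stratified UniRef90 abundances into KOs.
    regroup = [
        "humann2_regroup_table",
        "--input", genefamilies,
        "--groups", "uniref90_ko",
        "--output", ko_table,
    ]
    # Step 2: quantify pathways/modules from the KO table, using a
    # KEGG-formatted definition file built from the licensed KEGG release.
    quantify = [
        "humann2",
        "--input", ko_table,
        "--pathways-database", kegg_pathways,
        "--output", out_dir,
    ]
    return regroup, quantify


if __name__ == "__main__":
    commands = build_commands(
        "sample_genefamilies.tsv",   # hypothetical HUMAnN2 output
        "sample_ko.tsv",             # hypothetical KO table
        "kegg_pathways.tsv",         # hypothetical KEGG-formatted definitions
        "sample_kegg_out",           # hypothetical output directory
    )
    for argv in commands:
        # shlex.quote keeps the printed line safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than executing them makes it easy to inspect each step before running it on real data.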

Thanks,

Eric
