legacy mode of KEGG pipeline


florentinc...@gmail.com

Nov 28, 2017, 10:40:06 AM
to HUMAnN Users
Dear HUMAnN developers,

I am trying to run humann2 with a KEGG database we have in the lab.

First I generated the id mapping:

$humann2_humann1_kegg --ikoc $KEGG_DB/koc --igenels $KEGG_DB/genels --o $KEGG_DB/kegg_idmapping.tsv

Then I ran DIAMOND against the database and converted the output to TSV format using:

$diamondv09_13 blastx \
-p $NSLOTS \
-d /db/kegg/kegg.reduced.v0913114.dmnd \
-q $MOD \
--sensitive \
--evalue 0.001 \
-a $OUT"/"$NAME

$diamondv09_13 view -a $OUT"/"$NAME".daa" -o $OUT"/"$NAME".tsv"

Finally, I imported the TSV output into humann2 with the KEGG mapping:

$humann2 --input $OUT"/"$NAME".tsv" --id-mapping $KEGG_DB'_kegg_idmapping.tsv' --pathways-database $KEGG_DB'keggc' --output $OUT

I am a bit surprised by the output:
-> _genefamilies.tsv

# Gene Family Sample1_Abundance-RPKs
UNMAPPED 0.0000000000
K02358 1133.1640827489
K02358|Chromobacterium violaceum 134.3738819829

Why is the output stratified with taxonomic information?

-> _pathabundance.tsv and _pathcoverage.tsv

# Pathway Sample1_Abundance
UNMAPPED 0.0000000000
UNINTEGRATED 1326.5946000459
UNINTEGRATED|Nitrosomonas europaea 167.0478323434

# Pathway Sample1_Coverage
UNMAPPED 1.0000000000
UNINTEGRATED 1.0000000000
UNINTEGRATED|Acaryochloris marina 1.0000000000
UNINTEGRATED|Accumulibacter phosphatis 1.0000000000
UNINTEGRATED|Acetobacter pasteurianus IFO 3283-01 1.0000000000

I was expecting pathway names, as in the classic humann2 pipeline (e.g. PWY-3781: aerobic respiration I (cytochrome c) 6637.8799098877), not taxa.

Where did I go wrong in the process? Could you help me with that?

Best,

Flo


Eric Franzosa

Nov 28, 2017, 6:33:49 PM
to humann...@googlegroups.com
Hi Flo,

To your first question, if you examine the results of the kegg_idmapping.tsv file you created, you should see columns relating 1) KEGG gene IDs, 2) KO annotations, and 3) species annotations. HUMAnN2 uses #3 to stratify the KO results by species, but I should note that this is more like using infer_taxonomy to post-process a normal HUMAnN2 run and less like HUMAnN2's normal operation (i.e. there is no taxonomic prescreen happening in the legacy KEGG workflow).
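
One quick way to see where the species labels come from is to summarize the id mapping file itself. The helper below is only a sketch: the function name species_per_ko is made up for illustration, and it assumes the file is tab-separated with the gene ID, KO, and species in columns 1, 2, and 3. Check the real layout with head first and adjust the column numbers if it differs.

```shell
# Hypothetical helper (not part of HUMAnN2): count how many distinct
# species each KO maps to in an id mapping file.
# ASSUMPTION: tab-separated columns in the order gene ID, KO, species.
species_per_ko() {
    awk -F'\t' '
        # Count each (KO, species) pair only the first time it is seen.
        !seen[$2 FS $3]++ { n[$2]++ }
        END { for (ko in n) print ko "\t" n[ko] }
    ' "$1" | sort
}

# Usage (path taken from the command earlier in the thread):
#   species_per_ko "$KEGG_DB/kegg_idmapping.tsv" | head
```

A KO that maps to many species here will show up with many "KO|species" rows in _genefamilies.tsv.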

To your second question, do you see pathway-like entries later in the file? UNMAPPED and UNINTEGRATED are always at the top. I think it's the case that the legacy KEGG workflow doesn't attach pathway names by default. You can attach those names using humann2_rename_table.
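
To check for pathway-like entries without scrolling past the stratified rows, something like the following may help. It is a minimal sketch: list_features is a made-up name, and it assumes the HUMAnN2 output is tab-separated with the feature name in the first column, as in the tables quoted above.

```shell
# Hypothetical helper (not part of HUMAnN2): print only the community-level
# feature rows of a HUMAnN2 table. It skips the "#" header line, the
# UNMAPPED and UNINTEGRATED rows, and every stratified "feature|taxon" row.
# ASSUMPTION: tab-separated file, feature name in column 1.
list_features() {
    awk -F'\t' '
        !/^#/ && index($1, "|") == 0 &&
        $1 != "UNMAPPED" && $1 != "UNINTEGRATED"
    ' "$1"
}

# Usage (file name follows the pattern mentioned in the thread):
#   list_features "$OUT/${NAME}_pathabundance.tsv" | head
```

If this prints nothing, no pathways were reconstructed at all. If it prints bare identifiers, humann2_rename_table --help lists the name sets that can be attached.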

Thanks,
Eric
