I am trying to run humann2 with a KEGG database we have in the lab.
First, I generated the ID mapping:
$humann2_humann1_kegg --ikoc $KEGG_DB/koc --igenels $KEGG_DB/genels --o $KEGG_DB/kegg_idmapping.tsv
Then I ran DIAMOND against the database and converted the result to TSV format using:
$diamondv09_13 blastx \
-p $NSLOTS \
-d /db/kegg/kegg.reduced.v0913114.dmnd \
-q $MOD \
--sensitive \
--evalue 0.001 \
-a $OUT"/"$NAME
$diamondv09_13 view -a $OUT"/"$NAME".daa" -o $OUT"/"$NAME".tsv"
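As a quick sanity check on the alignment step, the reference IDs in the TSV can be tallied and compared by eye with the IDs in the mapping file. This is only a minimal sketch: it assumes the TSV is in the default BLAST-style tabular layout (subject ID in column 2), and `top_subject_ids` is a hypothetical helper name, not part of DIAMOND or humann2.

```shell
# Hypothetical helper: list the most frequent subject (reference) IDs.
# Assumes a tab-separated BLAST-style file with the subject ID in column 2.
# $1 = path to the TSV, $2 = how many IDs to show (default 10).
top_subject_ids() {
    cut -f2 "$1" | sort | uniq -c | sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# Example call, using the same variables as the commands above:
# top_subject_ids "$OUT/$NAME.tsv"
```

Each output line is a hit count followed by the subject ID.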
Finally, I imported the TSV output into humann2 with the KEGG ID mapping:
$humann2 --input $OUT"/"$NAME".tsv" --id-mapping $KEGG_DB'_kegg_idmapping.tsv' --pathways-database $KEGG_DB'keggc' --output $OUT
I am a bit surprised by the output:
-> _genefamilies.tsv
# Gene Family Sample1_Abundance-RPKs
UNMAPPED 0.0000000000
K02358 1133.1640827489
K02358|Chromobacterium violaceum 134.3738819829
Why do I get stratified output with taxonomic information?
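To look at the two kinds of rows separately, the file can be split on the `|` that marks a per-taxon row. A minimal sketch, assuming the table is tab-separated with a single header line; the function names are hypothetical.

```shell
# Rows without '|' in the first column are community totals (the header
# line is kept as well, since it contains no '|').
unstratified_rows() {
    awk -F'\t' '$1 !~ /\|/' "$1"
}

# Rows with '|' in the first column are per-taxon contributions;
# the first line is kept as the header.
stratified_rows() {
    awk -F'\t' 'NR == 1 || $1 ~ /\|/' "$1"
}

# Example call (file name is an assumption):
# unstratified_rows Sample1_genefamilies.tsv
```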
-> _pathabundance.tsv and _pathcoverage.tsv
# Pathway Sample1_Abundance
UNMAPPED 0.0000000000
UNINTEGRATED 1326.5946000459
UNINTEGRATED|Nitrosomonas europaea 167.0478323434
# Pathway Sample1_Coverage
UNMAPPED 1.0000000000
UNINTEGRATED 1.0000000000
UNINTEGRATED|Acaryochloris marina 1.0000000000
UNINTEGRATED|Accumulibacter phosphatis 1.0000000000
UNINTEGRATED|Acetobacter pasteurianus IFO 3283-01 1.0000000000
I was expecting pathway names, as in the classic humann2 pipeline (e.g. PWY-3781: aerobic respiration I (cytochrome c) 6637.8799098877), not taxa.
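To confirm that no pathway was reported at all, the pathway table can be filtered down to named pathways only. A minimal sketch, assuming a tab-separated file with one header line; `named_pathways` is a hypothetical helper name.

```shell
# Print community-level pathway names, skipping the header line, the
# per-taxon rows (those containing '|'), and the UNMAPPED and
# UNINTEGRATED placeholder rows.
named_pathways() {
    awk -F'\t' 'NR > 1 && $1 !~ /\|/ && $1 != "UNMAPPED" && $1 != "UNINTEGRATED" { print $1 }' "$1"
}

# Example call (file name is an assumption):
# named_pathways Sample1_pathabundance.tsv
```

Empty output means the table contains only UNMAPPED and UNINTEGRATED rows.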
Where did I go wrong in the process? Could you help me with that?
Best,
Flo