how to run pathway analysis for KEGG


yun li

Jul 21, 2016, 11:46:48 AM
to HUMAnN Users
Hi,
I have run humann2 with the default databases, and now I want to get KEGG pathway abundances. How can I do this? I asked this question before, and here is the answer I got from Eric:

"Yes, this is possible. If you use the "regroup_table" script to convert UniRef50 gene family abundance to approximate KO abundance, you can then follow the legacy database instructions for making a KEGG module or KEGG pathway table starting from gene-level input:


Thanks,
Eric
"
But I still don't know the exact steps.
When I tried to run regroup_table, it reported an invalid choice:
$ humann2_regroup_table --input log2GeneFamilyQuantileNormalized.tsv --groups uniref50_ko --output uniref50_ko_known.tsv
usage: humann2_regroup_table [-h] [-i INPUT] [-g {uniref50_ec,uniref50_rxn}]
                             [-c CUSTOM] [-r] [-f {sum,mean}] [-u {Y,N}]
                             [-p {Y,N}] [-o OUTPUT]
humann2_regroup_table: error: argument -g/--groups: invalid choice: 'uniref50_ko' (choose from 'uniref50_ec', 'uniref50_rxn')

Running the following command also reports an error:
$ humann2 --input geneFamilies.tsv --output KEGGRes --id-mapping legacy_kegg_idmapping.tsv --pathways-database humann-0.99/data/keggc
Output files will be written to: /home/myfolder/Project1/12.gene2regroup/KEGGRes

Process the sam mapping results ...
Traceback (most recent call last):
  File "//home/myfolder/.local/bin/humann2", line 9, in <module>
    load_entry_point('humann2==0.6.2', 'console_scripts', 'humann2')()
  File "/home/myfolder/.local/lib/python2.7/site-packages/humann2-0.6.2-py2.7.egg/humann2/humann2.py", line 906, in main
    args.input, alignments, unaligned_reads_store, keep_sam=True)
  File "/home/myfolder/.local/lib/python2.7/site-packages/humann2-0.6.2-py2.7.egg/humann2/search/nucleotide.py", line 251, in unaligned_reads
    if int(info[config.sam_flag_index]) & config.sam_unmapped_flag != 0:
ValueError: invalid literal for int() with base 10: 'A468xx000-000'




Thanks,
Yun

Eric Franzosa

Jul 21, 2016, 12:18:09 PM
to humann...@googlegroups.com
Hi Yun,

We've started only bundling HUMAnN2 with a couple of default mapping files, as the full set was getting rather large. You can download the UniRef50-KEGG mapping here:


You will pass this mapping to regroup_table as a custom mapping using the "-c" flag.

====

After doing this, you'll have KO abundance, which actually makes the legacy KEGG pathway quantification fairly simple:

humann2 --input KO.tsv --output $OUTPUT_DIR --pathways-database humann1/data/keggc

The full legacy instructions assume you want to completely replicate a HUMAnN1 run (i.e. starting from raw reads). You're able to skip some of this since you already have UniRef50 abundance (and can regroup to get KOs).

Thanks,
Eric
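
The two steps described above can be strung together as in the following sketch. It only assembles the command lines, using flags that already appear in this thread ("-c" for a custom mapping, and "--input-format genetable", which Yun reports needing further down). The file names and the humann1/data/keggc path are placeholders, the helper names regroup_cmd, kegg_pathway_cmd and run_pipeline are made up for illustration, and KO.tsv is assumed to hold a single sample.

```python
import subprocess


def regroup_cmd(gene_table, mapping, ko_table):
    """Build the command that regroups UniRef50 abundances into KOs.

    The mapping is passed as a custom mapping with -c, as suggested above.
    """
    return ["humann2_regroup_table", "--input", gene_table,
            "-c", mapping, "--output", ko_table]


def kegg_pathway_cmd(ko_table, out_dir, keggc):
    """Build the command that quantifies legacy KEGG pathways from a KO table.

    --input-format genetable tells humann2 the input is a gene table
    rather than reads or alignments.
    """
    return ["humann2", "--input", ko_table, "--input-format", "genetable",
            "--output", out_dir, "--pathways-database", keggc]


def run_pipeline(gene_table, mapping, ko_table, out_dir, keggc):
    """Run both steps in order, stopping at the first failure."""
    for cmd in (regroup_cmd(gene_table, mapping, ko_table),
                kegg_pathway_cmd(ko_table, out_dir, keggc)):
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("genefamilies.tsv", "map_ko_uniref50.txt", "KO.tsv", "KEGGRes", "humann1/data/keggc") would run the two commands one after the other, provided humann2 is installed and the files exist.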


yun li

Jul 21, 2016, 1:42:45 PM
to HUMAnN Users
Hi Eric,
Should I use map_ko_uniref90.txt.gz in this case, and not use the -g flag? And is "KO.tsv" in the second command the output from humann2_regroup_table?

yun li

Jul 21, 2016, 2:03:30 PM
to HUMAnN Users
Hi Eric,
I ran humann2_regroup_table using map_ko_uniref50.txt. Is this right?
Then I ran the KEGG pathway analysis. However, I still get an error from the second step.

$ humann2_regroup_table --input log2GeneFamilyQuantileNormalized.tsv -c map_ko_uniref50.txt --output uniref50_ko_known.tsv

$ humann2 --input uniref50_ko_known.tsv --output KEGGRes --pathways-database humann-0.99/data/keggc
Process the sam mapping results ...
Traceback (most recent call last):
  File "//home/myfolder/.local/bin/humann2", line 9, in <module>
    load_entry_point('humann2==0.6.2', 'console_scripts', 'humann2')()
  File "/home/myfolder/.local/lib/python2.7/site-packages/humann2-0.6.2-py2.7.egg/humann2/humann2.py", line 906, in main
    args.input, alignments, unaligned_reads_store, keep_sam=True)
  File "/home/myfolder/.local/lib/python2.7/site-packages/humann2-0.6.2-py2.7.egg/humann2/search/nucleotide.py", line 251, in unaligned_reads
    if int(info[config.sam_flag_index]) & config.sam_unmapped_flag != 0:
ValueError: invalid literal for int() with base 10: 'A468xx000-000'

yun li

Jul 22, 2016, 11:14:08 AM
to HUMAnN Users
Hi, I am really confused about how to rerun the KEGG abundance. When I added the flag "--input-format genetable", the second command, "humann2 --input uniref50_ko_known.tsv --output KEGGRes --pathways-database humann-0.99/data/keggc", worked. However, the output is confusing. My input "uniref50_ko_known.tsv" is a tab-separated file, generated by humann2_regroup_table, with each row a KO number and each column a sample. But the output pathabundance file has only one column. Any idea what happened?

Thanks,
Yun

Eric Franzosa

Jul 22, 2016, 11:29:38 AM
to humann...@googlegroups.com
Hi Yun,

HUMAnN2 proper only thinks about processing one sample at a time, so the fact that you're providing a multi-sample TSV explains both 1) why the format wasn't immediately detected and 2) why the output contains fewer columns than you expected.

If you want to reprocess a merged table through HUMAnN2, you need to split up the columns, process them separately, and then recombine them. You can see this procedure spelled out in the context of a PICRUSt application here:


Thanks,
Eric
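
The split and recombine steps described above could look roughly like the sketch below. It assumes the merged table is tab-separated with feature IDs in the first column and one sample per remaining column, and that sample names are safe to use in file names. The function names split_table and merge_tables and the id_label parameter are made up for illustration; each per-sample file would be passed to humann2 separately between the two calls.

```python
import csv


def split_table(merged_path, out_prefix):
    """Write one two-column TSV per sample and return {sample: path}."""
    with open(merged_path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    paths = {}
    for col, sample in enumerate(header[1:], start=1):
        path = "{}{}.tsv".format(out_prefix, sample)
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow([header[0], sample])
            for row in body:
                writer.writerow([row[0], row[col]])
        paths[sample] = path
    return paths


def merge_tables(sample_paths, merged_path, id_label="ID"):
    """Join per-sample two-column tables on the feature ID.

    Features missing from a sample are written as "0".
    """
    features = {}  # used as an insertion-ordered set of feature IDs
    columns = {}
    for sample, path in sample_paths.items():
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader)  # skip the header line
            values = {row[0]: row[1] for row in reader if row}
        columns[sample] = values
        for feature in values:
            features.setdefault(feature, None)
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow([id_label] + list(columns))
        for feature in features:
            writer.writerow([feature] +
                            [columns[s].get(feature, "0") for s in columns])
```

In practice merge_tables would be pointed at the per-sample pathabundance files that humann2 writes, not at the split inputs.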


yun li

Jul 22, 2016, 3:57:07 PM
to HUMAnN Users
I was able to run the program when I provided one sample at a time. However, what exactly is the pathabundance value in the generated file?
For example, ko05340 contains the proteins K10887, K10603, K10989, K10988, ... The pathabundance value for ko05340 is neither the sum nor the mean of these proteins. How was the ko05340 value calculated?

Thanks,
Yun

Eric Franzosa

Jul 22, 2016, 5:12:16 PM
to humann...@googlegroups.com
Please see the subsection "Pathway/module coverage and abundance" from the HUMAnN1 paper:


These methods have remained largely unchanged in HUMAnN2.

Thanks,
Eric
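
For readers without the paper at hand: my understanding, which is an assumption to verify against that subsection, is that for an unstructured pathway such as a KEGG pathway the abundance is scored as the average of the upper half of its member gene abundances, with absent genes counted as zero. That would explain why the value is neither the sum nor the mean. The sketch below illustrates only that averaging step and leaves out any pathway filtering or gap filling; the function name and the numbers are made up.

```python
def upper_half_mean(abundances):
    """Average the most abundant half of a pathway's gene abundances.

    Sketch of the averaging step only; the exact procedure is an
    assumption and should be checked against the HUMAnN1 paper.
    """
    ordered = sorted(abundances)
    upper = ordered[len(ordered) // 2:]  # keep the most abundant half
    return sum(upper) / len(upper)


# Four member genes, one of them absent from the sample.
genes = [10.0, 0.0, 4.0, 2.0]
print(sum(genes))               # 16.0
print(sum(genes) / len(genes))  # 4.0
print(upper_half_mean(genes))   # 7.0
```

With these made-up numbers the result (7.0) differs from both the sum (16.0) and the plain mean (4.0), because the low-abundance half of the genes is ignored.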

