Map KEGG IDs to Pathways to Get Bacterial Contributions to Pathways


Philip Braunstein

May 8, 2014, 12:09:59 PM
to picrus...@googlegroups.com
Hello,

We have run PICRUSt on some 16S data to get the metagenome functions that are represented in each sample. We have also run the metagenome_contributions.py script, and we have KEGG KOs correlated with OTU names. We would like to take these KEGG KOs and turn them into metagenome functions so we can tell which bacteria contribute to which functions.

Is there a script or a table that correlates the KEGG KOs with metagenome functions? We believe PICRUSt uses this information, and we were wondering whether we could access it to further our analysis.

Thank you in advance for your help,
Soha and Phil

Jesse Zaneveld

May 8, 2014, 6:00:05 PM
to picrus...@googlegroups.com
Hi Soha and Phil,

categorize_by_function.py will summarize the KEGG functional categories (or COG categories) for your sample as a whole.  If you would like to make a graph or table of the metabolic functions contributed by an OTU of interest specifically, you can filter the other OTUs from your input table (using e.g. QIIME's filter_otus_from_otu_table.py script). 
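If the per-OTU table from metagenome_contributions.py is already in hand, another route is to join it to a KO-to-pathway mapping directly. The sketch below is only an illustration under assumptions: it assumes the contributions table is tab-delimited with columns named Gene, Sample, OTU and CountContributedByOTU (check these against the header of the real file), and the rows, the mapping and the helper name `otu_contributions_by_category` are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical KO -> list of pathway hierarchies (general to specific).
ko_to_pathways = {
    "K00001": [["Metabolism", "Lipid Metabolism", "Fatty acid metabolism"]],
    "K00002": [["Metabolism", "Carbohydrate Metabolism",
                "Glycolysis / Gluconeogenesis"]],
}

def otu_contributions_by_category(contrib_file, ko_to_pathways, level=2):
    """Sum per-OTU KO contributions into (sample, OTU, category) totals."""
    totals = defaultdict(float)
    for rec in csv.DictReader(contrib_file, delimiter="\t"):
        count = float(rec["CountContributedByOTU"])  # assumed column name
        # A KO with no mapping yields no categories and is skipped.
        categories = {
            path[level - 1]
            for path in ko_to_pathways.get(rec["Gene"], [])
            if len(path) >= level
        }
        for category in categories:
            totals[(rec["Sample"], rec["OTU"], category)] += count
    return dict(totals)

# Invented contributions table with the assumed column names.
contrib_text = (
    "Gene\tSample\tOTU\tCountContributedByOTU\n"
    "K00001\tS1\tOTU_1\t6\n"
    "K00002\tS1\tOTU_1\t2\n"
    "K00001\tS1\tOTU_2\t4\n"
    "K99999\tS1\tOTU_2\t5\n"  # KO absent from the mapping
)
result = otu_contributions_by_category(io.StringIO(contrib_text), ko_to_pathways)
for key in sorted(result):
    print(key, result[key])
```

Note that in this sketch a KO without any pathway annotation simply contributes nothing; whether that is the right treatment depends on the analysis.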

Each KO maps to zero, one or more pathways, and these pathways are then categorized into KEGG functional groups at different levels of specificity.
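The one-to-many relationship can be sketched roughly as follows. This is a minimal illustration, not PICRUSt's actual implementation: the counts and hierarchies are invented, `collapse_to_level` is a hypothetical helper, and adding a KO's full count to every category it belongs to is just one possible convention.

```python
from collections import defaultdict

# Hypothetical KO -> list of pathway hierarchies (general to specific).
ko_to_pathways = {
    "K00001": [
        ["Metabolism", "Lipid Metabolism", "Fatty acid metabolism"],
        ["Metabolism", "Carbohydrate Metabolism", "Glycolysis / Gluconeogenesis"],
    ],
    "K00002": [
        ["Metabolism", "Carbohydrate Metabolism", "Glycolysis / Gluconeogenesis"],
    ],
    "K99999": [],  # a KO that maps to zero pathways
}

# Invented KO abundances for a single sample.
ko_counts = {"K00001": 10.0, "K00002": 4.0, "K99999": 7.0}

def collapse_to_level(ko_counts, ko_to_pathways, level):
    """Sum KO counts into the categories found at the given hierarchy level."""
    totals = defaultdict(float)
    for ko, count in ko_counts.items():
        # Use a set so a KO is counted once per category, even when several
        # of its pathways share that category.
        categories = {
            path[level - 1]
            for path in ko_to_pathways.get(ko, [])
            if len(path) >= level
        }
        for category in categories:
            totals[category] += count
    return dict(totals)

level1 = collapse_to_level(ko_counts, ko_to_pathways, 1)
level2 = collapse_to_level(ko_counts, ko_to_pathways, 2)
print(level1)
print(level2)
```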

From Morgan's previous post to the forum:


If you instead prefer to map your data to KEGG modules, you may want to check out the Huttenhower group's HUMAnN tool: http://huttenhower.sph.harvard.edu/humann . In particular, the raw data that comes with HUMAnN includes a mapping from K numbers for the KEGG orthology groups to modules (see the modulec and modulep files in /data/ in the HUMAnN download). The data file keggc maps between KEGG orthology groups (K numbers) and pathways (starting with ko). I believe one file is a 'straight' mapping that includes everything in the module, and the other preserves information on the conjunctive normal form for the module (e.g. you need one of these three genes + one of any of those two genes to do module Y).
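A mapping file of this kind could be inverted into a KO-to-pathway lookup along the lines below. The layout is an assumption, not something confirmed in this thread: the sketch supposes a tab-delimited file in which each line starts with a pathway or module ID followed by its KO IDs, so the real keggc and modulec files should be inspected first. `invert_mapping` is a hypothetical helper and the example lines are invented.

```python
import io
from collections import defaultdict

def invert_mapping(lines):
    """Turn 'group<TAB>KO<TAB>KO...' lines into a KO -> [groups] dict.

    Assumed layout: first field is a pathway/module ID, remaining fields
    are KO IDs. Lines with fewer than two fields or starting with '#'
    are skipped.
    """
    ko_to_groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or line.startswith("#"):
            continue
        group, kos = fields[0], fields[1:]
        for ko in kos:
            if ko:
                ko_to_groups[ko].append(group)
    return dict(ko_to_groups)

# Invented example in the assumed layout.
example = io.StringIO("ko00010\tK00001\tK00002\nko00071\tK00001\n")
ko_to_pathways = invert_mapping(example)
print(ko_to_pathways)
```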

Note that my understanding is that basically all of these tools are drawing on processed KEGG mappings circa 2011, since KEGG subsequently went commercial and charges substantial fees for this information.  So if you do happen to have a KEGG subscription you could in theory refine the results by processing the raw data in e.g. the 'ko' data file into tab delimited format, although in practice it likely wouldn't make too much of a difference.

In terms of how PICRUSt uses these data, the pathway mappings above are added to the precalculated files and embedded in the BIOM prediction output.  So if you open the BIOM file of KEGG predictions in a text editor and search for 'metadata' you will see towards the bottom text that looks like this:

"rows": [{"id": "K00001", "metadata": {"KEGG Pathways": [["Metabolism", "Lipid Metabolism", "Fatty acid metabolism .... etc etc

categorize_by_function.py basically wraps the BIOM package's functionality for collapsing one-to-many metadata hierarchies to output KEGG functional categories based on this metadata.
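The row metadata shown above can also be read out directly to build a KO-to-pathway table. The sketch below assumes the predictions file is in the JSON-based BIOM format (not HDF5) and that the metadata key is spelled as in the snippet above; both should be checked against the actual file. The embedded example table and the helper name `ko_pathways_from_biom` are made up.

```python
import json

def ko_pathways_from_biom(biom_json_text, key="KEGG Pathways"):
    """Return {row id: list of pathway hierarchies} from BIOM JSON text."""
    table = json.loads(biom_json_text)
    mapping = {}
    for row in table["rows"]:
        metadata = row.get("metadata") or {}  # metadata may be null
        mapping[row["id"]] = metadata.get(key) or []
    return mapping

# Invented two-row example mimicking the structure quoted above.
example = json.dumps({
    "rows": [
        {"id": "K00001",
         "metadata": {"KEGG Pathways": [
             ["Metabolism", "Lipid Metabolism", "Fatty acid metabolism"]]}},
        {"id": "K00002", "metadata": None},
    ]
})
mapping = ko_pathways_from_biom(example)
print(mapping)
```

For a file on disk, the text passed in would be the file's contents; the `key` argument is there because the exact spelling of the metadata field may differ between files.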

In any case good luck and hope that helps.

Cheers,
Jesse
 



Philip Braunstein

May 22, 2014, 2:45:42 PM
to picrus...@googlegroups.com
Hi Jesse,

Thank you for your very helpful response. We used the KEGG ID-to-pathway map you provided. However, a number of the KEGG IDs (3000 to be exact) were not found in this mapping file. Is there a newer version of this mapping file you could give us?

I've attached a file listing the KEGG IDs we needed that were not in the mapping file.
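A check of this kind can be sketched as follows: compare the KO IDs seen in the data against the mapping and report those with no entry (or an empty one). The IDs below are placeholders, not the ones in the attached file, and `unmapped_kos` is a hypothetical helper.

```python
def unmapped_kos(observed_kos, ko_to_pathways):
    """Return the sorted KO IDs that are absent from, or empty in, the mapping."""
    return sorted(ko for ko in set(observed_kos) if not ko_to_pathways.get(ko))

# Placeholder data: one KO missing from the mapping, one with no pathways.
observed = ["K00001", "K00002", "K99998", "K99999", "K00001"]
mapping = {"K00001": ["ko00010"], "K00002": ["ko00010"], "K99998": []}

missing = unmapped_kos(observed, mapping)
fraction = len(missing) / len(set(observed))
print(missing, fraction)
```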

Thanks in advance for your help on this.

Best wishes,
Phil
unknownKeggIds.txt

Jesse Zaneveld

May 22, 2014, 2:59:34 PM
to picrus...@googlegroups.com, Curtis Huttenhower, Morgan Langille, Daniel McDonald
Hi Phil,

That was the most recent version that I have available. Curtis, Morgan, or Daniel, do any of you happen to have a more updated version of the KEGG mappings? Also, Morgan, any idea why these IDs might be missing from the KO mappings used in the precalculated files? I checked that examples are indeed present in bacteria (at first I thought they were just the Eukaryotic-only KOs that had been filtered out).

Cheers,
Jesse

Jesse Zaneveld

May 22, 2014, 3:02:12 PM
to picrus...@googlegroups.com
Also for completeness I should add that it is always possible to pony up $2000 (or $5000 for redistribution I think) and get the latest version from NPO:

http://www.kegg.jp/kegg/download/

That's always been a little too steep for us, but is a possibility. If you do go that route, it would be interesting to know how much the annotations have actually changed vs. the last free version.

Cheers,
Jesse

Daniel McDonald

May 22, 2014, 3:02:39 PM
to Jesse Zaneveld, picrus...@googlegroups.com, Curtis Huttenhower, Morgan Langille
Not that I'm aware of. KEGG is a closed resource, and we previously got the mappings from a combination of its last public release and IMG (which is now unfortunately also a closed resource).

Philip Braunstein

May 22, 2014, 3:05:33 PM
to picrus...@googlegroups.com, Jesse Zaneveld, Curtis Huttenhower, Morgan Langille
I see. 

Thank you for letting us know!

Best wishes,
Phil

KC

Jun 17, 2014, 2:13:10 PM
to picrus...@googlegroups.com, zane...@gmail.com, chut...@gmail.com, morgan.g....@gmail.com
Hi

About 50% of my KO IDs also have no pathways associated with them, and I was wondering whether these are dropped when we run categorize_by_function.py?

Thank you.

best rgds
keng

Zhu Wenhan

Oct 21, 2015, 5:49:24 PM
to picrust-users
Hi, Jesse, 

Can you give me an example of how to use filter_otus_from_otu_table.py to filter the BIOM file from PICRUSt? Also, how can I filter out the hits that are not "metabolism"? I tried the following command and it gave me this error:

filter_samples_from_otu_table.py -i /Users/winterlab/Desktop/picrust/picrust_with_metadata/Categorize_by_function_on_data_7.picrustc -o metabolic_only.biom -m /Users/winterlab/Desktop/picrust/picrust_with_metadata/no_fecal_mapSWinter0715_Tungstate_data.txt -s 'KEGG_Pathways:Metabolism'

Traceback (most recent call last):
  File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 162, in <module>
    main()
  File "/macqiime/anaconda/bin/filter_samples_from_otu_table.py", line 121, in main
    open(mapping_fp, 'U'), valid_states)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/filter.py", line 108, in sample_ids_from_metadata_description
    sample_ids = get_sample_ids(map_data, map_header, valid_states)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/filter.py", line 131, in get_sample_ids
    name_to_col = dict([(s, map_header.index(s)) for s in states])
ValueError: 'KEGG_Pathways' is not in list
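
The traceback suggests that the -s option looks up its field name in the header of the sample mapping file passed with -m, so a row-level (KO) metadata field such as KEGG_Pathways is not found there. One possible workaround is to select rows by their own metadata. The sketch below is not a QIIME or PICRUSt function: `keep_rows_in_category` is a hypothetical helper, it assumes a table in the JSON-based BIOM format already loaded into a dict, and the example table is invented. It keeps only rows that have at least one pathway whose top-level category matches.

```python
def keep_rows_in_category(table, category, key="KEGG_Pathways"):
    """Return a copy of a BIOM-style dict with only rows in `category`."""
    keep = [
        i for i, row in enumerate(table["rows"])
        if any(
            path and path[0] == category
            for path in ((row.get("metadata") or {}).get(key) or [])
        )
    ]
    new_index = {old: new for new, old in enumerate(keep)}
    filtered = dict(table)  # shallow copy; the input dict is not modified
    filtered["rows"] = [table["rows"][i] for i in keep]
    if table.get("matrix_type") == "sparse":
        # Sparse data is a list of [row, column, value] triples.
        filtered["data"] = [
            [new_index[r], c, v] for r, c, v in table["data"] if r in new_index
        ]
    else:
        # Dense data is one list of values per row.
        filtered["data"] = [table["data"][i] for i in keep]
    filtered["shape"] = [len(keep), len(table["columns"])]
    return filtered

# Invented three-row, two-sample table for illustration.
table = {
    "matrix_type": "sparse",
    "rows": [
        {"id": "K00001", "metadata": {"KEGG_Pathways": [
            ["Metabolism", "Lipid Metabolism", "Fatty acid metabolism"]]}},
        {"id": "K03043", "metadata": {"KEGG_Pathways": [
            ["Genetic Information Processing", "Transcription", "RNA polymerase"]]}},
        {"id": "K00002", "metadata": {"KEGG_Pathways": [
            ["Metabolism", "Carbohydrate Metabolism", "Glycolysis / Gluconeogenesis"]]}},
    ],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None}],
    "data": [[0, 0, 5.0], [1, 0, 3.0], [2, 1, 8.0]],
    "shape": [3, 2],
}
metabolic = keep_rows_in_category(table, "Metabolism")
print([row["id"] for row in metabolic["rows"]], metabolic["data"])
```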


Thank you so much!

Soha Hassoun

Aug 1, 2016, 4:33:14 PM
to picrust-users

hi Jesse,
The links for these files are broken.

Is it possible to update these links, please?

Thanks
soha

Eric Littmann

Dec 23, 2016, 2:36:46 PM
to picrust-users

Eric Littmann

Dec 23, 2016, 2:55:53 PM
to picrust-users
I also noticed a 50% drop in mapped pathways. Compared against the same data mapped with categorize_by_function.py, there was also a 50% difference, suggesting that this list is incomplete, probably because the mappings came from the last publicly available KEGG and IMG releases. Still nice to have though!

Thanks,
-Eric