Hi Soha and Phil,
categorize_by_function.py will summarize the KEGG functional categories (or COG categories) for your sample as a whole. If you would like to make a graph or table of the metabolic functions contributed by an OTU of interest specifically, you can filter the other OTUs from your input table (using e.g. QIIME's filter_otus_from_otu_table.py script).
Each KO maps to zero, one or more pathways, and these pathways are then categorized into KEGG functional groups at different levels of specificity.
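To make the one-to-many relationship concrete, here is a minimal sketch of collapsing per-KO counts into per-pathway counts. The KO and pathway names are hypothetical placeholders, not real KEGG entries, and the sketch assumes that a KO's full count is added to every pathway it maps to; it is an illustration of the idea rather than the code the script actually runs.

```python
from collections import defaultdict

# Hypothetical KO -> pathway map (placeholder names, not real KEGG data).
# A KO may map to zero, one, or several pathways.
KO_TO_PATHWAYS = {
    "KO_a": ["pathway_A", "pathway_B"],  # maps to two pathways
    "KO_b": ["pathway_B"],               # maps to one pathway
    "KO_c": [],                          # maps to no pathway
}


def collapse_to_pathways(ko_counts, ko_to_pathways):
    """Sum KO counts into pathway counts.

    Assumption: each KO contributes its full count to every pathway it
    belongs to, so pathway totals can add up to more than the KO total.
    KOs with no pathway are dropped from the result.
    """
    totals = defaultdict(float)
    for ko, count in ko_counts.items():
        for pathway in ko_to_pathways.get(ko, []):
            totals[pathway] += count
    return dict(totals)


ko_counts = {"KO_a": 10.0, "KO_b": 5.0, "KO_c": 3.0}
pathway_counts = collapse_to_pathways(ko_counts, KO_TO_PATHWAYS)
```

Because KO_a is counted in both pathways, the pathway totals here sum to more than the counts of the mapped KOs, which is worth keeping in mind when reading relative abundances at the pathway level.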
From Morgan's previous post to the forum:
Here is the ko to pathway map in two different formats:
Also here is the map from ko to their descriptions:
If you would instead prefer to map your data to KEGG modules, you may want to check out the Huttenhower group's HUMAnN tool:
http://huttenhower.sph.harvard.edu/humann . In particular, the raw data that comes with HUMAnN includes mappings from K numbers (KEGG orthology groups) to modules (see the modulec and modulep files in /data/ in the HUMAnN download). The data file keggc maps between KEGG orthology groups (K numbers) and pathways (identifiers starting with ko). Of the two module files, I believe one is a 'straight' mapping that includes everything in the module, and the other preserves information on the conjunctive normal form for the module (e.g. you need one of these three genes + one of those two genes to do module Y).
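As a rough illustration of what "conjunctive normal form" means for a module, the sketch below represents a module as a list of clauses, where every clause must be satisfied by at least one gene. The module definition, the KO names, and the function name are all hypothetical; this is not the HUMAnN file format or the way HUMAnN itself scores modules.

```python
def module_is_complete(cnf_clauses, present_kos):
    """Return True if every clause has at least one KO that is present.

    cnf_clauses: list of clauses; each clause is a list of alternative KOs
                 (an OR within the clause, an AND across clauses).
    present_kos: iterable of KO identifiers detected in the sample.
    """
    present = set(present_kos)
    return all(any(ko in present for ko in clause) for clause in cnf_clauses)


# Hypothetical "module Y": one of three genes AND one of two other genes.
module_y = [
    ["KO_1", "KO_2", "KO_3"],
    ["KO_4", "KO_5"],
]

complete = module_is_complete(module_y, {"KO_2", "KO_5"})    # both clauses satisfied
incomplete = module_is_complete(module_y, {"KO_1", "KO_2"})  # second clause unsatisfied
```

A 'straight' mapping would simply list all five KOs under module Y, which loses the information that KO_1 and KO_2 together are not enough.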
Note that, as I understand it, basically all of these tools draw on processed KEGG mappings from circa 2011, since KEGG subsequently went commercial and charges substantial fees for this information. So if you do happen to have a KEGG subscription, you could in theory refine the results by processing the raw data in e.g. the 'ko' data file into tab-delimited format, although in practice it likely wouldn't make too much of a difference.
In terms of how PICRUSt uses these data, the pathway mappings above are added to the precalculated files and embedded in the BIOM prediction output. So if you open the BIOM file of KEGG predictions in a text editor and search for 'metadata', you will see, towards the bottom, text that looks like this:
"rows": [{"id": "K00001", "metadata": {"KEGG Pathways": [["Metabolism", "Lipid Metabolism", "Fatty acid metabolism .... etc etc
categorize_by_function.py basically wraps the BIOM package's functionality for collapsing one-to-many metadata hierarchies to output KEGG functional categories based on this metadata.
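If it helps to see the idea without the BIOM package, here is a minimal sketch that does a similar collapse using only the Python standard library. It assumes a JSON-format BIOM table with a sparse "data" list of [row, column, value] triples and the "KEGG Pathways" row metadata shown above. The second KO, the sample names, the counts, and any category labelled "Hypothetical" are made up for illustration, and the function name collapse_by_level is my own. The sketch counts a KO once per distinct category at the chosen level; I am not claiming the real script handles that case the same way.

```python
import json
from collections import defaultdict

# Tiny stand-in for a JSON BIOM table (values and most names are hypothetical).
biom_text = json.dumps({
    "matrix_type": "sparse",
    "shape": [2, 2],
    "rows": [
        {"id": "K00001", "metadata": {"KEGG Pathways": [
            ["Metabolism", "Lipid Metabolism", "Fatty acid metabolism"],
            ["Metabolism", "Hypothetical category", "Hypothetical pathway"],
        ]}},
        {"id": "K_hypothetical", "metadata": {"KEGG Pathways": [
            ["Metabolism", "Lipid Metabolism", "Hypothetical pathway 2"],
        ]}},
    ],
    "columns": [
        {"id": "sample1", "metadata": None},
        {"id": "sample2", "metadata": None},
    ],
    # Sparse entries: [row index, column index, value]
    "data": [[0, 0, 4.0], [0, 1, 1.0], [1, 0, 2.0]],
})


def collapse_by_level(biom_json, level, metadata_key="KEGG Pathways"):
    """Sum KO counts per sample into categories at a given hierarchy level.

    level is 1-based (1 = broadest category). Only sparse-format tables are
    handled. A KO is added once to each distinct category it falls under.
    """
    table = json.loads(biom_json)
    sample_ids = [col["id"] for col in table["columns"]]
    totals = defaultdict(lambda: defaultdict(float))
    for row_idx, col_idx, value in table["data"]:
        metadata = table["rows"][row_idx].get("metadata") or {}
        paths = metadata.get(metadata_key) or []
        # Distinct categories at this level; skip paths that are too short.
        categories = {path[level - 1] for path in paths if len(path) >= level}
        for category in categories:
            totals[category][sample_ids[col_idx]] += value
    return {cat: dict(per_sample) for cat, per_sample in totals.items()}


level2 = collapse_by_level(biom_text, level=2)
level1 = collapse_by_level(biom_text, level=1)
```

Here level 2 gives "Lipid Metabolism" a count of 6.0 in sample1 (4.0 from K00001 plus 2.0 from the hypothetical KO), while level 1 puts everything under "Metabolism".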
In any case good luck and hope that helps.
Cheers,
Jesse