Hi Dan and Jesse,
Thanks for the reply. I've been looking into KEGG alternatives, but so far haven't found anything that would be a perfect substitute (in this vein I'm open to any suggestions on databases).
So far I’ve considered MetaCyc, COG, UniPathway, WikiPathways and GO. MetaCyc is a database of experimentally verified pathways, but the genes and pathways are organism-specific. I could try combining the species-specific pathways (i.e. merging genes between species when they are reciprocal best BLAST hits), but that could present problems as I try to merge multiple species. COG (Clusters of Orthologous Groups) provides organism-independent gene classifiers, but the classifiers are only organized into broad functional categories. WikiPathways seems to be an attempt to replicate the data in KEGG and supplement it with additional data, but it seems less informative than KEGG. UniPathway provides levels of pathway organization similar to KEGG, but many (most?) of the UniProt identifiers are not linked to any UniPathway metabolic pathway. Finally, the GO Process database is similar to KEGG, so I could potentially create a ‘GOprocess x CopyNumber’ table for the GO-annotated genomes in NCBI, build an augmented tree, and then look for differences in GO processes between treatments. However, the great majority (~95%) of GO annotations derive from automatic transfer of InterProScan results, so I don’t know how reliable that sort of table would be. Despite this, GO Process seems like the best substitute for KEGG pathways.
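For reference, the reciprocal-best-hit merge mentioned above for MetaCyc could be sketched roughly as follows. This is only an illustration under assumptions: the two input dictionaries (each gene mapped to its single best BLAST hit in the other species) and the function name are hypothetical, and parsing the BLAST output is left out.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit.

    Both arguments are hypothetical dicts mapping a gene in one species
    to its best-scoring BLAST hit in the other species.
    """
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a  # keep the pair only if the hit is mutual
    )


# Toy example: a2 and a3 both hit b2, but b2's best hit is a3.
a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
b_to_a = {"b1": "a1", "b2": "a3"}
pairs = reciprocal_best_hits(a_to_b, b_to_a)
```

With more than two species the pairwise results are not guaranteed to agree with each other, which is the merging problem raised above.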
I think the easiest path might be to replace the KO identifiers with GO Molecular Function identifiers via http://www.genome.jp/files/ko2go.xl (thanks for that suggestion, Jesse). However, I don’t know how informative GO enrichment of Molecular Function will be for comparing two bacterial communities (a previous experience using GO left me with the impression that you could probably find a significant hypergeometric enrichment for any random list of genes, even after Bonferroni correction!). Have you tried looking at GO enrichment with 16S data in the past?
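To make the enrichment concern concrete, here is a minimal sketch of a one-sided hypergeometric test with Bonferroni correction. It is an illustration under assumptions, not an existing tool: the function names and the `term_to_genes` mapping (GO term to the genes annotated with it) are hypothetical, and the correction simply multiplies each p-value by the number of terms tested.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, n of which carry the term."""
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


def go_enrichment(gene_list, background, term_to_genes):
    """Return {term: (p_value, bonferroni_p_value)} for each GO term."""
    background = set(background)
    selected = set(gene_list) & background  # ignore genes outside the background
    M, N = len(background), len(selected)
    n_tests = len(term_to_genes)
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        k = len(annotated & selected)  # selected genes carrying this term
        p = hypergeom_tail(k, M, len(annotated), N)
        results[term] = (p, min(1.0, p * n_tests))  # Bonferroni, capped at 1
    return results


# Toy example: 20 background genes, two terms of five genes each.
background = [f"g{i}" for i in range(20)]
terms = {"A": background[:5], "B": background[10:15]}
result = go_enrichment(background[:5], background, terms)
```

Running the same function on random gene lists drawn from the same background would be one way to check whether "significant" terms really turn up as often as suspected.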
Alternatively, I could build a trait table for the IMG genomes (I’m guessing this is the address: ftp://ftp.jgi-psf.org/pub/IMG/img_core_v400/) using Pfam domains instead of KO IDs. Then I could look for differences in functional annotations between treatments, but this might be misleading since different proteins can share the same Pfam annotation.
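A rough sketch of what building such a Pfam trait table could look like, assuming the annotations have already been parsed into (genome, gene, Pfam ID) records. That record layout and the function names are hypothetical; the actual IMG file format is not covered here.

```python
from collections import Counter, defaultdict


def build_pfam_trait_table(records):
    """Count Pfam annotations per genome.

    `records` is a hypothetical iterable of (genome_id, gene_id, pfam_id)
    tuples, one per annotated domain. The gene ID is not used, which is
    exactly the caveat above: different proteins sharing a Pfam ID are
    collapsed into a single count.
    """
    table = defaultdict(Counter)
    for genome_id, _gene_id, pfam_id in records:
        table[genome_id][pfam_id] += 1
    return table


def to_rows(table):
    """Lay the counts out as traits (rows) by genomes (columns)."""
    genomes = sorted(table)
    pfams = sorted({p for counts in table.values() for p in counts})
    header = ["pfam_id"] + genomes
    rows = [[p] + [table[g][p] for g in genomes] for p in pfams]
    return header, rows


# Toy example: two different genes in G1 share the same Pfam domain.
records = [
    ("G1", "geneA", "PF00001"),
    ("G1", "geneB", "PF00001"),
    ("G2", "geneC", "PF00002"),
]
header, rows = to_rows(build_pfam_trait_table(records))
```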
Do you have any thoughts or suggestions about these options?
Thanks again for the help.
Best,
Aaron