Hi all,
Is there a way to take the eggNOG output from Trinotate and get the corresponding functional categories (http://eggnog.embl.de/version_4.0.beta/data/downloads/eggnogv4.funccats.txt)?
INFORMATION STORAGE AND PROCESSING
[J] Translation, ribosomal structure and biogenesis
[A] RNA processing and modification
[K] Transcription
[L] Replication, recombination and repair
[B] Chromatin structure and dynamics
CELLULAR PROCESSES AND SIGNALING
[D] Cell cycle control, cell division, chromosome partitioning
[Y] Nuclear structure
[V] Defense mechanisms
[T] Signal transduction mechanisms
[M] Cell wall/membrane/envelope biogenesis
[N] Cell motility
[Z] Cytoskeleton
[W] Extracellular structures
[U] Intracellular trafficking, secretion, and vesicular transport
[O] Posttranslational modification, protein turnover, chaperones
METABOLISM
[C] Energy production and conversion
[G] Carbohydrate transport and metabolism
[E] Amino acid transport and metabolism
[F] Nucleotide transport and metabolism
[H] Coenzyme transport and metabolism
[I] Lipid transport and metabolism
[P] Inorganic ion transport and metabolism
[Q] Secondary metabolites biosynthesis, transport and catabolism
POORLY CHARACTERIZED
[R] General function prediction only
[S] Function unknown
I have the eggNOG and COG IDs from Trinotate in a list like this:
ENOG4112CG2
ENOG4112CEJ
ENOG4112CDX
ENOG4112CDU
COG0258
COG0256
COG0249
Does anyone know of a tool or script that can take a list of eggNOG and COG identifiers and map each one to its functional category (A-Z)? From there I could use simple awk commands to get the counts per category and then use R to graph them like this:

(http://i.stack.imgur.com/36MKI.png)
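To make the request concrete, here is a rough Python sketch of the mapping and counting step I have in mind. It is only an illustration under assumptions, not a working solution: the part I am missing is the lookup table from group ID to category letter. The sketch assumes such a table exists as a two-column tab-separated text (ID, then one or more category letters), which is a hypothetical layout and not necessarily what eggNOG distributes. The function names are my own.

```python
import re
from collections import Counter

# Matches lines such as " [J] Translation, ribosomal structure and biogenesis"
CATEGORY_LINE = re.compile(r"^\s*\[([A-Z])\]\s+(.+?)\s*$")


def parse_funccats(text):
    """Return {letter: description} from eggnogv4.funccats.txt-style text."""
    categories = {}
    for line in text.splitlines():
        match = CATEGORY_LINE.match(line)
        if match:  # group headings such as "METABOLISM" do not match
            categories[match.group(1)] = match.group(2)
    return categories


def parse_id_to_letters(text):
    """Return {group_id: letters} from two tab-separated columns.

    ASSUMPTION: column 1 is the eggNOG/COG ID and column 2 holds the
    category letter(s). This layout is hypothetical.
    """
    mapping = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.split("\t")
        if len(fields) >= 2:
            mapping[fields[0].strip()] = fields[1].strip()
    return mapping


def count_categories(ids, id_to_letters):
    """Count category letters over a list of IDs; return (counts, unmapped IDs)."""
    counts = Counter()
    unmapped = []
    for group_id in ids:
        letters = id_to_letters.get(group_id)
        if not letters:
            unmapped.append(group_id)
            continue
        # A group assigned to several categories (e.g. "KL") counts once per letter.
        counts.update(ch for ch in letters if ch.isalpha())
    return counts, unmapped


def format_table(counts, categories):
    """Return 'letter<TAB>count<TAB>description' lines, sorted by letter."""
    lines = []
    for letter in sorted(counts):
        description = categories.get(letter, "unknown category")
        lines.append(f"{letter}\t{counts[letter]}\t{description}")
    return lines
```

With the contents of the funccats file and of the (assumed) mapping file read into strings, the lines returned by format_table could be written to a file and loaded straight into R for plotting. IDs without a mapping are returned separately so they can be checked by hand.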
Alternatively, I have experimented with a KOG annotation server (http://weizhong-lab.ucsd.edu/metagenomic-analysis/server/kog/), which gives output like this:
#KOG class count description
A 2924 RNA processing and modification
B 869 Chromatin structure and dynamics
C 1985 Energy production and conversion
D 1488 Cell cycle control, cell division, chromosome partitioning
E 2444 Amino acid transport and metabolism
F 602 Nucleotide transport and metabolism
G 3127 Carbohydrate transport and metabolism
H 696 Coenzyme transport and metabolism
I 2873 Lipid transport and metabolism
J 2494 Translation, ribosomal structure and biogenesis
K 3834 Transcription
L 1771 Replication, recombination and repair
M 870 Cell wall/membrane/envelope biogenesis
N 13 Cell motility
O 5940 Posttranslational modification, protein turnover, chaperones
P 1518 Inorganic ion transport and metabolism
Q 2473 Secondary metabolites biosynthesis, transport and catabolism
R 7325 General function prediction only
S 3590 Function unknown
T 9401 Signal transduction mechanisms
U 3169 Intracellular trafficking, secretion, and vesicular transport
V 546 Defense mechanisms
W 254 Extracellular structures
X 2 multiple functions
Y 211 Nuclear structure
Z 2181 Cytoskeleton
However, this service only accepts protein sequences as input, so unlike Trinotate it does not include the annotations derived from the BLASTX queries. I would much prefer to include all of the eggNOG annotations provided by Trinotate.
Best regards,
James