Functional categorization of eggNOG Trinotate output


jvi...@cub.uca.edu

Sep 6, 2017, 12:26:30 PM
to trinityrnaseq-users
Hi all,

Is there a way to take the eggNOG output from Trinotate and get the corresponding functional categories (http://eggnog.embl.de/version_4.0.beta/data/downloads/eggnogv4.funccats.txt)?

INFORMATION STORAGE AND PROCESSING
 [J] Translation, ribosomal structure and biogenesis
 [A] RNA processing and modification
 [K] Transcription
 [L] Replication, recombination and repair
 [B] Chromatin structure and dynamics

CELLULAR PROCESSES AND SIGNALING
 [D] Cell cycle control, cell division, chromosome partitioning
 [Y] Nuclear structure
 [V] Defense mechanisms
 [T] Signal transduction mechanisms
 [M] Cell wall/membrane/envelope biogenesis
 [N] Cell motility
 [Z] Cytoskeleton
 [W] Extracellular structures
 [U] Intracellular trafficking, secretion, and vesicular transport
 [O] Posttranslational modification, protein turnover, chaperones

METABOLISM
 [C] Energy production and conversion
 [G] Carbohydrate transport and metabolism
 [E] Amino acid transport and metabolism
 [F] Nucleotide transport and metabolism
 [H] Coenzyme transport and metabolism
 [I] Lipid transport and metabolism
 [P] Inorganic ion transport and metabolism
 [Q] Secondary metabolites biosynthesis, transport and catabolism

POORLY CHARACTERIZED
 [R] General function prediction only
 [S] Function unknown

I have the eggNOG and COG IDs from Trinotate in a list like this:

ENOG4112CG2
ENOG4112CEJ
ENOG4112CDX
ENOG4112CDU
COG0258
COG0256
COG0249



I was hoping there might be a tool or script someone knows of that can take a list of eggNOG and COG identifiers and map them to their corresponding functional category (A-Z). From there I could use simple awk commands to get counts and then use R to graph them like this:


http://i.stack.imgur.com/36MKI.png
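
For readers with the same question, the mapping and counting step described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it assumes you have already built a two-column, tab-separated lookup (group ID, then one or more category letters) from the eggNOG/COG annotation downloads. The function names `load_category_map` and `count_categories` are hypothetical, and the category letters in the demo are placeholders, not real annotations.

```python
from collections import Counter


def load_category_map(lines):
    """Parse 'ID<TAB>letters' lines into a dict {group_id: letters}.

    ASSUMPTION: the two-column layout is hypothetical; adapt the column
    indices to whatever annotation file you actually download.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        mapping[fields[0]] = fields[1].strip()
    return mapping


def count_categories(ids, mapping):
    """Count functional category letters for a list of eggNOG/COG IDs.

    A group assigned to several categories (e.g. 'KT') is counted once
    in each of them. IDs missing from the mapping are returned separately.
    """
    counts = Counter()
    unmapped = []
    for group_id in ids:
        letters = mapping.get(group_id)
        if not letters:
            unmapped.append(group_id)
            continue
        for letter in letters:
            if letter.isalpha():
                counts[letter.upper()] += 1
    return counts, unmapped


if __name__ == "__main__":
    # Placeholder letters for illustration only, not real annotations.
    demo_map = load_category_map(["COG0258\tL", "ENOG4112CG2\tKT"])
    demo_ids = ["ENOG4112CG2", "COG0258", "COG0249"]
    counts, unmapped = count_categories(demo_ids, demo_map)
    for letter, n in sorted(counts.items()):
        print(f"{letter}\t{n}")
    print("unmapped:", unmapped)
```

Counting a multi-letter group once per letter is a design choice; if you would rather keep such groups apart, collect them under a separate key instead. The printed letter/count table can then be read into R for plotting.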

Alternatively, I have experimented with a KOG annotation server (http://weizhong-lab.ucsd.edu/metagenomic-analysis/server/kog/), which gives output like this:

#KOG class    count    description
A    2924    RNA processing and modification
B    869    Chromatin structure and dynamics
C    1985    Energy production and conversion
D    1488    Cell cycle control, cell division, chromosome partitioning
E    2444    Amino acid transport and metabolism
F    602    Nucleotide transport and metabolism
G    3127    Carbohydrate transport and metabolism
H    696    Coenzyme transport and metabolism
I    2873    Lipid transport and metabolism
J    2494    Translation, ribosomal structure and biogenesis
K    3834    Transcription
L    1771    Replication, recombination and repair
M    870    Cell wall/membrane/envelope biogenesis
N    13    Cell motility
O    5940    Posttranslational modification, protein turnover, chaperones
P    1518    Inorganic ion transport and metabolism
Q    2473    Secondary metabolites biosynthesis, transport and catabolism
R    7325    General function prediction only
S    3590    Function unknown
T    9401    Signal transduction mechanisms
U    3169    Intracellular trafficking, secretion, and vesicular transport
V    546    Defense mechanisms
W    254    Extracellular structures
X    2    multiple functions
Y    211    Nuclear structure
Z    2181    Cytoskeleton
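
A class/count table in this three-column layout is easy to load for plotting. The following is a minimal sketch, assuming the columns are separated by tabs or runs of spaces and that lines starting with `#` are headers; the function names `parse_class_counts` and `to_percentages` are hypothetical.

```python
def parse_class_counts(text):
    """Parse 'class  count  description' rows into (letter, count, description) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        parts = line.split(None, 2)  # split on any whitespace, at most 3 fields
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # skip rows without a numeric count
        description = parts[2] if len(parts) == 3 else ""
        rows.append((parts[0], int(parts[1]), description))
    return rows


def to_percentages(rows):
    """Convert raw counts into percentages of the total, keyed by class letter."""
    total = sum(count for _, count, _ in rows)
    if total == 0:
        return {}
    return {letter: 100.0 * count / total for letter, count, _ in rows}


if __name__ == "__main__":
    sample = "#KOG class\tcount\tdescription\nN\t13\tCell motility\nX\t2\tmultiple functions\n"
    rows = parse_class_counts(sample)
    for letter, pct in to_percentages(rows).items():
        print(f"{letter}\t{pct:.1f}%")
```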

However, this service accepts only protein input, so unlike Trinotate it does not include the annotations derived from the BLASTX queries. I would much prefer to include all of the eggNOG annotations provided by Trinotate.

Best regards,

James 

Brian Haas

Sep 6, 2017, 3:17:58 PM
to James Vire, trinityrnaseq-users
I can't promise anything soon, but I put it on my list to generate additional reports and figures when the next release is ready.

best,

~b

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

James Vire

Sep 13, 2017, 5:51:37 PM
to trinityrnaseq-users
Thank you Brian! No rush.

Best regards,
James
