EC grouping errors?


Xabier Vázquez Campos

Jan 18, 2016, 4:48:55 AM
to HUMAnN Users
Hi,

After using the regroup script for EC numbers, I found that one of the lines has an odd label, "1.4.99.b|", which does not correspond to any EC number, and I don't know where it comes from.

Also, I noticed that some entries contain several EC numbers. That might be fine, for example for a fused protein, but some of these entries are identical except for the order of the EC numbers, e.g.
1.3.1.10,1.3.1.39,2.3.1.85,2.3.1.86
1.3.1.10,2.3.1.86,1.3.1.39,2.3.1.85

Regards,

Xabier
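
For readers who hit the same duplicated multi-EC labels, here is a minimal sketch of one way to collapse them after regrouping. It is not part of HUMAnN2; the function names and the abundance values are hypothetical, and it assumes each row is a (label, abundance) pair with the EC numbers separated by commas.

```python
from collections import defaultdict


def canonical_ec_label(label: str) -> str:
    """Return the label with its EC numbers de-duplicated and sorted.

    The sort is plain string order, which is enough to make two labels
    with the same EC numbers compare equal.
    """
    parts = [p.strip() for p in label.split(",") if p.strip()]
    return ",".join(sorted(set(parts)))


def merge_rows(rows):
    """Sum abundances of rows whose labels differ only in EC order."""
    totals = defaultdict(float)
    for label, abundance in rows:
        totals[canonical_ec_label(label)] += abundance
    return dict(totals)


# The two labels are from the post above; the abundances are made up.
rows = [
    ("1.3.1.10,1.3.1.39,2.3.1.85,2.3.1.86", 2.0),
    ("1.3.1.10,2.3.1.86,1.3.1.39,2.3.1.85", 3.0),
]
merged = merge_rows(rows)
print(merged)
```

Both rows end up under a single key, so the table keeps one line per distinct set of EC numbers.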

Xabier Vázquez Campos

Jan 18, 2016, 5:01:07 AM
to HUMAnN Users
Some other weird numberings:

1.14.19.f|
1.8.99.b|
6.2.1.i|
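
A quick way to list every label of this kind in a regrouped table is a regular-expression check. The sketch below is an illustration only: it treats "standard" as four dot-separated numeric fields (my assumption, so partial ECs such as "1.1.1.-" would be flagged too), and the helper name is hypothetical.

```python
import re

# Assumed definition of a standard EC number: four numeric fields, e.g. 1.4.99.1
STANDARD_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def is_standard_ec(label: str) -> bool:
    """True if the label, minus any trailing "|", is a fully numeric EC number."""
    return bool(STANDARD_EC.match(label.strip().rstrip("|")))


# Labels reported in this thread, plus one ordinary EC number for contrast.
labels = ["1.4.99.b|", "1.14.19.f|", "1.8.99.b|", "6.2.1.i|", "1.1.1.245"]
flagged = [label for label in labels if not is_standard_ec(label)]
print(flagged)
```

Only the four labels with a letter in the last field are flagged; 1.1.1.245 passes.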

Xabier Vázquez Campos

Jan 18, 2016, 9:09:51 PM
to HUMAnN Users
Hi again,

Just to update on this: these EC numbers do not appear in MetaCyc and are not renamed by the humann2_rename script. However, I have found matching entries at SymbaphidCyc; just search for the term without the "|":
http://bf2i200.insa-lyon.fr:5555/

Eric Franzosa

Jan 19, 2016, 10:44:50 AM
to humann...@googlegroups.com
Hi Xabier,

Thanks for pointing this out. The original EC mapping was based on the information provided by MetaCyc, which includes (among other things) many-to-one mappings of ECs to RXNs.

Since ECs are proving to be a useful option for summarizing gene family data, we completely revamped the EC mapping procedure for the next HUMAnN2 release (v0.6, coming later this week). We now take EC mappings directly from UniProt (SwissProt + TrEMBL), which considerably expands the number of UniRef50s that map to a known EC. In the meantime, I've made the new mapping available here:


It can be used as a "custom" map with earlier versions of the regroup script (it will replace the automated EC regroup as of the next release). I just confirmed that there are NO mappings here to ECs with alphabetic characters.

Thanks,
Eric


Xabier Vázquez Campos

Jan 19, 2016, 9:56:58 PM
to HUMAnN Users
Hi Eric,

I just tested the new EC mapping on my dataset and the result is much neater: there are no multiple EC labels or weird numbers. However, the renaming script still misses some EC numbers (I counted 354 out of 4140 in my tsv file). From what I saw, map_ec_name.txt.gz does not contain some of these EC numbers, e.g. 1.1.1.245, and a few others have incomplete names, e.g. 1.1.1.170 appears as "(decarboxylating)" when it should be "3beta-hydroxysteroid-4alpha-carboxylate 3-dehydrogenase (decarboxylating)".

I don't know if there are plans to fix the naming, but if it helps, I compiled a table of EC numbers and their names directly from the BRENDA database and manually annotated the parent categories, for a total of 7255 EC numbers, although it lacks some ECs with letters in the number, such as 6.3.2.n2. I attach it in case you want to use some of its contents.

Thanks,
Xabier
BRENDA_db_EC_list.xlsx
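
The count of unnamed EC numbers mentioned in this post can be reproduced with a set comparison between the table and the name map. The following is a rough sketch under several assumptions: the table is tab-separated with the feature label in the first column and a header line starting with "#", the name map has one "ID, tab, name" pair per line, and all function names and abundance values are hypothetical.

```python
def table_ids(lines):
    """Collect feature IDs from the first column of a tab-separated table."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header/comment line and blank lines
        feature = line.split("\t", 1)[0]
        # Drop any stratification suffix ("|...") and any attached name (": ...").
        ids.add(feature.split("|", 1)[0].split(": ", 1)[0].strip())
    return ids


def name_map(lines):
    """Read 'ID<TAB>name' lines into a dictionary."""
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            names[fields[0]] = fields[1]
    return names


def unnamed(ids, names):
    """IDs that have no entry, or only an empty name, in the name map."""
    return sorted(i for i in ids if not names.get(i, "").strip())


# In-memory stand-ins for the real files; the abundance values are made up.
table = ["# Gene Family\tsample_Abundance\n", "1.1.1.170\t12.0\n", "1.1.1.245\t4.0\n"]
names = ["1.1.1.170\t(decarboxylating)\n"]
missing = unnamed(table_ids(table), name_map(names))
print(missing)
```

With real files, the same functions accept open file handles; a gzipped map such as map_ec_name.txt.gz can be opened in text mode with `gzip.open(path, "rt")`. Note that this only finds missing names; an incomplete name such as "(decarboxylating)" still counts as present.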

Eric Franzosa

Jan 20, 2016, 11:26:34 AM
to humann...@googlegroups.com
Ah yes, forgot that we had to update the names to work with the new mapping! Here's that file as well (in advance of the official update):


This mapping was based on information from


Is there a similar flat file from BRENDA? If so, I'd be curious to take a look at it. I believe I was able to find names for all but two of the ECs using the file above.

Thanks,
Eric


Xabier Vázquez Campos

Jan 20, 2016, 7:19:20 PM
to HUMAnN Users
No, there isn't a BRENDA flat file, or at least I didn't see one when I was looking for an EC list. I was able to get the EC names from BRENDA by going through the page source of the "All enzymes" page, because the table came out oddly formatted when I tried to copy it directly:
http://www.brenda-enzymes.org/all_enzymes.php