Difference between results of API and Entry on website

Andrew Tarzia

Nov 15, 2018, 3:28:34 AM
to SABIO-RK
Hello all,

Firstly, thank you for your effort in providing this database for academic use, especially with respect to including protein sequence information when available.

I apologise if this problem has already been explained elsewhere.

I have been accessing the database using the Web services and found a discrepancy between the reaction components on the website versus the results of the API:

For example:

Entry ID "50401" on the site:

Substrates
name     location  comment
Oxalate  -         13C-labelled
Products
name     location  comment
CO2      -         -
Formate  -         -
Modifiers
name                           location  effect             comment   protein complex
oxalate decarboxylase(Enzyme)  -         Modifier-Catalyst  purified  (O34714)*6;

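Using the following Python code (Python 3.6) to collect the components using the API:
------
import requests

# PARAM_QUERY_URL and QUERY_URL (the SABIO-RK web service endpoints) are defined elsewhere
# encode next request, for parameter data given entry IDs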
data_field = {'entryIDs[]': "50401"}
query = {'format': 'tsv','fields[]': ['EntryID', 'Organism', 'ECNumber', 'SabioReactionID', 'UniprotID', 'ReactionEquation', 'EnzymeType']}
# make POST request
request = requests.post(PARAM_QUERY_URL, params=query, data=data_field)
request.raise_for_status()
# results
_, organism, _, rxn_id, UniprotID, RE, enzymetype = request.text.split('\n')[1].split('\t')

params = {"SabioReactionID": rxn_id, "fields[]": ["Name", "Role", "SabioCompoundID", "ChebiID", "PubChemID", "KeggCompoundID", 'UniprotID']}
request = requests.post(QUERY_URL, params=params)
request.raise_for_status()
# collate request output
print(request.text)
------
Output:
Name     Role               SabioCompoundID  ChebiID      PubChemID  KeggCompoundID  UniprotID
CO2      Product            1266             16526        3313       C00011          null
Mn2+     Modifier-Cofactor  1279             29035                   C00034          null
Formate  Product            1285             30751 15740  3358       C00058          null
Oxalate  Substrate          1975             30623 16995  3509       C00209          null

I am assuming that because I am collecting the components using the reaction ID and not the entry ID these two results differ(?), but the important question is: are the two results meant to differ? If the entry has that reaction ID, then shouldn't Mn2+ be listed as a co-factor?

Would you recommend a different way to get all molecular components of a reaction from SABIO using the API?

This is not the only example I have found where this occurs.

Thank you in advance for your assistance,

Andrew Tarzia


Ulrike Wittig

Nov 15, 2018, 6:38:16 AM
to SABIO-RK
Hi Andrew,
thanks for reporting this problem.
There is a bug in the method http://sabiork.h-its.org/testSabio/sabioRestWebServices/searchReactionParticipants.
It should give all reaction participants extracted from all database entries for the given reaction which means all cofactors, inhibitors etc. At the moment it randomly selects only the data from one single entry.

Sorry for the inconvenience. We will fix it asap.
regards,
Ulrike
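
The intended behaviour described in this reply, where participants are collected from all database entries of a reaction rather than from one entry, could be sketched roughly as follows. This is only an illustration in plain Python, not the SABIO-RK implementation: the function name `merge_participants` and the entry labels `entry_A` and `entry_B` are hypothetical.

```python
from collections import defaultdict


def merge_participants(entries):
    """Union the (name, role) pairs of all entries of one reaction.

    `entries` maps an entry label to a list of (name, role) tuples.
    Returns a dict mapping each (name, role) pair to the set of entries
    that list it.
    """
    merged = defaultdict(set)
    for entry_id, participants in entries.items():
        for name, role in participants:
            merged[(name, role)].add(entry_id)
    return dict(merged)


# Hypothetical data: two entries for the same reaction, only one of which
# records the Mn2+ cofactor.
entries = {
    "entry_A": [
        ("Oxalate", "Substrate"),
        ("CO2", "Product"),
        ("Formate", "Product"),
    ],
    "entry_B": [
        ("Oxalate", "Substrate"),
        ("CO2", "Product"),
        ("Formate", "Product"),
        ("Mn2+", "Modifier-Cofactor"),
    ],
}

merged = merge_participants(entries)
for (name, role), sources in sorted(merged.items()):
    print(name, role, sorted(sources))
```

Keeping track of which entries contribute each participant makes it possible to see afterwards whether a cofactor is reported by every entry or only by some of them.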

Andrew Tarzia

Nov 15, 2018, 6:48:16 AM
to SABIO-RK
Hi Ulrike,

Thank you for the quick response.

If that is the case, would the fixed code extract all reactants and products of a single reaction + all necessary cofactors for that reaction to occur + all possible (but not necessary) modifiers (activators, inhibitors, unknown)? The reason I ask is that I aim to have only the necessary components of a single reactants -> products transformation (i.e. protein sequence (if available) and necessary co-factors), and it would not be ideal if each reaction came with multiple possible co-factors that may be used interchangeably.

Thank you again
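
One possible client-side approach to the question above is to filter the returned participants by their Role column. The sketch below is an assumption about how that could look, not a feature of the web service: `necessary_components` and `NECESSARY_ROLES` are hypothetical names, and the inhibitor row is made up for illustration. It would not resolve the case of several interchangeable co-factors, since those would all share the same role.

```python
# Roles treated as required for the transformation. "Substrate", "Product" and
# "Modifier-Cofactor" appear in the output earlier in this thread; choosing
# exactly this set is an assumption about what counts as "necessary".
NECESSARY_ROLES = frozenset({"Substrate", "Product", "Modifier-Cofactor"})


def necessary_components(participants, roles=NECESSARY_ROLES):
    """Keep only the participants whose Role is in `roles`."""
    return [p for p in participants if p["Role"] in roles]


participants = [
    {"Name": "Oxalate", "Role": "Substrate"},
    {"Name": "CO2", "Role": "Product"},
    {"Name": "Formate", "Role": "Product"},
    {"Name": "Mn2+", "Role": "Modifier-Cofactor"},
    # Hypothetical row: this name and role label are made up for illustration.
    {"Name": "ExampleInhibitor", "Role": "Modifier-Inhibitor"},
]

kept = necessary_components(participants)
print([p["Name"] for p in kept])
```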

Ulrike Wittig

Nov 15, 2018, 7:44:03 AM
to SABIO-RK
Hi Andrew,

thank you for this comment. Based on that we will discuss in our group how to implement it to give more flexibility.

regards,
Ulrike 

Andrew Tarzia

Nov 22, 2018, 11:50:17 PM
to SABIO-RK
Hey Ulrike,

Are the UniProt IDs output by 'http://sabiork.h-its.org/entry/exportToExcelCustomizable' also affected?

For example the code below gives the UniProtID 'P15232' for the entry 3740, but online it has: P15232P15233;

Is this behaviour correct, meaning those UniProt IDs are interchangeable, or is this another bug?

Thank you,

Andrew

Code:
------
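import requests

# endpoint mentioned above
PARAM_QUERY_URL = 'http://sabiork.h-its.org/entry/exportToExcelCustomizable'
# encode next request, for parameter data given entry IDs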
data_field = {'entryIDs[]': '3740'}
query = {'format': 'tsv',  'fields[]': ['EntryID',
                          'Organism',
                          'ECNumber',
                          'SabioReactionID',
                          'UniprotID',
                          'ReactionEquation',
                          'EnzymeType']}
request = requests.post(PARAM_QUERY_URL, params=query, data=data_field)
request.raise_for_status()
print(request.text)
------
Output:
'EntryID\tOrganism\tECNumber\tSabioReactionID\tUniprotID\tReactionEquation\tEnzymeType\n3740\tArmoracia rusticana\t1.11.1.7\t7135\tP15232\to-Toluidine + H2O2 = H2O + Oxidized donor\twildtype\n'
------
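
As a side note on handling this kind of response, the tab-separated text can be read by column name instead of by position, which avoids depending on the order of the requested fields. The sketch below uses only the standard library; `parse_tsv` is a hypothetical helper name, and the sample string is the response text shown above.

```python
import csv
import io


def parse_tsv(text):
    """Parse a tab-separated response body into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Response text copied from the output above.
sample = (
    "EntryID\tOrganism\tECNumber\tSabioReactionID\tUniprotID\t"
    "ReactionEquation\tEnzymeType\n"
    "3740\tArmoracia rusticana\t1.11.1.7\t7135\tP15232\t"
    "o-Toluidine + H2O2 = H2O + Oxidized donor\twildtype\n"
)

rows = parse_tsv(sample)
print(rows[0]["SabioReactionID"], rows[0]["UniprotID"])
```

With a live request, `parse_tsv(request.text)` would be used in place of the sample string.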

Ulrike Wittig

Nov 26, 2018, 4:49:18 AM
to SABIO-RK
Hi Andrew,

there are two different ways of representing protein information in SABIO-RK. Entries with protein complexes containing different subunits (e.g. entryID 11899 ((P20707)(P20708)(P18925))) give all UniprotIDs of the complex. But there are also a few entries where the publication gives no detailed information about the specific isoenzyme. In these entries the possible isoenzymes are separated by semicolons, and they are currently not all represented correctly in the web services.
So this is a bug and will be fixed.

Thanks for reporting!
Best regards,
Ulrike 
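
Until the web services distinguish the two representations, a rough client-side heuristic could be applied to the protein strings as they appear on the website. The sketch below assumes the formats seen in this thread (parentheses around complex subunits, bare accessions for possible isoenzymes); `classify_protein_field` is a hypothetical helper, and the classification rules are assumptions rather than documented SABIO-RK behaviour. The accession pattern is the one published in the UniProt documentation.

```python
import re

# UniProt accession pattern as given in the UniProt documentation.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_protein_field(field):
    """Return (accessions, kind) for a protein string.

    kind is "complex" if the string contains parentheses, "isoenzymes" if it
    holds several distinct bare accessions, "single" for one accession and
    "unknown" if no accession is found. These rules are assumptions based on
    the examples in this thread.
    """
    accessions = UNIPROT_RE.findall(field)
    if not accessions:
        return accessions, "unknown"
    if "(" in field:
        return accessions, "complex"
    if len(set(accessions)) > 1:
        return accessions, "isoenzymes"
    return accessions, "single"


for text in ["((P20707)(P20708)(P18925))", "P15232P15233;", "P15232"]:
    print(text, classify_protein_field(text))
```

An entry classified as "isoenzymes" would then be one where a single UniProt ID returned by the web service may be ambiguous.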

Andrew Tarzia

Nov 26, 2018, 5:00:52 AM
to ulrike...@h-its.org, sabi...@googlegroups.com
Hey Ulrike,

Thank you for your responses.

When will I find out if the bugs have been fixed?

Is there a way of knowing if an entry could potentially have a bug or not based on the different protein information?

Thank you,

Andrew



--
You received this message because you are subscribed to the Google Groups "SABIO-RK" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sabio-rk+u...@googlegroups.com.
To post to this group, send email to sabi...@googlegroups.com.
Visit this group at https://groups.google.com/group/sabio-rk.
To view this discussion on the web visit https://groups.google.com/d/msgid/sabio-rk/ec4b0a19-7b93-4c8f-9ece-b939a9a7aac7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrew Tarzia

Nov 26, 2018, 5:08:16 AM
to ulrike...@h-its.org, sabi...@googlegroups.com
Also, does that mean that for Entry 3740, where the web service outputs the UniProt ID P15232, the associated sequence is the one correct isoenzyme that does the reaction, or just one of the possible sequences that could do the reaction? Ultimately I would like to avoid the ambiguity if it is the second case.

Thank you

Ulrike Wittig

Nov 26, 2018, 6:23:33 AM
to SABIO-RK

Hi Andrew,

 

yes, you are right. We should avoid ambiguity. We have to find a way to distinguish the entries containing multiple isoenzymes from entries containing protein complexes.

Since we have some other priorities this week we will start to work on these web service problems next week. I will let you know as soon as we have them fixed and implemented.

 

Best regards,

Ulrike


Andrew Tarzia

Nov 26, 2018, 6:31:13 AM
to SABIO-RK
Hey Ulrike,

Thank you very much for keeping me updated.

Andrew