Hi.
I am trying to find a workaround to a problem; perhaps some of you can help.
Reactome's BioPAX Level 2 file contains some smallMolecule instances
with more than one XREF to ChEBI, converted from Reactome's
EntitySets (alternative substrates/products).
I want to parse this file and extract the individual reactions between the
concrete substrates and products, but to do that, both collections of xrefs
(alternative substrates and alternative products) must be ordered so that
they correspond properly. Although the XREFs in the BioPAX file seem to be
ordered, some of these Reactome EntitySets are assigned correctly, but
others are not.
I see that the implementation in ReferenceHelper
(http://biopax.sourceforge.net/paxtools-4.1.1/paxtools-core/xref/org/biopax/paxtools/impl/level2/ReferenceHelper.html#25)
is a HashSet. That might be the reason, but I don't know how Jena
handles the model under the hood (I use
JenaIOHandler().convertFromOWL(InputStream)).
Any ideas are more than welcome.
Thanks in advance,
--
Dr. Rafael Alcántara
<rafael.alcant...@ebi.ac.uk>
Software Engineer
Cheminformatics and Metabolism Team
European Bioinformatics Institute - EMBL
Tel. +44 (0)1223 494414
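Whether the XREFs keep their file order once loaded depends on the Java collection that stores them. The plain-Java sketch below (no Paxtools or Jena classes; the class and method names are hypothetical) contrasts java.util.HashSet, which documents no iteration order, with LinkedHashSet, which keeps insertion order while still rejecting duplicates. It assumes the parser hands over the xrefs in file order in the first place, which RDF itself does not promise.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class XrefContainer {
    // Stores xref IDs received in file order. LinkedHashSet keeps insertion order;
    // HashSet also removes duplicates but makes no promise about iteration order.
    static Set<String> store(List<String> xrefsInFileOrder, boolean keepOrder) {
        Set<String> xrefs = keepOrder ? new LinkedHashSet<String>() : new HashSet<String>();
        xrefs.addAll(xrefsInFileOrder);
        return xrefs;
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real ChEBI identifiers.
        List<String> fileOrder = Arrays.asList("x3", "x1", "x2", "x1");
        System.out.println(store(fileOrder, true));  // [x3, x1, x2]
        System.out.println(store(fileOrder, false)); // same members, order not guaranteed
    }
}
```

Swapping a HashSet for a LinkedHashSet is the usual minimal change when positional pairing matters, but it only preserves whatever order the upstream parser produced.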
Hello Rafael,
Properties in BioPAX (and in RDF/OWL in general) do not have an
order: the values of a multi-valued property such as XREF form an
unordered set. If Reactome's output behaves as you describe, they are
doing something non-standard. Perhaps someone from Reactome can comment
on this?
If for some reason you still need the order: OpenRDF Sesame allows
you to write your own RDF handler, which receives the statements
directly from the RDF parser, I assume in the order in which they
appear in the input.
I prefer Sesame to Jena because Sesame is much more transparently
divided into components, which lets you use some of its components
while skipping others, and reimplement some of their interfaces. If
all you need is reading and writing RDF, you only need a small
fraction of Sesame (the parts named RIO, Model and Util, I think).
Take care
Oliver
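Sesame's own handler interface is not reproduced here. As a dependency-free sketch of the same idea (receiving parse events in document order), the code below swaps in the JDK's StAX parser (javax.xml.stream) on RDF/XML and records each subject's XREF targets in the order they are read. Assumptions: XREFs are nested elements carrying rdf:resource, subjects carry rdf:ID or rdf:about, XREF is matched by local name only, and relative IDs are not resolved against xml:base. The class name and the urn:example namespace are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class XrefOrderReader {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Streams RDF/XML and records, per subject, the rdf:resource targets of XREF
    // elements in the order the parser encounters them.
    static Map<String, List<String>> xrefsInDocumentOrder(String rdfXml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> open = new ArrayList<>(); // one entry per open element; null if it names no subject
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(rdfXml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("XREF".equals(r.getLocalName())) {
                        String owner = innermostSubject(open);
                        String target = r.getAttributeValue(RDF_NS, "resource");
                        if (owner != null && target != null) {
                            result.computeIfAbsent(owner, k -> new ArrayList<>()).add(target);
                        }
                    }
                    String id = r.getAttributeValue(RDF_NS, "ID");
                    String about = r.getAttributeValue(RDF_NS, "about");
                    open.add(id != null ? "#" + id : about);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.remove(open.size() - 1); // remove by index: close the innermost element
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
        return result;
    }

    // Nearest enclosing element that carries rdf:ID or rdf:about.
    private static String innermostSubject(List<String> open) {
        for (int i = open.size() - 1; i >= 0; i--) {
            if (open.get(i) != null) return open.get(i);
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:bp=\"urn:example:bp#\">"
                + "<bp:smallMolecule rdf:ID=\"sm1\">"
                + "<bp:XREF rdf:resource=\"#x2\"/><bp:XREF rdf:resource=\"#x1\"/>"
                + "</bp:smallMolecule></rdf:RDF>";
        System.out.println(xrefsInDocumentOrder(xml)); // {#sm1=[#x2, #x1]}
    }
}
```

This captures document order only; as noted above, RDF attaches no meaning to that order, so it works only when the producer writes the values in a deliberate sequence.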
When we annotate a reaction in which an enzyme with broad substrate specificity can convert any one of a group of input (substrate) molecules to corresponding output (product) molecules, we use sets as input and output. The order in which the specific molecules are listed in the input set corresponds to the order in the output set, which is intended to mean that the reaction can convert input-1 to output-1, or input-2 to output-2, and so forth. Nothing in the logic of our data model or curator software enforces correct ordering of the two sets; it depends on the alertness of the human curator.
We do not catalog the attributes of small molecules, but get them by reference from ChEBI. With the growth of their data set and ontology structure, we now often have a better way (we think) to annotate broad substrate specificity. Instead of using home-made entity sets, we use higher-level terms from ChEBI. Thus, instead of assembling our own sets to list nucleotide monophosphates or aliphatic alcohols, we use the terms for those concepts from ChEBI to create a generic molecule instance in Reactome, and then use that generic molecule as input or output in a reaction. A limitation of this approach is that not all the molecule collections we need correspond to entities in ChEBI, so our annotation still sometimes uses sets (and we haven't systematically replaced the existing legacy sets).
That's where we stand at present; I hope it at least explains the situation.
Peter D'E
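A hypothetical model of the positional convention described above, in plain Java (not Reactome's actual classes): the i-th member of the input set is taken to convert to the i-th member of the output set. The only structural check such a model can offer is that both sets have the same size.

```java
import java.util.Arrays;
import java.util.List;

class SetBasedReaction {
    // By curator convention, inputSet.get(i) is converted to outputSet.get(i).
    final List<String> inputSet;
    final List<String> outputSet;

    SetBasedReaction(List<String> inputSet, List<String> outputSet) {
        this.inputSet = inputSet;
        this.outputSet = outputSet;
    }

    // The only check available without chemical knowledge: equal set sizes.
    // It cannot detect two members listed in the wrong order.
    boolean hasMatchingSizes() {
        return inputSet.size() == outputSet.size();
    }

    // Product paired with a given substrate under the positional convention, or null.
    String productFor(String substrate) {
        int i = inputSet.indexOf(substrate);
        return (i >= 0 && i < outputSet.size()) ? outputSet.get(i) : null;
    }

    public static void main(String[] args) {
        SetBasedReaction r = new SetBasedReaction(
                Arrays.asList("input-1", "input-2"), Arrays.asList("output-1", "output-2"));
        System.out.println(r.hasMatchingSizes());    // true
        System.out.println(r.productFor("input-2")); // output-2
    }
}
```

An output set listed as (output-2, output-1) would pass hasMatchingSizes() just the same and silently pair input-1 with output-2, which is why the ordering still rests with the curator.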
-----Original Message-----
From: biopax-discuss@googlegroups.com [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver Ruebenacker
Sent: Wednesday, June 06, 2012 8:14 AM
To: biopax-discuss@googlegroups.com
Subject: Re: order of XREFs in smallMolecule
On Wed, Jun 6, 2012 at 6:32 AM, Rafael Alcántara <rafael.alcant...@ebi.ac.uk> wrote:
Actually, I just remembered that Sesame preserves order by default, so
the simplest solution would be to read the RDF graph relying on the
default implementation.
Technically, Sesame's RDFGraph is an interface extending a Java
Collection of Sesame Statements. The default implementation is a Java
List of Sesame Statements. That is, to my mind, not the most faithful
implementation of the RDF/OWL standards, but it does have the advantage
that order is preserved.
If you choose to reimplement RDFGraph, you can use a Java Collection
other than a List, for example a Java Set, which in general does not
preserve order. You can then still capture the order of statements in
the input by using a custom RDF handler.
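A sketch of the design described here: an unordered Set as the graph, plus a handler-style callback that notes the position at which each statement arrives. Triple, handleStatement and objectsInArrivalOrder are hypothetical plain-Java stand-ins, not Sesame's Statement or handler types; in Sesame, the recording step would live in a custom handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class OrderRecordingGraph {
    // Hypothetical stand-in for an RDF statement.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) return false;
            Triple t = (Triple) other;
            return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    private final Set<Triple> graph = new HashSet<>();           // unordered, like an RDF graph
    private final Map<Triple, Integer> arrival = new HashMap<>(); // position of first arrival

    // Called once per statement, in the order the parser emits them.
    void handleStatement(Triple t) {
        if (graph.add(t)) { // duplicates keep their first position
            arrival.put(t, arrival.size());
        }
    }

    // Objects of one subject/predicate pair (e.g. the XREFs of a molecule), sorted by arrival.
    List<String> objectsInArrivalOrder(String subject, String predicate) {
        List<Triple> matches = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                matches.add(t);
            }
        }
        matches.sort((a, b) -> Integer.compare(arrival.get(a), arrival.get(b)));
        List<String> objects = new ArrayList<>();
        for (Triple t : matches) {
            objects.add(t.object);
        }
        return objects;
    }
}
```

Keeping the order in a side table leaves the graph itself set-like, so code that treats it as a plain RDF graph is unaffected.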
On Wed, Jun 6, 2012 at 8:13 AM, Oliver Ruebenacker <cur...@gmail.com> wrote:
Hello Peter,
Let me see if I understand your data correctly: you have a reaction
R: A -> B, where A is a superset of A1, A2, etc., and B is a superset
of B1, B2, etc. And R is really a superset of R1: A1 -> B1, R2: A2 ->
B2, etc.
You have a unification cross-reference to ChEBI that describes A1.
Unless it is also valid for A, you cannot legally list it as a
unification cross-reference of A.
For example, it would be illegal to assign unification cross-references
to both methanol and ethanol to the same physical entity, because it
cannot be both at the same time. I suppose you could use a plain
cross-reference instead of a unification cross-reference, but it would
be unclear what that means.
Can't you just spell out the individual reactions R1, R2, etc.?
Take care
Oliver
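A plain-Java sketch of the check implied here, assuming a hypothetical map from each entity to its unification xrefs (labels such as "ethanol" stand in for ChEBI identifiers; this is not Paxtools' API). It flags entities with more than one distinct unification xref and, taking entities that share a unification xref to be the same, shows why a set-derived entity with both methanol and ethanol breaks identity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UnificationXrefCheck {
    // Flags entities whose unification xrefs point to more than one distinct entry,
    // i.e. entities that would claim to be two different molecules at once.
    static List<String> conflictingEntities(Map<String, List<String>> unificationXrefs) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : unificationXrefs.entrySet()) {
            if (new HashSet<>(entry.getValue()).size() > 1) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }

    // Two entities are taken to be identical if they share at least one unification xref.
    static boolean sameEntity(List<String> xrefsOfA, List<String> xrefsOfB) {
        return !Collections.disjoint(xrefsOfA, xrefsOfB);
    }

    public static void main(String[] args) {
        Map<String, List<String>> uxrs = new LinkedHashMap<>();
        uxrs.put("A", Arrays.asList("methanol", "ethanol")); // set-derived entity
        uxrs.put("X", Arrays.asList("ethanol"));
        uxrs.put("Y", Arrays.asList("methanol"));
        System.out.println(conflictingEntities(uxrs)); // [A]
        // A "is" both X and Y, yet X and Y are not the same: identity stops being transitive.
        System.out.println(sameEntity(uxrs.get("A"), uxrs.get("X"))); // true
        System.out.println(sameEntity(uxrs.get("A"), uxrs.get("Y"))); // true
        System.out.println(sameEntity(uxrs.get("X"), uxrs.get("Y"))); // false
    }
}
```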
On Wed, Jun 6, 2012 at 10:30 AM, D'Eustachio, Peter
<Peter.D'Eustac...@nyumc.org> wrote:
Hello Oliver,
You understand correctly. I don't understand "unification cross-reference" well enough to be sure, but what you say sounds plausible. We avoid a combinatorial explosion (which we would have to handle manually) by using sets in this way. I guess (but Guanming would know better) that a Reactome reaction involving a set could be expanded, at the time of the export to BioPAX, into a list of reactions with a single input and output chemical entity each (though I know we've considered that, and there are reasons, which Guanming and Gary may know better, why we haven't done it).
Peter
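A minimal sketch of export-time expansion, assuming the positional convention Peter describes and hypothetical plain-Java types (not Reactome's exporter or Paxtools): R: A -> B becomes R1: A1 -> B1, R2: A2 -> B2, and so on, and the expansion refuses to guess when the two sets differ in size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExportExpansion {
    // Hypothetical flat reaction record produced at export time.
    static final class Reaction {
        final String name;
        final String input;
        final String output;

        Reaction(String name, String input, String output) {
            this.name = name;
            this.input = input;
            this.output = output;
        }

        @Override
        public String toString() {
            return name + ": " + input + " -> " + output;
        }
    }

    // Expands R: A -> B (A and B are sets) into one reaction per positional pair.
    static List<Reaction> expand(String name, List<String> inputSet, List<String> outputSet) {
        if (inputSet.size() != outputSet.size()) {
            throw new IllegalArgumentException(name + ": input and output sets differ in size");
        }
        List<Reaction> result = new ArrayList<>();
        for (int i = 0; i < inputSet.size(); i++) {
            result.add(new Reaction(name + (i + 1), inputSet.get(i), outputSet.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Reaction r : expand("R", Arrays.asList("A1", "A2"), Arrays.asList("B1", "B2"))) {
            System.out.println(r); // R1: A1 -> B1, then R2: A2 -> B2
        }
    }
}
```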
-----Original Message-----
From: biopax-discuss@googlegroups.com [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver Ruebenacker
Sent: Wednesday, June 06, 2012 11:43 AM
To: biopax-discuss@googlegroups.com
Subject: Re: order of XREFs in smallMolecule
Hi Oliver,
As Peter said, we have not tried to expand reactions involving sets, both to avoid a combinatorial explosion and because we wanted the exported data to stay close to our original Reactome content.
Thanks,
Guanming
On Jun 6, 2012, at 9:50 AM, D'Eustachio, Peter wrote:
I don't see a combinatorial explosion in the way I (and Rafael, I
think) understood it, i.e. if you only have reactions R1: A1 -> B1,
R2: A2 -> B2, etc.
To have a combinatorial explosion, you would also need more
reactions, e.g. R12: A1 -> B2. But in that case, I don't see why order
would matter (as Rafael assumed).
My understanding of unification cross-references is that two entities
can be considered identical if they share one unification
cross-reference. E.g. if X and Y both have UXR "ethanol", they are the
same, i.e. they are both ethanol. That wouldn't work if Y at the same
time also had UXR "methanol".
I would prefer that you spell out all reactions, for the sake of
correct BioPAX and the possibility of integrating your data with
other data. And how bad can it be? If you have a reaction A + B -> C, and
A and B have twenty options each, that makes 400 reactions, no
big deal.
You could also make expansion optional, to keep it simple for those
who evaluate pathways by eye.
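The arithmetic behind the two scenarios, as a tiny sketch (the helper name is hypothetical): paired sets that vary together, as in R1: A1 -> B1, R2: A2 -> B2, count as a single dimension, while independent sets such as A and B in A + B -> C multiply.

```java
import java.util.Arrays;
import java.util.List;

class ExpansionSize {
    // Number of explicit reactions when every combination of alternatives is its own
    // reaction: the product of the sizes of the independent sets.
    static long crossProductCount(List<Integer> independentSetSizes) {
        long count = 1;
        for (int size : independentSetSizes) {
            count *= size;
        }
        return count;
    }

    public static void main(String[] args) {
        // Paired substrate/product sets of 20 members: one dimension.
        System.out.println(crossProductCount(Arrays.asList(20)));     // 20
        // A + B -> C with twenty options each for A and B.
        System.out.println(crossProductCount(Arrays.asList(20, 20))); // 400
    }
}
```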
On Wed, Jun 6, 2012 at 1:03 PM, Guanming Wu <guanmin...@gmail.com> wrote:
> Hi Oliver,
> Just as Peter said, we have not tried to expand reactions involving sets to avoid combinatorial explosion and also hoped to make the exported data closer to our original Reactome contents.
> Thanks,
> Guanming
> On Jun 6, 2012, at 9:50 AM, D'Eustachio, Peter wrote:
>> Hello Oliver,
>> You understand right. I don't understand "unification cross-reference" reliably enough to be sure but what you say sounds plausible. We avoid a combinatorial explosion (which we would need to handle manually) by using sets in this way. I guess (but Guanming would know better) that a Reactome reaction involving a set could be expanded into a list of reactions with a single input and output chemical entity for each at the time of the export to BioPAX (though I know we've considered that and there are reasons - Guanming and Gary may know better - why we haven't done it).
>> Peter
>> -----Original Message-----
>> From: biopax-discuss@googlegroups.com [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver Ruebenacker
>> Sent: Wednesday, June 06, 2012 11:43 AM
>> To: biopax-discuss@googlegroups.com
>> Subject: Re: order of XREFs in smallMolecule
>> Hello Peter,
>> Let me see if I understand your data correctly: you have a reaction
>> R: A -> B, and A is a superset of A1, A2, etc. while B is a superset of B1, B2, etc. And R is really a superset of R1: A1 -> B1, R2: A2 -> B2, etc.
>> You have a unification cross reference to ChEBI that describes A1.
>> Unless it is also valid for A, you can not legally list it as a unification cross-reference of A.
>> For example, it would be illegal to assign to the same physical entity unification cross references to both methanol and ethanol, because it can not be both at the same time. I suppose you could use cross reference instead of unification cross reference, but it would be unclear what that means.
>> Can't you just spell out the individual reactions R1, R2, etc?
>> Take care
>> Oliver
>> On Wed, Jun 6, 2012 at 10:30 AM, D'Eustachio, Peter <Peter.D'Eustac...@nyumc.org> wrote:
>>> When we annotate a reaction in which an enzyme with broad substrate specificity can convert any one of a group of input / substrate molecules to corresponding output / product molecules, we use sets as input and output. The order in which the specific molecules are listed in the input set corresponds to the order in the output set, which is intended to mean that the reaction can convert input-1 to output-1 or input-2 to output-2 and so forth. There is nothing in the logic of our data model or curator software that enforces correct ordering of the two sets - it depends on the alertness of the human curator.
>>> We do not catalog the attributes of small molecules, but get them by
>>> reference from ChEBI. With the growth of their data set and ontology
>>> structure, we now often have a better way (we think) to annotate broad
>>> substrate specificity. Instead of using home-made entity sets, we use
>>> higher-level terms from ChEBI. Thus, instead of assembling our own
>>> sets to list nucleotide monophosphates or aliphatic alcohols, we use
>>> the terms for those concepts from ChEBI to create a generic molecule
>>> instance in Reactome, then use that generic molecule as input or
>>> output in a reaction. A limitation of this approach is that not all
>>> molecule collections we need correspond to entities in ChEBI, so our
>>> annotation still sometimes uses sets (also, we haven't systematically
>>> replaced the existing legacy sets.)
>>> That's where we stand at present - I hope it at least explains the situation.
>>> Peter D'E
>>> -----Original Message-----
>>> From: biopax-discuss@googlegroups.com
>>> [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver
>>> Ruebenacker
>>> Sent: Wednesday, June 06, 2012 8:14 AM
>>> To: biopax-discuss@googlegroups.com
>>> Subject: Re: order of XREFs in smallMolecule
>>> Hello Rafael,
>>> Properties in BioPAX (and in RDF/OWL in general) do not have an order. If Reactome output behaves as you describe, they are doing something non-standard. Perhaps some one from Reactome can comment on this?
>>> If for some reason you still need the order: OpenRDF Sesame allows you to write your own RDF handler, which receives the statements directly from the RDF parser, I'm assuming in the order they appear in the input.
>>> I prefer Sesame to Jena, because Sesame is much more transparently
>>> divided into components, allowing you to use some components of it
>>> while skipping others, and to reimplement some of their interfaces. If
>>> all you need is reading and writing RDF, you only need a small
>>> fraction of Sesame (the parts named RIO, Model and Util, I think)
>>> Take care
>>> Oliver
>>> On Wed, Jun 6, 2012 at 6:32 AM, Rafael Alcántara <rafael.alcant...@ebi.ac.uk> wrote:
>>>> Hi.
>>>> I am trying to find a workaround to a problem, perhaps some of you can help.
>>>> Reactome's BioPAX level 2 file contains some smallMolecule instances
>>>> with more than one XREF to ChEBI, being converted from Reactome's
>>>> EntitySets (alternative substrates/products).
>>>> I want to parse this file and extract the different reactions from
>>>> the concrete substrates/products, but to that end both collections of
>>>> xrefs (alternative substrates and alternative products) must be
>>>> ordered to make the proper correspondence. While the BioPAX file
>>>> seems to have the XREFs ordered, some of these Reactome EntitySets
>>>> are properly assigned, but others are not.
>>>> I see the implementation in ReferenceHelper
>>>> (http://biopax.sourceforge.net/paxtools-4.1.1/paxtools-core/xref/org/biopax/paxtools/impl/level2/ReferenceHelper.html#25)
>>>> is a HashSet. That might be the reason, but I don't know how Jena is
>>>> handling the model under the hood (I use
>>>> JenaIOHandler().convertFromOWL(InputStream)).
>>>> Any ideas are more than welcome.
>>>> Thanks in advance,
>>>> --
>>>> Dr. Rafael Alcántara
>>>> <rafael.alcant...@ebi.ac.uk>
>>>> Software Engineer
>>>> Cheminformatics and Metabolism Team
>>>> European Bioinformatics Institute - EMBL
>>>> Tel. +44 (0)1223 494414
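Oliver's suggestion amounts to consuming statements in the order the parser emits them, instead of reading XREFs back out of an unordered collection such as the `HashSet` Rafael found in `ReferenceHelper`. The sketch below is not Sesame or Paxtools code: `Triple` and `OrderedXrefCollector` are hypothetical stand-ins for a streaming parser callback, the predicate name `bp:XREF` and the IDs are placeholders, and it assumes the parser really does deliver statements in document order (RDF itself promises no order). It keeps values in a `LinkedHashMap` of `ArrayList`s, which preserve insertion order, whereas `HashSet` iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one RDF statement delivered by a streaming parser.
record Triple(String subject, String predicate, String object) {}

// Hypothetical handler: records XREF objects per subject in the order they arrive.
// Assumes document order is preserved by the parser; RDF semantics do not guarantee it.
class OrderedXrefCollector {
    private final String xrefPredicate;
    // LinkedHashMap and ArrayList keep insertion order; HashSet does not.
    private final Map<String, List<String>> xrefsBySubject = new LinkedHashMap<>();

    OrderedXrefCollector(String xrefPredicate) {
        this.xrefPredicate = xrefPredicate;
    }

    // Called once per statement, in the order the parser emits them.
    void handleStatement(Triple t) {
        if (t.predicate().equals(xrefPredicate)) {
            xrefsBySubject.computeIfAbsent(t.subject(), k -> new ArrayList<>()).add(t.object());
        }
    }

    List<String> xrefsOf(String subject) {
        return xrefsBySubject.getOrDefault(subject, List.of());
    }

    public static void main(String[] args) {
        OrderedXrefCollector c = new OrderedXrefCollector("bp:XREF");
        c.handleStatement(new Triple("#setA", "bp:XREF", "#chebi_1"));
        c.handleStatement(new Triple("#setA", "bp:NAME", "substrate set"));
        c.handleStatement(new Triple("#setA", "bp:XREF", "#chebi_2"));
        System.out.println(c.xrefsOf("#setA")); // [#chebi_1, #chebi_2]
    }
}
```

This only recovers the order found in the file; it cannot repair sets whose members were listed in a non-corresponding order in the first place.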
For our attempts at a human-readable graphic representation of pathways, 20 versus 400, or even 40, is a substantial difference. For a data set that will only be traversed computationally, 20 versus 400 or even 4000 is probably not important, so ways of taking our "compact" representation and expanding the instances that involve generic or set inputs could be workable - simply getting rid of the "compact" representation within Reactome, I think, isn't workable.
Another issue is maintenance. If we create a reaction with a set as input, two kinds of edits are easy and reliable. Changing the list of allowed inputs requires only an edit of the set instance, not the addition or deletion of a whole reaction, and changing some other feature of the reaction again involves only a single edit, not parallel edits on each of a series of reactions.
Another use of sets is to group different enzymes that can all catalyze the same reaction. Human Reactome does that only to a limited extent, but groups using the Reactome data model to annotate reactions for diverse microbes and possibly for plants will use this annotation strategy extensively (as does KEGG already).
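Expanding the "compact" form for computational use can be sketched under the positional convention Peter describes in his earlier message quoted in this thread (input-1 converts to output-1, input-2 to output-2, and so on). The names `ConcreteReaction` and `PairedSetExpander` are hypothetical, not Reactome or Paxtools classes. Because nothing enforces the ordering, the only thing such code can verify is that the two sets have the same size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical concrete reaction: one input molecule converted to one output molecule.
record ConcreteReaction(String input, String output) {}

class PairedSetExpander {
    // Expands a compact reaction whose input and output are ordered sets by
    // pairing members by position. Assumes the curator listed both sets in
    // corresponding order; only the sizes can be checked here.
    static List<ConcreteReaction> expand(List<String> inputSet, List<String> outputSet) {
        if (inputSet.size() != outputSet.size()) {
            throw new IllegalArgumentException("input and output sets differ in size: "
                    + inputSet.size() + " vs " + outputSet.size());
        }
        List<ConcreteReaction> result = new ArrayList<>();
        for (int i = 0; i < inputSet.size(); i++) {
            result.add(new ConcreteReaction(inputSet.get(i), outputSet.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<ConcreteReaction> rs = expand(List.of("A1", "A2", "A3"), List.of("B1", "B2", "B3"));
        rs.forEach(r -> System.out.println(r.input() + " -> " + r.output()));
    }
}
```

Keeping the set as the curated source and regenerating the expanded list on export would preserve the single-edit maintenance that Peter describes.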
-----Original Message-----
From: biopax-discuss@googlegroups.com [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver Ruebenacker
Sent: Thursday, June 07, 2012 11:59 AM
To: biopax-discuss@googlegroups.com
Subject: Re: order of XREFs in smallMolecule
Hello Peter, Guanming,
I don't see combinatorial explosion in the way I (and Rafael, I
think) understood it, i.e. if you only have reactions R1: A1 -> B1,
R2: A2 -> B2, etc.
To have combinatorial explosion, you would also need to have more reactions, e.g. R12: A1 -> B2. In that case, I don't see why order would matter (as Rafael assumed).
My understanding of unification cross reference is that two entities can be considered identical if they have one identical unification cross reference. E.g. if X and Y both have UXR "ethanol", they are the same, i.e. they are both ethanol. That wouldn't work if Y at the same time also had UXR "methanol".
I would prefer you to spell out all reactions for the sake of correct BioPAX and the possibility of integrating your data with others. And how bad can it be? If you have reaction A + B -> C, and both A and B have twenty options each, that makes 400 reactions, no big deal.
You can also make expansion optional, to keep it simple for those who evaluate pathways by eye.
Take care
Oliver
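The arithmetic in this exchange can be checked directly. In the sketch below (hypothetical helper names, placeholder members A1..A20 and B1..B20), positional pairing R1: A1 -> B1, R2: A2 -> B2, ... yields one reaction per member. A reaction A + B -> C in which every combination of A and B options is assumed to be valid yields the product of the option counts, 20 × 20 = 400. That is the worst case, not necessarily what a careful expansion produces.

```java
import java.util.ArrayList;
import java.util.List;

class ReactionCounts {
    // Builds n hypothetical alternative members named prefix1 .. prefixN.
    static List<String> options(String prefix, int n) {
        List<String> xs = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            xs.add(prefix + i);
        }
        return xs;
    }

    // Worst-case expansion of A + B -> C: every (a, b) combination is
    // assumed to be a valid, distinct reaction.
    static List<String> cartesianExpand(List<String> aOptions, List<String> bOptions, String product) {
        List<String> reactions = new ArrayList<>();
        for (String a : aOptions) {
            for (String b : bOptions) {
                reactions.add(a + " + " + b + " -> " + product);
            }
        }
        return reactions;
    }

    public static void main(String[] args) {
        List<String> a = options("A", 20);
        List<String> b = options("B", 20);
        // Positional pairing A_i -> B_i gives one reaction per member.
        System.out.println("paired A_i -> B_i: " + a.size()); // 20
        System.out.println("A + B -> C, all combinations: " + cartesianExpand(a, b, "C").size()); // 400
    }
}
```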
On Wed, Jun 6, 2012 at 1:03 PM, Guanming Wu <guanmin...@gmail.com> wrote:
> Hi Oliver,
> Just as Peter said, we have not tried to expand reactions involving sets to avoid combinatorial explosion and also hoped to make the exported data closer to our original Reactome contents.
> Thanks,
> Guanming
> On Jun 6, 2012, at 9:50 AM, D'Eustachio, Peter wrote:
>> Hello Oliver,
>> You understand right. I don't understand "unification cross-reference" reliably enough to be sure but what you say sounds plausible. We avoid a combinatorial explosion (which we would need to handle manually) by using sets in this way. I guess (but Guanming would know better) that a Reactome reaction involving a set could be expanded into a list of reactions with a single input and output chemical entity for each at the time of the export to BioPAX (though I know we've considered that and there are reasons - Guanming and Gary may know better - why we haven't done it).
>> Peter
>> -----Original Message-----
>> From: biopax-discuss@googlegroups.com [mailto:biopax-discuss@googlegroups.com] On Behalf Of Oliver Ruebenacker
>> Sent: Wednesday, June 06, 2012 11:43 AM
>> To: biopax-discuss@googlegroups.com
>> Subject: Re: order of XREFs in smallMolecule
>> Hello Peter,
>> Let me see if I understand your data correctly: you have a reaction
>> R: A -> B, and A is a superset of A1, A2, etc. while B is a superset of B1, B2, etc. And R is really a superset of R1: A1 -> B1, R2: A2 -> B2, etc.
>> You have a unification cross reference to ChEBI that describes A1.
>> Unless it is also valid for A, you cannot legally list it as a unification cross-reference of A.
>> For example, it would be illegal to assign to the same physical entity unification cross references to both methanol and ethanol, because it cannot be both at the same time. I suppose you could use a cross reference instead of a unification cross reference, but it would be unclear what that means.
>> Can't you just spell out the individual reactions R1, R2, etc?
>> Take care
>> Oliver
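Oliver's objection follows from the identity rule for unification xrefs: entities that share one UXR are the same thing, and, assuming the rule is applied transitively as identity would be, one entity carrying UXRs for both ethanol and methanol links every ethanol entity to every methanol entity. The sketch below is a hypothetical illustration, not Paxtools' merger: it groups entities with a union-find over shared UXR labels, using the placeholder labels "ethanol" and "methanol" rather than real ChEBI IDs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UnificationMerge {
    // Union-find parent pointers over entity IDs.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = parent.get(x);
        if (!root.equals(x)) {
            root = find(root);
            parent.put(x, root); // path compression
        }
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Groups entities that share at least one unification xref, transitively.
    static Set<Set<String>> mergeGroups(Map<String, Set<String>> uxrsByEntity) {
        UnificationMerge uf = new UnificationMerge();
        Map<String, String> firstOwner = new HashMap<>(); // xref -> first entity seen with it
        for (Map.Entry<String, Set<String>> e : uxrsByEntity.entrySet()) {
            uf.find(e.getKey()); // register entities that may end up alone
            for (String xref : e.getValue()) {
                String owner = firstOwner.putIfAbsent(xref, e.getKey());
                if (owner != null) {
                    uf.union(owner, e.getKey());
                }
            }
        }
        Map<String, Set<String>> byRoot = new HashMap<>();
        for (String entity : uxrsByEntity.keySet()) {
            byRoot.computeIfAbsent(uf.find(entity), k -> new TreeSet<>()).add(entity);
        }
        return new HashSet<>(byRoot.values());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> entities = new HashMap<>();
        entities.put("X", Set.of("ethanol"));
        entities.put("Y", Set.of("ethanol"));
        entities.put("Z", Set.of("methanol"));
        System.out.println(mergeGroups(entities).size()); // 2: {X, Y} and {Z}
        // A set-like entity carrying UXRs for both of its members:
        entities.put("B", Set.of("ethanol", "methanol"));
        System.out.println(mergeGroups(entities).size()); // 1: ethanol and methanol collapse
    }
}
```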
>> On Wed, Jun 6, 2012 at 10:30 AM, D'Eustachio, Peter <Peter.D'Eustac...@nyumc.org> wrote:
>>> When we annotate a reaction in which an enzyme with broad substrate specificity can convert any one of a group of input / substrate molecules to corresponding output / product molecules, we use sets as input and output. The order in which the specific molecules are listed in the input set corresponds to the order in the output set, which is intended to mean that the reaction can convert input-1 to output-1 or input-2 to output-2 and so forth. There is nothing in the logic of our data model or curator software that enforces correct ordering of the two sets - it depends on the alertness of the human curator.
>>> We do not catalog the attributes of small molecules, but get them by reference from ChEBI. With the growth of their data set and ontology structure, we now often have a better way (we think) to annotate broad substrate specificity. Instead of using home-made entity sets, we use higher-level terms from ChEBI. Thus, instead of assembling our own sets to list nucleotide monophosphates or aliphatic alcohols, we use the terms for those concepts from ChEBI to create a generic molecule instance in Reactome, then use that generic molecule as input or output in a reaction. A limitation of this approach is that not all molecule collections we need correspond to entities in ChEBI, so our annotation still sometimes uses sets (also, we haven't systematically replaced the existing legacy sets.)
>>> That's where we stand at present - I hope it at least explains the situation.
>>> Peter D'E
1. According to the BioPAX L3 *UnificationXref* semantics, if any two BioPAX objects share at least one *UnificationXref*, they are about *the same Thing* and can be merged. In most cases (in my experience), when a *UnificationXref* has been added to more than one parent BioPAX object, it was done by mistake, for example: the *RelationshipXref* class should be used for ortholog IDs; a UniProt *UnificationXref* is NOT for a *Protein* (it's for a *ProteinReference*); a ChEBI *UnificationXref* is not for a *SmallMolecule* (it's for a *SmallMoleculeReference*); EntrezGene does not "work" as a *ProteinReference*'s *UnificationXref*; etc.
2. There are pros, cons and limitations to both having "20" (generalized knowledge) and having "400" (more specific, experimental data) interactions, and people actually need both (curated). But, as Peter said, a "compact" BioPAX export is more convenient from both the visualization and the curation/support perspectives. For more specific computational analysis, however, e.g., before mapping to SBML, SIF or GSEA, it should be possible (though not easy) to *expand* and *filter* these data, i.e., convert them to another, perhaps single-organism, no-generics BioPAX representation and then remove all undesired evidence, interactions, and members. How to *expand/filter* won't be the same for all projects; it has to be decided by a researcher or software engineer on a case-by-case basis. Also, if there is a reaction A + B -> C, and both A and B have twenty options each, it won't necessarily lead to 400 reactions if done wisely (one should not mix up different organisms, and should consider publications, evidence, etc.).
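The misplacements listed in point 1 are mechanical enough to flag automatically. This is a minimal sketch assuming a simplified, hypothetical input (`UxrUse`: owner ID, owner class name, database name) rather than a Paxtools model; the rule table covers only the examples given in point 1 and is not a complete BioPAX validator. The ortholog rule is left out because it cannot be checked from the database name alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified record of one UnificationXref attachment (not a Paxtools type).
record UxrUse(String ownerId, String ownerClass, String db) {}

class UxrPlacementCheck {
    // Database -> BioPAX L3 class that should own its UnificationXrefs,
    // following only the examples in point 1. Deliberately incomplete.
    private static final Map<String, String> EXPECTED_OWNER = Map.of(
            "UniProt", "ProteinReference",
            "ChEBI", "SmallMoleculeReference");

    static List<String> check(List<UxrUse> uses) {
        List<String> problems = new ArrayList<>();
        for (UxrUse u : uses) {
            String expected = EXPECTED_OWNER.get(u.db());
            if (expected != null && !expected.equals(u.ownerClass())) {
                problems.add(u.ownerId() + ": " + u.db() + " UnificationXref belongs on a "
                        + expected + ", not a " + u.ownerClass());
            }
            // EntrezGene identifies genes, so it should not unify a ProteinReference.
            if (u.db().equals("EntrezGene") && u.ownerClass().equals("ProteinReference")) {
                problems.add(u.ownerId() + ": EntrezGene is not a valid UnificationXref for a ProteinReference");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        check(List.of(
                new UxrUse("sm1", "SmallMolecule", "ChEBI"),
                new UxrUse("smr1", "SmallMoleculeReference", "ChEBI"),
                new UxrUse("p1", "Protein", "UniProt")))
                .forEach(System.out::println);
    }
}
```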