CHEMINF / CHEBI entry to use for chemical entity classes?

Egon Willighagen

Nov 8, 2012, 6:30:07 AM
to cheminf-...@googlegroups.com

Janna, others,

what CHEMINF or CHEBI (or other) ontological term should I use for a structure with unknown double bond positions, of unknown tautomeric structure, or even of fully unknown structure?

In particular, I want something for a pure chemical entity: it must not represent a mixture of things, but one particular chemical structure, where we just do not know exactly which one.

This uncertainty in the exact structure may take different forms.

Use case 1
-----------------
accurate mass -> you get a specific structure with a molecular formula (and mass), but you do not know the exact structure

Use case 2
-----------------
we have a specific compound, but we are not sure which tautomeric form it currently has

What should I be using? If there is nothing for that right now, can we add this?

I think 'molecular entity' comes close, but it requires being 'identifiable as a separately distinguishable entity', which is not the case here... what I am after is one of a set of separately distinguishable entities... at the same time, the superclass 'chemical entity' does not cover it either, as that includes mixtures, which I explicitly want to exclude...

So, basically I need something like a 'one of the possible molecular entities subclassing this class'...

Egon


Janna Hastings

Nov 8, 2012, 6:34:45 AM
to cheminf-...@googlegroups.com
Hi Egon,

'molecular entity' is the one you want.  The 'separately distinguishable entity' clause just means that in theory it would be possible to identify your chemical, not that you already know which one it is. If you knew some information about the structure etc., you could assign one of the subclasses, but 'molecular entity' works even in the absence of any structural information -- it is a chemical entity with a molecular structure, you just don't know what that structure is.

Cheers, Janna

Egon Willighagen

Nov 8, 2012, 6:39:43 AM
to cheminf-...@googlegroups.com
On Thu, Nov 8, 2012 at 12:34 PM, Janna Hastings
<janna.h...@gmail.com> wrote:
> 'molecular entity' is the one you want.

So, the below is fine?

:AorB rdfs:subClassOf CHEBI:23367 .
:A rdfs:subClassOf :AorB, CHEBI:23367 .
:B rdfs:subClassOf :AorB, CHEBI:23367 .

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Janna Hastings

Nov 8, 2012, 6:42:44 AM
to cheminf-...@googlegroups.com
Yes, or without redundancies?

:AorB rdfs:subClassOf CHEBI:23367 .
:A rdfs:subClassOf :AorB .
:B rdfs:subClassOf :AorB .

(:A rdfs:subClassOf CHEBI:23367 can be inferred ...)

Cheers, Janna

Egon Willighagen

Nov 8, 2012, 6:46:48 AM
to cheminf-ontology
On Thu, Nov 8, 2012 at 12:42 PM, Janna Hastings
<janna.h...@gmail.com> wrote:
> :AorB rdfs:subClassOf CHEBI:23367 .
> :A rdfs:subClassOf :AorB .
> :B rdfs:subClassOf :AorB .

So, how do I know then that :AorB is a 'one of'...? What if :A and :B
are not yet defined? Think use case 1... that is, I am struggling with
open/closed world here... if :A and :B are not defined, and we just
remain with:

:AorB rdfs:subClassOf CHEBI:23367 .

Then I cannot use the further subclassing to reason that :AorB is a
'one of...'... And, the lack of those subclasses, does that imply that
:AorB is in fact of well-defined structure (closed world), or we have
no idea (open world)?

Janna Hastings

Nov 8, 2012, 6:52:55 AM
to cheminf-...@googlegroups.com


you need a "closure axiom" in OWL to close your world for you:  :AorB EquivalentTo (:A or :B)
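
In Turtle, that axiom might look roughly like the sketch below. The example.org namespace and the class names :A, :B and :AorB are placeholders carried over from the thread, and the CHEBI prefix assumes the OBO PURL form of ChEBI identifiers.

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix CHEBI: <http://purl.obolibrary.org/obo/CHEBI_> .
@prefix :      <http://example.org/chem#> .   # placeholder namespace

# :AorB is a molecular entity (CHEBI:23367) ...
:AorB a owl:Class ;
    rdfs:subClassOf CHEBI:23367 ;
    # ... and every member of :AorB is a member of :A or of :B.
    owl:equivalentClass [
        a owl:Class ;
        owl:unionOf ( :A :B )
    ] .

# With the equivalence above these two statements are already entailed;
# they are kept only to mirror the subclass triples from the thread.
:A a owl:Class ; rdfs:subClassOf :AorB .
:B a owl:Class ; rdfs:subClassOf :AorB .
```

Note that the union only states that nothing in :AorB falls outside :A and :B. It does not by itself say that :A and :B are disjoint; that would need a separate owl:disjointWith statement if it is wanted.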

Egon Willighagen

Nov 8, 2012, 7:17:20 AM
to cheminf-...@googlegroups.com
On Thu, Nov 8, 2012 at 12:52 PM, Janna Hastings
<janna.h...@gmail.com> wrote:
> you need a "closure axiom" in OWL to close your world for you: :AorB
> EquivalentTo (:A or :B)

I guess that will work for tautomers, or at least in any situation
where we know :A or :B (and then we might as well define them as
subclasses)... but not for "measured metabolite" where the number of
options grows very quickly...

But what is clear to me, is that I can just subclass molecular entity
and just define a class just like I want it...

:PartiallyDefinedMolecularEntity rdfs:subClassOf :MolecularEntity ;
dc:description """A molecular entity of well-defined but not fully
defined structural elements, including constitution and
stereochemistry. As such, it can be one of the (limited
number of) subclasses with exact chemical structure.""" .

A third use case would be:

:leucine rdfs:subClassOf :PartiallyDefinedMolecularEntity .
:DLeucine rdfs:subClassOf :leucine .
:LLeucine rdfs:subClassOf :leucine .

The latter two may just not be defined or not communicated. The number
of those triples may be really large... with the
:PartiallyDefinedMolecularEntity I know that some structural
information is not unknown but not defined for this entity...

Janna Hastings

Nov 8, 2012, 7:25:36 AM
to cheminf-...@googlegroups.com
Egon, I think you are trying to slightly mis-use the classification hierarchy. A partially defined molecular entity is not a special subclass of (type of) molecular entity, it is just a molecular entity. There is no need to state that it could be one of the subclasses, that is absolutely what the existence of subclasses means in any case. As long as the class you choose is not a leaf node in the classification hierarchy, using it to denote the type of an entity implies that it is not fully defined.

In the case of leucine vs. d-leucine and l-leucine, you again just have a classification hierarchy: d-leucine subClassOf leucine, l-leucine subClassOf leucine. The additional closure axiom can help you to infer that you must have one or the other of these: leucine EquivalentTo (d-leucine or l-leucine).

There is no need for the additional class 'PartiallyDefined...' for anything. You just choose the appropriate parent class that matches the amount of information that you have about that entity, and whether your knowledge is complete will be reflected by the classification depth that you are able to assert in the hierarchy.  If you want to include some level of provenance about the association of a given bit of data to the chemical classification hierarchy, that is a metadata requirement not a classification requirement.

Hope this helps,
Janna

Egon Willighagen

Nov 8, 2012, 7:32:48 AM
to cheminf-ontology
On Thu, Nov 8, 2012 at 1:25 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> Egon, I think you are trying to slightly mis-use the classification
> hierarchy. A partially defined molecular entity is not a special subclass of
> (type of) molecular entity, it is just a molecular entity. There is no need
> to state that it could be one of the subclasses, that is absolutely what the
> existence of subclasses means in any case. As long as the class you choose
> is not a leaf node in the classification hierarchy, using it to denote the
> type of an entity implies that it is not fully defined.

So, what happens if molecular entity does not have the subclasses?
That makes it a leaf, but not necessarily fully specified...

Consider a metabolite with formula C5H9N3 ... you will not list all
possible molecular entities that match that formula...

Janna Hastings

Nov 8, 2012, 8:08:32 AM
to cheminf-...@googlegroups.com
you don't need to list the subclasses, because of the open world!

Egon Willighagen

Nov 8, 2012, 8:27:09 AM
to cheminf-ontology
On Thu, Nov 8, 2012 at 2:08 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> you don't need to list the subclasses, because of the open world!

But then the 'class' becomes a leaf again, and I can no longer
distinguish it from something like :DLeucine... taking me back
to the start.

Janna Hastings

Nov 8, 2012, 8:34:21 AM
to cheminf-...@googlegroups.com
I think what you're trying to do confuses the subclass hierarchy with a bit of knowledge that really needs to be added as an annotation.  For ChEBI, we use the presence or absence of an InChI annotation as a heuristic to decide on whether or not a molecular class is "fully" specified in the structural sense.  You could add an annotation in CHEMINF specifically to deal with the case you describe. But it isn't a new parallel classification hierarchy.
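
A minimal sketch of what such an annotation could look like in Turtle. The property :structureSpecification and its values "partial" and "full" are hypothetical, invented here purely for illustration; they are not existing CHEMINF or ChEBI terms, and the example.org namespace is a placeholder.

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix CHEBI: <http://purl.obolibrary.org/obo/CHEBI_> .
@prefix :      <http://example.org/chem#> .   # placeholder namespace

# Hypothetical annotation property, NOT an existing CHEMINF term.
:structureSpecification a owl:AnnotationProperty ;
    rdfs:label "structure specification" .

# Only the molecular formula C5H9N3 is known (e.g. from accurate mass).
# The class has no listed subclasses, yet it is marked as not fully specified.
:MetaboliteC5H9N3 a owl:Class ;
    rdfs:subClassOf CHEBI:23367 ;
    :structureSpecification "partial" .

:leucine a owl:Class ;
    rdfs:subClassOf CHEBI:23367 ;
    :structureSpecification "partial" .

# A leaf whose structure, including stereochemistry, is fully known.
:DLeucine a owl:Class ;
    rdfs:subClassOf :leucine ;
    :structureSpecification "full" .
```

Because the flag lives in an annotation, it has no effect on the subclass hierarchy or on reasoning. It only lets a consumer tell a leaf like :MetaboliteC5H9N3 apart from a leaf like :DLeucine.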