physical properties in CHEMINF


Gang Fu

May 5, 2014, 9:59:47 AM
to cheminf-...@googlegroups.com
Hi All,

PubChemRDF wants to expose the physical properties collected from various sources. Several of them are already defined in CHEMINF, including melting point, boiling point, and water solubility. However, most of the environmental safety properties, such as exposure limit, have not been defined in CHEMINF. I have drafted specifications for the physical properties we have collected. Do you think they can be defined in CHEMINF?

Best,
Gang
PubChemRDF_Pysicalproperties.pdf

Michel Dumontier

May 6, 2014, 12:40:15 AM
to cheminf-...@googlegroups.com
Hi Gang,
  yes, I think it's a good idea :)

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group



Janna Hastings

May 6, 2014, 4:21:59 AM
to cheminf-...@googlegroups.com
Hi,

absolutely! I'll try to have them added by the end of this week.

Cheers, Janna

Janna Hastings

May 8, 2014, 7:03:50 AM
to cheminf-...@googlegroups.com
Dear all,

I have added (most of) the new physical property descriptors as follows:

freezing point: CHEMINF_000432
Henry's law constant: CHEMINF_000433
atmospheric OH rate constant: CHEMINF_000434
upper explosive limit: CHEMINF_000435
lower explosive limit: CHEMINF_000436
minimum explosive concentration: CHEMINF_000437
specific gravity: CHEMINF_000438
relative density: CHEMINF_000439
vapor density: CHEMINF_000440
odor threshold: CHEMINF_000441
pH: CHEMINF_000442
evaporation rate: CHEMINF_000443
auto-ignition temperature: CHEMINF_000444
soil half-life: CHEMINF_000445

The following items in the list were not added because I believe that they are already in PATO, and better belong there:

appearance (physical state, color etc)
viscosity

Could the other CHEMINF developers please take a look and sanity check etc.?

Thanks!
Janna

Egon Willighagen

May 8, 2014, 8:54:04 AM
to cheminf-ontology
On Thu, May 8, 2014 at 1:03 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> I have added (most of) the new physical property descriptors as follows:
>
> atmospheric OH rate constant: CHEMINF_000434
> soil half-life: CHEMINF_000445

These two actually sound close to the ecotox data from Nijmegen we
might have a go at in eNanoMapper...

OK, it seems to be defined for a substance, so that should work fine.

I'll go through the list and report if I find things I don't get...

Egon

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286

Egon Willighagen

May 8, 2014, 9:13:05 AM
to cheminf-ontology
Janna, all,

On Thu, May 8, 2014 at 1:03 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> freezing point: CHEMINF_000432

So, what happens when we need a freezing point under non-standard
conditions? The current definition suggests that CHEMINF_000432 is a
direct node below physical descriptor... Should there not be a general
freezing point class in between, for unspecified conditions, which
allows this one, but also other subclasses, to define a freezing point
*with* specific conditions? Or are there other means of squeezing in a
superclass, to ensure we know that two freezing points at different
pressures are closer to each other than merely both being a physical
descriptor?

(I have no clue what ontology conventions CHEMINF is subscribing
to...)

> specific gravity: CHEMINF_000438

I was wondering about the description... it effectively refers to
other concepts... what are the practices here? Should descriptions
include URIs to matching terms from that ontology, or from another
ontology?

Specifically, this description states "The reference substance is
usually water for liquids or air for gases." ... Water is hard to get
wrong, though I am sure some will, but "air" certainly could use some
clarification... does it make sense to refer to a CHEMINF/ChEBI/foo
term for air?

(Here too, I have no clue what ontology conventions CHEMINF is
subscribing to...)

Nina Jeliazkova

May 8, 2014, 9:24:32 AM
to cheminf-...@googlegroups.com


On 8 May 2014 16:13, "Egon Willighagen" <egon.wil...@gmail.com> wrote:
>
> Janna, all,
>
> On Thu, May 8, 2014 at 1:03 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> > freezing point: CHEMINF_000432
>
> So, what happens when we define a freezing point as a non-standard
> condition?

I'll take the chance to reply here: in my opinion, no property should be defined per se without specifying the exact conditions it is measured under (including protocols). The same applies to calculated properties.

I don't know if Cheminf already does so, apologies if this is already the approach used.

Best regards,
Nina

Janna Hastings

May 8, 2014, 9:25:26 AM
to cheminf-...@googlegroups.com
Hi,


On Thu, May 8, 2014 at 2:13 PM, Egon Willighagen <egon.wil...@gmail.com> wrote:
> Janna, all,
>
> On Thu, May 8, 2014 at 1:03 PM, Janna Hastings <janna.h...@gmail.com> wrote:
> > freezing point: CHEMINF_000432
>
> So, what happens when we define a freezing point as a non-standard
> condition? With the current definition there, it suggest that
> CHEMINF_000432 is a direct node below physical descriptor... should
> there not be a general freezing point in between, for unspecified
> conditions allows this one, but also other subclasses to subdefine a
> freezing point *with* specific conditions? Or are there other means of
> squeezing in a superclass, to ensure we know that two freezing points
> at different pressures are actually more close to each other than both
> being a physical descriptor?

Of course, a parent term would be needed in this scenario. The policy I would recommend here would be not to introduce the parent term until you have a real annotation need for multiple versions of the freezing point (and similarly for other parameters that are usually estimated or measured at "standard conditions" of one type or another).
 

> > specific gravity: CHEMINF_000438
>
> I was wondering about the description... it effectively refers to
> other concepts... what are the practices here? Should descriptions
> include URIs to matching terms from that ontology, or other ontology?
>
> Specifically, this description writes "The reference substance is
> usually water for liquids or air for gases." ... Water is hard to get
> wrong, though I am sure some will, but "air" certainly could use some
> clarification... does it make sense to refer to CHEMINF/ChEBI/foo term
> for air?

Some ontologies do this by including the term ID for the referenced entity in brackets after the word, e.g. The reference substance is usually water (CHEBI:15377) for liquids or air for gases.

I agree that "air" is tricky, but defining this is not in the scope of CHEMINF. I wouldn't have thought it belonged in ChEBI either. Probably the best reference would be the Environment Ontology (ENVO): ENVO_00002005, air, defined as "The mixture of gases (roughly (by molar content/volume: 78% nitrogen, 20.95% oxygen, 0.93% argon, 0.038% carbon dioxide, trace amounts of other gases, and a variable amount (average around 1%) of water vapor) that surrounds the planet Earth." ENVO also has "soil".

Putting the xrefs into the definition like this has been adopted by some other ontologies, but so far CHEMINF hasn't done that. Not sure we should start yet?
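
To illustrate the bracketed-ID convention mentioned in this message, here is a minimal Python sketch that pulls term IDs out of a definition string. The function name, the regular expression, and the placement of the ENVO ID in the example sentence are assumptions made for illustration; CHEMINF does not currently follow this convention.

```python
import re
from typing import List

# Matches IDs written as PREFIX:digits or PREFIX_digits, e.g. CHEBI:15377.
# This pattern is an illustrative assumption, not an official ID grammar.
TERM_ID = re.compile(r"\b([A-Za-z]+)[:_](\d+)\b")


def extract_term_ids(definition: str) -> List[str]:
    """Return referenced term IDs, normalised to the PREFIX:digits form."""
    return [f"{prefix}:{number}" for prefix, number in TERM_ID.findall(definition)]


# Example sentence with the cross-references written inline after each word.
text = ("The reference substance is usually water (CHEBI:15377) for liquids "
        "or air (ENVO_00002005) for gases.")
print(extract_term_ids(text))  # ['CHEBI:15377', 'ENVO:00002005']
```

A check like this could help verify that every ID mentioned in a definition also exists as a term, although where such cross-references should live is exactly the open question here.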

Cheers, Janna


 


Janna Hastings

May 8, 2014, 9:28:51 AM
to cheminf-...@googlegroups.com
Hi Nina,

I wholeheartedly agree with you in principle, but some of that detail should be specified at a lower level, or even as metadata associated with the instances or annotations. The ontology should provide useful classes that capture more general attributes, properties, etc., as a way to bridge across different databases.

Cheers, Janna


Nina Jeliazkova

May 8, 2014, 9:34:12 AM
to cheminf-...@googlegroups.com
Hi Janna,

I can understand your point, but if an ontology is to be a close description of reality, then there is no single property that is independent of conditions / experiments / calculations.

If we are not able to use the ontology terms to describe the measurements in different databases in sufficient detail, then bridging becomes quite difficult, as it will be hard to find out or state whether the entities in different databases are really the same. And of course all databases will resort to their own solutions. There are ISO standards for describing all these properties; it would be good if CHEMINF stayed close to these.

Best regards,
Nina

Janna Hastings

May 8, 2014, 9:45:41 AM
to cheminf-...@googlegroups.com
Hi Nina,

I don't believe that I said we shouldn't use the ontology to annotate the properties in different databases. Obviously that is the whole point of having the ontology.

My point is just about using the hierarchy to capture information in the right place. Take a look at this:

http://semanticscience.org/resource/CHEMINF_000312

There is a general class for "rule of five violation descriptor". Then there are two subclasses for calculations of the rule of five by different libraries, one of which refers to a version (because this was known by the class requester). You can't see it on this online page, but in the ontology there is a further axiom stating "'is output of' some 'execution of ACD/Labs PhysChem software library version 12.01'", which links to a class for that software library, which is in turn linked to the other descriptors it can calculate. All of this is just about the *type* of the descriptor. There is additional information, such as which unit it is expressed in, which you would capture at the level of the *annotation*, following something like this pattern:

X type: <some cheminf descriptor type, as specific as possible>
X has-units <some unit from a standard e.g. UO>
X has-value <the value>
and if needed, something like X has-source <some other database, or webpage, or whatever>

And we will add to all this if we find the patterns we are already using are insufficient for the data we need to annotate.
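
As a rough illustration of the annotation pattern above, the following Python sketch builds the four statements for one freezing point value. The subject name, the predicate labels (`rdf:type`, `sio:has-unit`, `sio:has-value`, `sio:has-source`), the unit placeholder, the number, and the source are all assumptions chosen for the example; only CHEMINF_000432 (freezing point) is taken from this thread.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def annotate(subject: str, descriptor_type: str, value: float,
             unit: str, source: Optional[str] = None) -> List[Triple]:
    """Return the type/unit/value(/source) statements for one descriptor instance."""
    triples: List[Triple] = [
        (subject, "rdf:type", descriptor_type),   # as specific a class as possible
        (subject, "sio:has-unit", unit),          # unit from a standard such as UO
        (subject, "sio:has-value", repr(value)),  # the value itself
    ]
    if source is not None:
        # Optional provenance: a database, web page, or similar.
        triples.append((subject, "sio:has-source", source))
    return triples


def to_turtle(triples: List[Triple]) -> str:
    """Serialise triples one per line in a Turtle-like notation (prefixes omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example = annotate(
    subject="ex:descriptor1",                  # hypothetical instance name
    descriptor_type="cheminf:CHEMINF_000432",  # freezing point
    value=5.5,                                 # made-up number for illustration
    unit="ex:degree_Celsius",                  # placeholder for a UO term
    source="ex:some_database",                 # hypothetical source
)
print(to_turtle(example))
```

The point of the sketch is only that the class says what kind of descriptor this is, while unit, value, and source hang off the individual annotation.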

Cheers, Janna

 

Egon Willighagen

May 8, 2014, 9:56:25 AM
to cheminf-ontology
Janna, Nina,

On Thu, May 8, 2014 at 3:24 PM, Nina Jeliazkova
<jeliazk...@gmail.com> wrote:
> I'll take the change to reply here, as my opinion is no property should be
> defined per se, without specifying the exact condition it is measured at
> (including protocols). The same applies to calculated properties.

This will be critical for the "zeta potential"... it depends strongly
on the pH at which it was measured (going from very positive to very
negative...).

This will be a challenge... as we must report this zeta potential
*with* the pH... (and, yes, the literature has often failed to do this
in the past few years...)

Nina Jeliazkova

May 8, 2014, 9:57:27 AM
to cheminf-...@googlegroups.com
Hi Janna,


On 8 May 2014 16:45, Janna Hastings <janna.h...@gmail.com> wrote:
> Hi Nina,
>
> I don't believe that I said we shouldn't use the ontology to annotate the properties in different databases. Obviously that is the whole point of having the ontology.
>
> My point is just about using the hierarchy to capture information in the right place. Take a look at this:
>
> http://semanticscience.org/resource/CHEMINF_000312

That's fine.

> There is a general class for "rule of five violation descriptor". Then there are two subclasses for calculations of the rule of five by different libraries, one of which refers to a version (because this was known by the class requester). You can't see it on this online page, but in the ontology there is further axiom stating "'is output of' some 'execution of ACD/Labs PhysChem software library version 12.01'" which links to a class for that software library, which is then linked to the other descriptors it can calculate. All of this is just about the *type* of the descriptor, there is additional information about which unit it is expressed in, which you would capture at the level of the *annotation* which would follow something like the pattern:
>
> X type: <some cheminf descriptor type, as specific as possible>
> X has-units <some unit from a standard e.g. UO>
> X has-value <the value>
> and if needed, something like X has-source <some other database, or webpage, or whatever>
>
> And we will add to all this if we find the patterns we are already using are insufficient for the data we need to annotate.

I'll be sending you a large number of examples on the eNanoMapper list next week :) But my impression is that there will be a huge explosion of classes if, as I understand it, the approach in CHEMINF is to define classes for every single combination of measurement protocol and varying conditions. Apologies if I misunderstood.

Nina

Janna Hastings

May 8, 2014, 10:00:29 AM
to cheminf-...@googlegroups.com
Hi Nina,

We can choose when we need a new class and when we can use a generic parent class together with additional information captured using the relevant relations at the level of the annotation. This is the difference between pre-composition and post-composition. Most actively used ontologies, e.g. the Gene Ontology, use a combination of the two. So far, I am not worried about the number of classes we'll end up adding.
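
The two options can be contrasted in a small Python sketch. It assumes a toy subclass table and placeholder names (`ex:rule_of_five_by_library_A`, `ex:is_output_of`, `ex:execution_of_library_A`); only CHEMINF_000312, the "rule of five violation descriptor", is a term mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# child class -> parent class. The subclass name is a made-up placeholder.
SUBCLASS_OF: Dict[str, str] = {
    "ex:rule_of_five_by_library_A": "cheminf:CHEMINF_000312",
}


@dataclass
class Annotation:
    descriptor_type: str
    value: float
    # Extra relations stated on the annotation itself (post-composition).
    relations: Dict[str, str] = field(default_factory=dict)


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls != target:
        if cls not in SUBCLASS_OF:
            return False
        cls = SUBCLASS_OF[cls]
    return True


# Pre-composition: the software is baked into a dedicated class.
pre = Annotation("ex:rule_of_five_by_library_A", 1)

# Post-composition: generic class plus a relation on the annotation.
post = Annotation("cheminf:CHEMINF_000312", 1,
                  {"ex:is_output_of": "ex:execution_of_library_A"})

# Either way, a query for the generic class finds both annotations.
matches: List[Annotation] = [a for a in (pre, post)
                             if is_a(a.descriptor_type, "cheminf:CHEMINF_000312")]
print(len(matches))  # 2
```

In both cases the generic parent class is what lets different databases be bridged; the choice only affects where the extra detail is recorded.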

Cheers, Janna

Nina Jeliazkova

May 8, 2014, 10:00:35 AM
to cheminf-...@googlegroups.com
Egon,




On 8 May 2014 16:56, Egon Willighagen <egon.wil...@gmail.com> wrote:
> Janna, Nina,
>
> On Thu, May 8, 2014 at 3:24 PM, Nina Jeliazkova
> <jeliazk...@gmail.com> wrote:
> > I'll take the change to reply here, as my opinion is no property should be
> > defined per se, without specifying the exact condition it is measured at
> > (including protocols). The same applies to calculated properties.
>
> This will be critical for the "zeta potential"... this is critically
> dependent on the pH under which it was measured (going from very
> positive to very negative...).
>
> This will be a challenge... as we must report this zeta potential
> *with* the pH... (and, yes, literature often fails to do this in the
> past few years...)



This is my point: it is critical not only for zeta potential, but for _any_ measured property. Physical state / appearance changes with temperature and pressure; soil fate depends on the soil (there are artificial soils, natural ones coming from different environments, you name it). Biodegradation depends on the protocol, the inoculum, and so on.

Regards,
Nina

 

Gang Fu

May 11, 2014, 9:39:42 PM
to cheminf-...@googlegroups.com
Hi,

Here is an example of how PubChem uses CHEMINF to expose calculated chemical descriptors:
http://pubchem.ncbi.nlm.nih.gov//rdf/#_Toc376426172
We also have provenance metadata associated with each instance.

For experimental physical properties, I believe we can expose them in the same way: the experimental conditions and other metadata can be annotated at the instance level instead of the class level.

Here is a little more information about the data collected by the PubChem group:
1. EPA (Environmental Protection Agency) data:
    number of records: 67,253
2. ILO (International Labour Organization) data:
    number of records: 1,702
3. CDC NIOSH data:
    number of records: 678
4. OSHA (Occupational Chemical Database) data:
    number of records: 800
5. NPIC (National Pesticide Information Center) data:
    number of records: 342
6. Food additives:
    number of records: 144


@Janna,

We are extending the list now; in the near future we will have "stability", "optical rotation", and others. I will send you an updated list later on.

We can use SIO to expose metadata as suggested by Michel, such as sio:cites, sio:has-source.

@Michel,

Do you think we can add some terms to SIO to expose experimental conditions?

Thank you all for the useful discussion!

Best,
Gang



Gang Fu

May 11, 2014, 9:43:07 PM
to cheminf-...@googlegroups.com
Hi All,

All the definitions I collected are either from the original sources or from Wiki. It would be great if we could align the definitions of those physical properties with ISO standards. But I don't know how to collect ISO definitions. Are there any useful links?

Best,
Gang

Michel Dumontier

May 12, 2014, 1:59:48 AM
to cheminf-...@googlegroups.com
Gang,
  I can certainly extend SIO if required. Please post your request to the SIO mailing list, and we can happily discuss further.

m.




Gang Fu

Jul 2, 2014, 4:07:05 PM
to cheminf-...@googlegroups.com
Hi All,

Please consider adding optical rotation and solubility to CHEMINF. Here are some original data on National Cancer Institute (NCI) investigational drugs (http://dtp.nci.nih.gov/nci_InvestigationalDrugs_PI.html).

Experimental solubility can be measured in different solvents; water solubility is one of them. I believe we have water solubility (CHEMINF_000258) defined in CHEMINF. Can we have a more general term, such as solubility, that is a superclass of water solubility and may have other subclasses for other solvents?

Regarding optical rotation, the experimental data can be collected under different conditions, such as mass concentration, wavelength, temperature, and so on. Shall we make the distinctions between measurement conditions at the class level or at the instance level?

One more concern is the complexity of data values, which may be ranges rather than explicit numbers, and may be associated with standard deviations. How can we express this information?
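
One possible way to think about such values, sketched in Python under stated assumptions: treat every value as a lower bound, an upper bound, and an optional uncertainty, so that single numbers, ranges, and "value ± deviation" share one shape. The `MeasuredValue` structure and the accepted text formats are invented for this example and are not a pattern defined by CHEMINF or SIO.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeasuredValue:
    low: float                           # lower bound (equals high for a single number)
    high: float                          # upper bound
    uncertainty: Optional[float] = None  # e.g. a standard deviation, if reported


_NUM = r"[-+]?\d+(?:\.\d+)?"
_PLUS_MINUS = re.compile(rf"^\s*({_NUM})\s*(?:±|\+/-)\s*({_NUM})\s*$")
# Ranges are only recognised when written with "to" or an en dash,
# to avoid confusing a hyphen with a minus sign.
_RANGE = re.compile(rf"^\s*({_NUM})\s*(?:to|–)\s*({_NUM})\s*$")
_SINGLE = re.compile(rf"^\s*({_NUM})\s*$")


def parse_value(text: str) -> MeasuredValue:
    """Parse 'x ± s', 'a to b' or a single number into a MeasuredValue."""
    m = _PLUS_MINUS.match(text)
    if m:
        centre = float(m.group(1))
        return MeasuredValue(centre, centre, float(m.group(2)))
    m = _RANGE.match(text)
    if m:
        return MeasuredValue(float(m.group(1)), float(m.group(2)))
    m = _SINGLE.match(text)
    if m:
        number = float(m.group(1))
        return MeasuredValue(number, number)
    raise ValueError(f"unrecognised value: {text!r}")


print(parse_value("40.0 ± 1.0"))
print(parse_value("120 to 125"))
```

Each field could then be exposed as its own statement on the instance, which keeps the complexity out of the class hierarchy.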

Thank you very much!

Best,
Gang

Colin Batchelor

Jul 4, 2014, 6:40:08 AM
to cheminf-...@googlegroups.com
Hi Gang,

Because these are experimental rather than cheminformatics classes, I've looked carefully at the IUPAC Gold Book and added some classes to CHMO (the Chemical Methods Ontology), cross-referencing them with techniques and, in the case of "angle of optical rotation", with "optical activity", which is the disposition.

I imagine it would be worthwhile to have a class for measurement under standard conditions. For non-standard conditions it would be a bit more involved. I can add the standard conditions for measuring angle of optical rotation in the next commit.

"angle of optical rotation" is http://purl.obolibrary.org/obo/CHMO_0002818

"solubility" is http://purl.obolibrary.org/obo/CHMO_0002815

"solubility in water" is http://purl.obolibrary.org/obo/CHMO_0002825

You will also find the different sorts of concentration (mass concentration, amount concentration) in there.

It *should* update on Bioportal: http://bioportal.bioontology.org/ontologies/CHMO/?p=classes&conceptid=root and Ontobee later on today. I'm taking the opportunity to rename some files that still had the old names from when we had an ontology prefix clash so there might be a wee bit of discontinuity for a few hours.

Best wishes,
Colin.

Gang Fu

Jul 7, 2014, 8:44:31 AM
to cheminf-...@googlegroups.com
Thank you very much, Colin!

CHMO is indeed a good resource for defining experimental data, including spectral data like UV spectra and HPLC. We can find those data for NCI investigational drugs. Do you have any documentation or manuscript that explains how to use the ontology?

Exposing experimental data is challenging, given the variety of experimental conditions...

I have looked at the ontology and found that under the class "information content entity" there are "experimental method output" and "data item". I guess the experimental conditions go under "equipment specification datum" and "method specification datum", right? What would the RDF model look like? For instance, Optical Rotation: (c = 1, H2O) [α]20/D = 40.0 ± 1.0°
How would one state the concentration and temperature conditions in RDF?
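
To make the question concrete, here is a hedged Python sketch of one instance-level layout for that optical rotation record, with each condition as its own node. This is not a model proposed in the thread: every `ex:` predicate and class is a hypothetical placeholder, the reading of c = 1 as 1 g per 100 mL follows the usual convention for optical rotation reports, and only CHMO_0002818 (angle of optical rotation) and CHEBI:15377 (water) are identifiers mentioned earlier in this discussion.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def condition(node: str, kind: str, value: str, unit: str = "") -> List[Triple]:
    """Describe one experimental condition as its own node."""
    triples = [(node, "rdf:type", kind), (node, "ex:has-value", value)]
    if unit:
        triples.append((node, "ex:has-unit", unit))
    return triples


def optical_rotation_record() -> List[Triple]:
    """Triples for: (c = 1, H2O), 20 degrees C, sodium D line, 40.0 +/- 1.0 degrees."""
    m = "ex:measurement1"                         # hypothetical instance name
    triples: List[Triple] = [
        (m, "rdf:type", "obo:CHMO_0002818"),      # angle of optical rotation
        (m, "ex:has-value", "40.0"),
        (m, "ex:has-uncertainty", "1.0"),
        (m, "ex:has-unit", "ex:degree"),
    ]
    conditions = [
        ("ex:cond_temp", "ex:Temperature", "20", "ex:degree_Celsius"),
        ("ex:cond_conc", "ex:MassConcentration", "1", "ex:gram_per_100_mL"),
        ("ex:cond_solvent", "ex:Solvent", "obo:CHEBI_15377", ""),  # water
        ("ex:cond_line", "ex:SpectralLine", "sodium D line", ""),
    ]
    for node, kind, value, unit in conditions:
        triples.append((m, "ex:has-condition", node))  # link measurement -> condition
        triples.extend(condition(node, kind, value, unit))
    return triples


for s, p, o in optical_rotation_record():
    print(s, p, o, ".")
```

Keeping the conditions on the instance means the class stays "angle of optical rotation" regardless of how many combinations of solvent, temperature, and concentration appear in the data.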

Best,
Gang

Gang Fu

Aug 3, 2016, 10:54:27 AM
to cheminf-ontology
Hi Janna,

I found that a few more physical properties are missing from CHEMINF, CHMO, and PATO. Do you think these can be added to CHEMINF?
LogS: The base-10 logarithm of the aqueous solubility of a compound.
pKa: The negative base-10 logarithm of the acid dissociation constant.
Vapor pressure: The pressure of a vapor in thermodynamic equilibrium with its condensed phases in a closed system.
Dissociation constant: A specific type of equilibrium constant that measures the propensity of a larger object to separate (dissociate) reversibly into smaller components, as when a complex falls apart into its component molecules, or when a salt splits up into its component ions.
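
The two logarithmic definitions can be written out as a short Python sketch. It assumes that the solubility is given in mol/L, which the definition above does not state, and uses illustrative input numbers (a Ka of 1.8e-5 is roughly that of acetic acid).

```python
import math


def pka(ka: float) -> float:
    """pKa = -log10(Ka), with Ka the acid dissociation constant."""
    return -math.log10(ka)


def log_s(solubility_mol_per_l: float) -> float:
    """LogS = log10(S), assuming S is the aqueous solubility in mol/L."""
    return math.log10(solubility_mol_per_l)


print(round(pka(1.8e-5), 2))   # 4.74
print(round(log_s(0.01), 2))   # -2.0
```

Because both quantities are logarithms, the unit and reference state of the underlying value matter and would need to be recorded alongside the number.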


Thank you very much!

Best,
Gang

Janna Hastings

Aug 4, 2016, 2:24:54 PM
to cheminf-...@googlegroups.com, Egon Willighagen, Fu, Gang (NIH/NLM/NCBI) [F]
Hi Gang,

I'm going to have to defer to Egon (CC'ed) on this one -- hopefully he can add them?

I'm currently home on maternity leave and the little one really doesn't let me get anything even remotely productive done!

Best wishes,
Janna
