Chemistry/CML in FoX


petermr

Oct 28, 2011, 8:10:58 PM
to FoX-discuss, Wibe....@pnl.gov
This message is to greet the FoXen and to bring you up-to-date with
developments in CML.

Firstly I am delighted to see that Andrew has taken on FoX and created
a new site, mailing list, etc. This is very much appreciated and is a
great help in maintaining momentum and morale in a project.

As you may know, FoX arose out of our collaboration between Chemistry
and Materials/Minerals and was driven by the need to capture
computational solid-state chemistry, especially Quantum Mechanics. CML
supports that and I believe that it can cater for a wide range of
applications. FoX can cater for many, but not all, of them in WCML.

I am now visiting Pacific Northwest Lab in Washington State, US, and am
specifically working on NWChem, the Open Source QM program suite.
This is very exciting as we are intending to FoXify the code.

The original FoX was written for materials/minerals so it primarily
contained libraries for solid state. For example, there is no way in
WCML of adding bonds (which I think the original authors thought were
heresy - bonds do not exist). I am interested in having bonds in WCML
and I'd be interested to know what other chemistry people would like
supported (this is not an offer of coding, but at least of advice).

CML has roughly 5 main subbranches: molecules, reactions, solid state,
spectra and computational. This means that many complex systems can be
represented (e.g. molecular spectroscopy, solid-state reactions, etc.).
The particular combination of allowed CMLElements is determined by a
*convention*. In fact WCML is based on the CMLComp convention
developed by Toby, AndrewWs and colleagues.

There is now a sound basis for developing bespoke conventions. We are
developing compchem as the default convention for molecular
calculations (but it may also work for solid state). It has a defined
structure for the overall files/documents and also constraints on
molecules etc. If there is interest I can elaborate, but here is the
definitive paper:

CMLLite: a design philosophy for CML
Joe A. Townsend and Peter Murray-Rust
Journal of Cheminformatics 2011, 3:39. doi:10.1186/1758-2946-3-39
http://www.jcheminf.com/content/3/1/39

CML also relies on dictionaries and again we have made considerable
progress including dictionary validation tools:

The semantics of Chemical Markup Language (CML): dictionaries and
conventions
Peter Murray-Rust, Joe A Townsend, Sam E Adams,
Weerapong Phadungsukanan and Jens Thomas
Journal of Cheminformatics 2011, 3:43. doi:10.1186/1758-2946-3-43
http://www.jcheminf.com/content/3/1/43

This should also be valuable for people building FoX applications in
Chemistry and perhaps other physical science applications.

FWIW I am using Windows and have successfully compiled FoX out of the
box in half a minute. Thanks

Andrew Walker

Oct 29, 2011, 8:31:32 AM
to fox-d...@googlegroups.com
Thanks Peter,

I cannot take all the credit - Toby sorted out this mailing list when we
were still in Cambridge and all I've really done is try and keep on top of
the bug reports, the odd feature request, and make occasional releases.

Actually putting bonds into WCML wouldn't be a problem - probably only an
evening's coding - getting the semantics and syntax right in CMLComp may
be more difficult and this informs the API design. Use cases that probably
need covering include:

* Bonds as input in molecular mechanics / parametrized potentials codes
where the bond forms part of the force-field.

* Bonds as constraints as part of a Z-matrix (maybe these can be ignored).

* 'Bonds' that come out of a calculation based on some distance-based
criteria. In a QM code these may be decorated with electron density from
some kind of population analysis.

* Bonds that come out of a calculation based on some other objective
scheme. I'm thinking about Bader-type analysis (atoms-in-molecules) but
there are probably others.

I can imagine a calculation (e.g. QM/MM with Z-matrix constraints and
analysis of the resulting electron density) where all four of these are
simultaneously used. We probably need to decide what a bond should look
like in a CMLComp document.

Cheers,

Andrew

> --
> You received this message because you are subscribed to the Google Groups
> "FoX-discuss" group.
> To post to this group, send email to fox-d...@googlegroups.com.
> To unsubscribe from this group, send email to
> fox-discuss...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/fox-discuss?hl=en.
>
>


--


Peter Murray-Rust

Oct 29, 2011, 9:05:42 AM
to fox-d...@googlegroups.com


On Sat, Oct 29, 2011 at 1:31 PM, Andrew Walker <Andrew...@bristol.ac.uk> wrote:
> Thanks Peter,
>
> Actually putting bonds into WCML wouldn't be a problem - probably only an
> evening's coding - getting the semantics and syntax right in CMLComp may
> be more difficult and this informs the API design.

Yes. The tools for managing conventions are at
http://www.xml-cml.org
This has a convention for compchem which validates files. It would be possible to create another for CMLComp, but that would require writing XML/XSLT constraints. It needs a "CMLComp guru" that I could interact with.

The conventions approach can apply to any XML application. It's not trivial to agree on rules, but it makes it much easier to write downstream software.

> Use cases that probably
> need covering include:
>
> * Bonds as input in molecular mechanics / parametrized potentials codes
> where the bond forms part of the force-field.

Yes. Note that CML also has angle and torsion elements.

> * Bonds as constraints as part of a Z-matrix (maybe these can be ignored).

I have recently shown that we can extract Z-matrix information from Gaussian output, convert it to CML and generate a molecule from it.

> * 'Bonds' that come out of a calculation based on some distance-based
> criteria. In a QM code these may be decorated with electron density from
> some kind of population analysis.

This is common and should be straightforward.

> * Bonds that come out of a calculation based on some other objective
> scheme. I'm thinking about Bader-type analysis (atoms-in-molecules) but
> there are probably others.

Possible - these could be decorated with non-CML annotation.

> I can imagine a calculation (e.g. QM/MM with Z-matrix constraints and
> analysis of the resulting electron density) where all four of these are
> simultaneously used. We probably need to decide what a bond should look
> like in a CMLComp document.

I think the original authors felt that bonds were the work of heretics. "Nothing exists except atoms and empty space; all else is opinion" (Democritos).

Many of these may arise from our NWChem work anyway.

P.
 



--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Andrew Walker

Oct 29, 2011, 1:18:41 PM
to fox-d...@googlegroups.com
Hi Peter,

I don't think anybody has done any development work on CMLComp since the
end of the eMinerals project. As far as I can see the old cmlcomp.org
website (the relaxNG schema, validator, and design documents) is gone.
Some of the site still exists on the web archive:

http://web.archive.org/web/20080828103020/http://cmlcomp.org/

but I'm not sure that everything is there. Anyway, if you can send me a
first pass at what you want a bond in a document generated by WCML to look
like I'll try and draft the FoX API and remember what I used to know about
the details of CMLComp.

Cheers,

Andrew

Peter Murray-Rust

Oct 29, 2011, 1:48:29 PM
to fox-d...@googlegroups.com


On Sat, Oct 29, 2011 at 6:18 PM, Andrew Walker <Andrew...@bristol.ac.uk> wrote:
> Hi Peter,
>
> but I'm not sure that everything is there. Anyway, if you can send me a
> first pass at what you want a bond in a document generated by WCML to look
> like I'll try and draft the FoX API and remember what I used to know about
> the details of CMLComp.

In which case I think it's useful to create the CMLComp1.1 convention (or even develop compchem so it subsumes CMLComp). This would have constraints such as:

 * every CMLComp document must have the top-level element <module dictRef="compchem:jobList">
 * the jobList can only contain <module dictRef="compchem:job"> or <module dictRef="compchem:environment"> (we have tightened up the syntax)
 * a molecule can only contain a mandatory atomArray and an optional bondArray or ... foreign namespaced elements

This level of constraint makes it much easier for programmers to know what they may and may not expect.
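To make the proposed rules concrete, here is a minimal Python sketch of how a downstream program might check them with the standard library's xml.etree.ElementTree. It is not the compchem tooling on xml-cml.org, and several details are assumptions: the CML namespace URI, reading "mandatory atomArray" as "exactly one", and treating atomArray, bondArray and elements from other namespaces as the only permitted molecule children (the "..." in the third rule may allow more). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace URI


def cml(tag):
    """Return the namespaced (Clark notation) name of a CML element."""
    return f"{{{CML_NS}}}{tag}"


def is_foreign(elem):
    """True if the element belongs to a namespace other than CML."""
    return elem.tag.startswith("{") and not elem.tag.startswith(f"{{{CML_NS}}}")


def check_cmlcomp(root):
    """Return a list of violations of the three draft rules (empty if none)."""
    errors = []
    # Rule 1: the top-level element is <module dictRef="compchem:jobList">.
    if root.tag != cml("module") or root.get("dictRef") != "compchem:jobList":
        errors.append("top-level element is not a compchem:jobList module")
    # Rule 2: the jobList only contains job or environment modules.
    allowed = {"compchem:job", "compchem:environment"}
    for child in root:
        if child.tag != cml("module") or child.get("dictRef") not in allowed:
            errors.append(f"jobList contains disallowed child {child.tag}")
    # Rule 3 (as interpreted here): one atomArray, at most one bondArray,
    # and otherwise only foreign-namespaced children.
    for mol in root.iter(cml("molecule")):
        kids = list(mol)
        n_atoms = sum(k.tag == cml("atomArray") for k in kids)
        n_bonds = sum(k.tag == cml("bondArray") for k in kids)
        if n_atoms != 1:
            errors.append(f"molecule {mol.get('id')} has {n_atoms} atomArray elements")
        if n_bonds > 1:
            errors.append(f"molecule {mol.get('id')} has {n_bonds} bondArray elements")
        for k in kids:
            if k.tag not in (cml("atomArray"), cml("bondArray")) and not is_foreign(k):
                errors.append(f"molecule {mol.get('id')} contains {k.tag}")
    return errors


doc = f"""<module xmlns="{CML_NS}" dictRef="compchem:jobList">
  <module dictRef="compchem:job">
    <molecule id="m1">
      <atomArray><atom id="a1" elementType="O"/></atomArray>
    </molecule>
  </module>
</module>"""
print(check_cmlcomp(ET.fromstring(doc)))  # []
```

Returning a list of messages rather than stopping at the first problem lets a validator report everything wrong with a file in one pass.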

P.

Andrew Walker

Nov 1, 2011, 4:27:02 PM
to fox-d...@googlegroups.com
Hi Peter,

As far as I can see, WCML is already able to do everything needed in your
CMLComp1.1 convention other than the optional bond array within a
molecule. Let me know if this isn't the case.

Two things we cannot do are make backwards-incompatible changes to the
current WCML API or change the XML that is produced by the current
API. For example, Siesta uses WCML to produce CMLComp documents as a key
part of the Siesta test harness. (There are a bunch of other atomic-scale
codes that use FoX.) Do you have any view on what the WCML subroutines to
add the bond list should look like?

Cheers,

Andrew


--

Andrew Walker
http://www1.gly.bris.ac.uk/~walker/

Department of Earth Sciences,
University of Bristol,
Wills Memorial Building,
Queen’s Road,
Bristol, BS8 1RJ, UK

Peter Murray-Rust

Nov 1, 2011, 5:01:12 PM
to fox-d...@googlegroups.com
On Tue, Nov 1, 2011 at 8:27 PM, Andrew Walker <Andrew...@bristol.ac.uk> wrote:
> Hi Peter,
>
> As far as I can see WCML already able to do everything needed in your
> CMLComp1.1 convention other than the optional bond array within a
> molecule. Let me know if this isn't the case.

I think that WCML can do anything XML, so it can create <foo:bar plugh="z"/>. So it could create bonds.

> Two things we cannot do is make backwards incompatible changes to the
> current WCML API or change the XML that gets produced by using the current
> API. For example, Siesta uses WCML to produce CMLComp documents as a key
> part of the Siesta test harness. (There are a bunch of other atomic scale
> codes that use FoX.)

This is gratifying to know!

> Do you have any view on what the WCML subroutines to
> add the bond list should look like?

Something like:
EITHER
cmlAddBond  // must come after cmlAddMolecule
(atom1) string scalar: identifier of first atom (an existing atomid)
(atom2) string scalar: identifier of second atom (an existing atomid)
(order) string scalar: type of bond (S,B,T and a few more) [optional]

OR additional fields in cmlAddMolecule
(atom1) string array: identifiers of first atoms (existing atomids)
(atom2) string array: identifiers of second atoms (existing atomids)
(order) string array: types of bonds (S,B,T and a few more) [optional]

So really very simple for simple cases.
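The second option, parallel arrays passed alongside the atoms, maps naturally onto a CML bondArray with one bond element per array entry. The Python sketch below is hypothetical and is not WCML code; it only shows the shape of XML such arrays might produce, using the atomRefs2 and order attributes that appear in Peter's bond example further down the thread, and it omits the CML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def bond_array(atom1, atom2, order=None):
    """Build a <bondArray> from parallel lists of atom ids (hypothetical helper)."""
    if len(atom1) != len(atom2):
        raise ValueError("atom1 and atom2 must have the same length")
    if order is not None and len(order) != len(atom1):
        raise ValueError("order must match atom1/atom2 in length")
    array = ET.Element("bondArray")
    for i, (a, b) in enumerate(zip(atom1, atom2)):
        # atomRefs2 holds the two atom ids separated by a space.
        bond = ET.SubElement(array, "bond", atomRefs2=f"{a} {b}")
        if order is not None:  # bond order is optional in the proposal
            bond.set("order", order[i])
    return array


out = ET.tostring(bond_array(["a1", "a1"], ["a2", "a3"], ["S", "S"]), encoding="unicode")
print(out)
# <bondArray><bond atomRefs2="a1 a2" order="S" /><bond atomRefs2="a1 a3" order="S" /></bondArray>
```

Keeping the arrays parallel means a caller that already holds connectivity as two index lists can pass it through without building per-bond objects.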

If you want to add tricky stuff to a bond (e.g. atomStereo children), that is probably outside WCML at present (I could do it, but it's better done outside the Fortran).

Does that make sense?

Andrew Walker

Nov 2, 2011, 5:28:25 AM
to fox-d...@googlegroups.com

On 1 Nov 2011, at 21:01, Peter Murray-Rust wrote:

>
>
> On Tue, Nov 1, 2011 at 8:27 PM, Andrew Walker <Andrew...@bristol.ac.uk> wrote:
> Hi Peter,
>
> As far as I can see WCML already able to do everything needed in your
> CMLComp1.1 convention other than the optional bond array within a
> molecule. Let me know if this isn't the case.
>
> I think that WCML can do anything XML so it can create <foo:bar plugh="z"/>. So it could create bonds.

That's only true of WXML (able to produce any well-formed XML document). WCML can only produce particular fragments of (usually) valid CML. Can you produce a CMLComp1.1 document without using any WXML functions?

> Two things we cannot do is make backwards incompatible changes to the
> current WCML API or change the XML that gets produced by using the current
> API. For example, Siesta uses WCML to produce CMLComp documents as a key
> part of the Siesta test harness. (There are a bunch of other atomic scale
> codes that use FoX.)
>
> This is gratifying to know!
>
> Do you have any view on what the WCML subroutines to
> add the bond list should look like?
>
> Something like:
> EITHER
> cmlAddBond // must come after cmlAddMolecule
> (atom1) string scalar: identifier of first atom (an existing atomid)
> (atom2) string scalar: identifier of second atom (an existing atomid)
> (order) string scalar . type of bond (S,B,T and a few more [optional]
>
> OR additional fields in cmlAddMolecule
> (atom1) array scalar: identifiers of first atoms (existing atomids)
> (atom2) array scalar: identifiers of second atoms (existing atomids)
> (order) array scalar . type of bond (S,B,T and a few more) [optional]
>
> So really very simple for simple cases.

Ok - that looks easy enough. I guess the code should check that all bond array atomids are defined within the atom array.

>
> If you want to add tricky stuff to a bond (e.g. atomStereo children) that is probably outside WCML at present (I could do it, but it's better done outside the fortran)
>
> Does that make sense?

Yes


Peter Murray-Rust

Nov 2, 2011, 8:03:55 AM
to fox-d...@googlegroups.com, DeJong, Wibe A
On Wed, Nov 2, 2011 at 9:28 AM, Andrew Walker <andrew...@bristol.ac.uk> wrote:

> On 1 Nov 2011, at 21:01, Peter Murray-Rust wrote:
>
>>
>> I think that WCML can do anything XML so it can create <foo:bar plugh="z"/>. So it could create bonds.
>
> That's only true of WXML (able to produce any well-formed XML document). WCML can only produce particular fragments of (usually) valid CML. Can you produce a CMLComp1.1 document without using any WXML functions?

If the bond stuff were written... Bonds are the only thing for which I think there would be significant demand. Even they, of course, may only be calculated at a late stage, from geometry or (say) Mulliken populations.

>> Do you have any view on what the WCML subroutines to
>> add the bond list should look like?
>>
>> Something like:
>> EITHER
>> cmlAddBond  // must come after cmlAddMolecule
>> (atom1) string scalar: identifier of first atom (an existing atomid)
>> (atom2) string scalar: identifier of second atom (an existing atomid)
>> (order) string scalar . type of bond (S,B,T and a few more [optional]
>>
>> OR additional fields in cmlAddMolecule
>> (atom1) array scalar: identifiers of first atoms (existing atomids)
>> (atom2) array scalar: identifiers of second atoms (existing atomids)
>> (order) array scalar . type of bond (S,B,T and a few more) [optional]
>>
>> So really very simple for simple cases.
>
> Ok - that looks easy enough. I guess the code should check that all bond array atomids are defined within the atom array.

The various constraints would, I think, be:
 * atom ids exist
 * the two atoms in a bond are distinct
 * a bond is not added twice. It may be useful to have an id for the bond (I normally do, and form it out of the atomids):

<bond id="a1_a2" atomRefs2="a1 a2" order="S"/>

and you can check the ids
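The three checks, together with the id scheme in the example above, can be sketched as follows. This is a hypothetical Python illustration rather than the WCML implementation; it assumes bonds are undirected, so a bond from a1 to a2 and one from a2 to a1 count as the same bond, and that atom ids do not themselves contain underscores (otherwise the generated bond ids could collide).

```python
def make_bonds(atom_ids, pairs, orders):
    """Validate bonds and return (id, atomRefs2, order) tuples.

    Hypothetical helper: raises ValueError on the first violated constraint.
    """
    if len(pairs) != len(orders):
        raise ValueError("one bond order is needed per atom pair")
    known = set(atom_ids)
    seen = set()
    bonds = []
    for (a, b), order in zip(pairs, orders):
        # Constraint 1: both atom ids must already exist.
        for ref in (a, b):
            if ref not in known:
                raise ValueError(f"unknown atom id {ref!r}")
        # Constraint 2: the two atoms in a bond must be distinct.
        if a == b:
            raise ValueError(f"bond from {a!r} to itself")
        # Constraint 3: no bond twice; a-b and b-a are the same bond.
        key = frozenset((a, b))
        if key in seen:
            raise ValueError(f"bond {a}-{b} added twice")
        seen.add(key)
        # Form the bond id from the atom ids, e.g. "a1_a2".
        bonds.append((f"{a}_{b}", f"{a} {b}", order))
    return bonds


print(make_bonds(["a1", "a2", "a3"], [("a1", "a2"), ("a1", "a3")], ["S", "D"]))
# [('a1_a2', 'a1 a2', 'S'), ('a1_a3', 'a1 a3', 'D')]
```

Using an unordered key (a frozenset) for the duplicate check is what makes reversed pairs count as repeats; comparing the generated id strings alone would miss them.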
 
I think addition of bonds would not break backward compatibility unless people throw an error if there are bonds.

We are making very good progress with NWChem. One question:

In some cases the program exits on error in the "middle" of a module. A typical case is hitting the time limit. This can mean unclosed elements:

<cml>
  <module id="job">
    <module id="optimise">
      <!-- some useful results here -->
      <scalar>Oh dear run out of time</scalar>

This is not well-formed XML. Do people have strategies for closing this either in FoX or subsequently so that analysis programs can read the XML?

Have copied Bert - the guru of NWChem

Andrew Walker

Nov 2, 2011, 1:10:33 PM
to fox-d...@googlegroups.com, DeJong, Wibe A

No way of fixing this in FoX, but Toby White wrote a python script (distributed as part of ccVis) to sort out exactly this problem. It's attached. It turns out you need an XML parser with quite decent error reporting, an ability to rewind over the last handful of characters of the document after an error occurred and the patience to list all the ways that an XML document could be truncated.

We just used to use this at the start of any XML processing pipeline.
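Toby's attached script is not reproduced here. As a much cruder illustration of the same idea, the Python sketch below uses the standard library's expat bindings to track which elements are still open and appends their end tags in reverse order. It assumes the file was cut off at or after a complete tag (anything after the last > is discarded) and that nothing earlier in the document is malformed; it does not handle truncation inside a comment, CDATA section or attribute value, which is exactly where a full solution needs the extra care described above. The function name is hypothetical.

```python
import xml.parsers.expat


def close_truncated_xml(text):
    """Append end tags for elements left open when a document was cut short."""
    # Drop a partial tag or partial text after the last complete markup.
    text = text[: text.rfind(">") + 1]
    open_elements = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: open_elements.append(name)
    parser.EndElementHandler = lambda name: open_elements.pop()
    # isfinal=False: expat reports what it has seen without complaining that
    # the document ends before the root element is closed.
    parser.Parse(text, False)
    return text + "".join(f"</{name}>" for name in reversed(open_elements))


print(close_truncated_xml("<a><b>text</b><c"))  # <a><b>text</b></a>
```

Applied to the truncated NWChem example above, this would append </module></module></cml> after the final </scalar>.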

Cheers,

Andrew


finishXMLfile.py

Peter Murray-Rust

Nov 2, 2011, 1:24:01 PM
to fox-d...@googlegroups.com, DeJong, Wibe A
On Wed, Nov 2, 2011 at 5:10 PM, Andrew Walker <andrew...@bristol.ac.uk> wrote:


> No way of fixing this in FoX, but Toby White wrote a python script (distributed as part of ccVis) to sort out exactly this problem. It's attached. It turns out you need an XML parser with quite decent error reporting, an ability to rewind over the last handful of characters of the document after an error occurred and the patience to list all the ways that an XML document could be truncated.
>
> We just used to use this at the start of any XML processing pipeline.

Excellent - I guessed this was one solution, and since it is already written, many thanks.
 

Andrew Walker

Nov 7, 2011, 3:19:45 PM
to fox-d...@googlegroups.com, DeJong, Wibe A
Hi all,

I've now got a FoX version running with the basic support for bonding in
WCML that Peter asked for. The code is not yet merged and as yet has no
documentation (this is the first draft) but it can be found (and
downloaded) at:

https://github.com/andreww/fox/tree/cmlbonds

Feedback welcome, particularly with respect to the interface and XML
output (see links below). This change adds several new optional arguments
to all specific versions of the generic cmlAddMolecule subroutine (thanks
m4):

* bondAtom1Refs string array of atom refs representing the atoms at one
"end" of the bonds.

* bondAtom2Refs string array of atom refs representing the atoms at the
other "end" of the bonds.

* bondOrders string array of bond order symbols. As far as FoX cares you
can put anything in here but 'S', 'D' and 'T' are the usual content.

* bondIds string array of Ids for the bonds.

* nobondcheck logical scalar. Defaults to .false. - set to .true. to
disable run-time bond sanity checking. The tests (detailed below) may
become expensive for large macromolecules.

Although these arguments are all optional, WCML enforces the constraint
that bondAtom1Refs, bondAtom2Refs and bondOrders must either all be present
and of the same size (causing bonds to be added to the molecule) or none
be present (no bonds are added). bondIds and nobondcheck are truly
optional. If bonds are added, the atomIds optional argument must be
present, although its size is not constrained.

As long as nobondcheck is not set to .true., WCML enforces several
constraints on the content of atomIds (the ids), bondAtom1Refs and
bondAtom2Refs (the refs). These are currently: every ref must be an id.
The ith ref in bondAtom1Refs must be different from the ith ref in
bondAtom2Refs. I'll add a check for bonds not being added twice in due
course. There is a test case to make sure I do (so, in the meantime you'll
see a new failure from the WCML test suite).
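The argument rules described above (all three bond arrays or none, equal sizes, atomIds required when bonds are given, and the two content checks that nobondcheck switches off) can be mirrored in a short Python sketch. It only illustrates the logic described in this message and is not the Fortran in the cmlbonds branch; the function and its snake_case parameters are hypothetical stand-ins for the WCML arguments, and the not-yet-written duplicate-bond check is left out.

```python
def check_bond_arguments(atom_ids=None, atom1_refs=None, atom2_refs=None,
                         orders=None, no_bond_check=False):
    """Return True if bonds should be written, False if none were requested.

    Raises ValueError when the optional arguments are inconsistent.
    """
    given = [arg is not None for arg in (atom1_refs, atom2_refs, orders)]
    if not any(given):
        return False  # no bond arrays: the molecule gets no bonds
    if not all(given):
        raise ValueError("bond refs and orders must be given together")
    if not len(atom1_refs) == len(atom2_refs) == len(orders):
        raise ValueError("bond arrays must all be the same size")
    if atom_ids is None:
        raise ValueError("atom ids are required when bonds are added")
    if no_bond_check:
        return True  # caller opted out of the (possibly slow) sanity checks
    ids = set(atom_ids)
    for a, b in zip(atom1_refs, atom2_refs):
        if a not in ids or b not in ids:
            raise ValueError(f"bond refers to unknown atom: {a}, {b}")
        if a == b:
            raise ValueError(f"atom {a} bonded to itself")
    return True
```

Building the set of ids once keeps the membership tests cheap, although for very large macromolecules skipping the checks entirely, as nobondcheck allows, is still the fastest option.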

I've added a number of tests, one of which is probably the best use case.
This Fortran:

https://github.com/andreww/fox/blob/cmlbonds/wcml/test/test_cmlAddMolecule_10.f90

produces this XML:

https://github.com/andreww/fox/blob/cmlbonds/wcml/test/test_cmlAddMolecule_10.xml

Peter, does this look OK?

Cheers,

Andrew

Peter Murray-Rust

Nov 7, 2011, 3:30:22 PM
to fox-d...@googlegroups.com, DeJong, Wibe A
On Mon, Nov 7, 2011 at 8:19 PM, Andrew Walker <Andrew...@bristol.ac.uk> wrote:
> Hi all,
>
> I've now got a FoX version running with the basic support for bonding in
> WCML that Peter asked for. The code is not yet merged and as yet has no
> documentation (this is the first draft) but it can be found (and
> downloaded) at:
>
>    https://github.com/andreww/fox/tree/cmlbonds
>
> I've added a number of tests one of which is probably the best use case.

Looks fantastic! and many thanks

P.