NWChem parsing problem: zmatuser template

Jorge Estrada

Nov 24, 2011, 7:53:16 AM

to quixote-...@googlegroups.com

Pablo and I are trying to parse an NWChem output file, but the
conversion from LogXML to CML fails. The problem is in the following
part of the intermediate LogXML file that the parser generates from
the NWChem output file:

<module cmlx:lineCount="20" cmlx:templateRef="zmatuser">
<list cmlx:lineCount="1" cmlx:templateRef="stretch">
<array dataType="xsd:integer" dictRef="n:serial" size="1">4</array>
<array dataType="xsd:string" dictRef="n:constr" size="1"/>
<array dataType="xsd:string" dictRef="n:name" size="1">rO4C2</array>
</list> 5 Bend aO4C2N3 4 2 3 124.80139 0.00000 6 Torsion pO4C2N3H
4 2 3 1 180.00000 -0.00000 7 Stretch rH5N3 5 3 0.99328 0.00001 8 Bend
aH5N3C2 5 3 2 121.16237 -0.00000 9 Torsion dH5N3C2H 5 3 2 1 0.00000
-0.00000 10 Stretch rH6N3 6 3 0.99614 0.00000 11 Bend aH6N3C2 6 3 2
119.40277 -0.00000 12 Torsion pH6N3C2H 6 3 2 5 180.00000 -0.00000
</module>

The array element "n:constr" declares size 1 but is actually empty.
This is because the corresponding field in the NWChem file is a single
space character, and the parsing to LogXML seems to strip it. The
parser complains:
Caused by: java.lang.RuntimeException: Size attribute: 1 incompatible with content: 0
at org.xmlcml.cml.element.CMLArray.finishMakingElement(CMLArray.java:155)
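
The mismatch can be illustrated with a minimal Python sketch. This is not the actual CMLArray code: the function name check_array_size is hypothetical, and the sketch assumes the size check counts whitespace-delimited tokens, which is what the error message suggests.

```python
def check_array_size(declared_size: int, content: str) -> None:
    """Raise if the declared size disagrees with the whitespace-split token count.

    Hypothetical stand-in for the consistency check reported in the stack trace.
    """
    tokens = content.split()  # default split: any run of whitespace is a delimiter
    if len(tokens) != declared_size:
        raise RuntimeError(
            f"Size attribute: {declared_size} incompatible with content: {len(tokens)}"
        )


# A normal value yields one token, so size="1" is consistent.
check_array_size(1, "rO4C2")

# A lone space (or an empty element) yields zero tokens, so size="1" fails.
try:
    check_array_size(1, " ")
    message = None
except RuntimeError as err:
    message = str(err)

print(message)
```

Under that assumption, a field whose only content is whitespace can never satisfy size="1" while whitespace is also the delimiter.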

As the snippet above shows, the key template is "zmatuser":
...
<record repeat="10">{X}</record>
<record id="stretch" repeat="*" makeArray="true"
>{I5,n:serial}{A1,n:constr}Stretch {A7,n:name}.*</record>
...
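
To make the role of the {A1,n:constr} field concrete, here is a rough Python sketch of how such a record might be read. It assumes the field codes follow Fortran-style conventions (I5 as a 5-character integer, A1 as a 1-character string, A7 as a 7-character string); that reading is an assumption, and the sample line is a hypothetical layout rather than real NWChem output.

```python
import re

# Hypothetical fixed-width line: 5-char serial, 1-char constraint flag,
# the literal "Stretch ", a 7-char name, then the remaining columns.
line = "    4" + " " + "Stretch " + "rO4C2  " + "    4    2"

# Regex equivalent of the assumed meaning of
# {I5,n:serial}{A1,n:constr}Stretch {A7,n:name}.*
pattern = re.compile(r"(?P<serial>.{5})(?P<constr>.{1})Stretch (?P<name>.{7}).*")

match = pattern.fullmatch(line)
serial = int(match.group("serial"))   # int() tolerates the leading spaces
constr_raw = match.group("constr")    # the single space captured by A1
name = match.group("name").strip()

print(serial, repr(constr_raw), repr(constr_raw.strip()), name)
```

In this sketch the A1 field does capture one character, but that character is a space, so any later step that trims or whitespace-splits the value leaves nothing behind.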

Can anybody (Peter?) see how to solve this parsing problem? We are
studying how the parsing works, but perhaps an expert in the NWChem
parsing system can easily provide a solution.

Thanks,

Jorge

Peter Murray-Rust

Nov 24, 2011, 8:24:18 AM

to quixote-...@googlegroups.com

On Thu, Nov 24, 2011 at 12:53 PM, Jorge Estrada <jorge....@unizar.es> wrote:

> Pablo and I are trying to parse an NWChem output file, but the conversion from LogXML to CML fails. The problem is in the following part of the intermediate LogXML file that the parser generates from the NWChem output file:

This is a perennial problem in legacy files - where whitespace is significant - either as explicit space(s) " " or as zero-length spaces.

> <module cmlx:lineCount="20" cmlx:templateRef="zmatuser">
> <list cmlx:lineCount="1" cmlx:templateRef="stretch">
> <array dataType="xsd:integer" dictRef="n:serial" size="1">4</array>
> <array dataType="xsd:string" dictRef="n:constr" size="1"/>
> <array dataType="xsd:string" dictRef="n:name" size="1">rO4C2</array>
> </list> 5 Bend aO4C2N3 4 2 3 124.80139 0.00000 6 Torsion pO4C2N3H 4 2 3 1 180.00000 -0.00000 7 Stretch rH5N3 5 3 0.99328 0.00001 8 Bend aH5N3C2 5 3 2 121.16237 -0.00000 9 Torsion dH5N3C2H 5 3 2 1 0.00000 -0.00000 10 Stretch rH6N3 6 3 0.99614 0.00000 11 Bend aH6N3C2 6 3 2 119.40277 -0.00000 12 Torsion pH6N3C2H 6 3 2 5 180.00000 -0.00000
> </module>

> The array element "n:constr" declares size 1 but is actually empty. This is because the corresponding field in the NWChem file is a single space character, and the parsing to LogXML seems to strip it. The parser complains:
> Caused by: java.lang.RuntimeException: Size attribute: 1 incompatible with content: 0
> at org.xmlcml.cml.element.CMLArray.finishMakingElement(CMLArray.java:155)

The desired output is probably
<array dataType="xsd:string" dictRef="n:constr" size="1" delimiter="|">| |</array>
where the whitespace is bounded by explicit delimiters (rather than the default which is whitespace). I can't remember offhand how to specify the delimiter. Probably the thing to do is add a failing test to my branch so I have to fix it.
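
A small Python sketch of why an explicit delimiter would help. It is not the CML library's implementation: split_array is a hypothetical helper, and the convention that delimited content begins and ends with the delimiter is assumed from the `| |` example above.

```python
from typing import List, Optional


def split_array(content: str, delimiter: Optional[str] = None) -> List[str]:
    """Split array content into items (hypothetical helper).

    With no delimiter, whitespace separates items, so whitespace-only
    content has no items. With an explicit delimiter, the content is
    assumed to be wrapped as <delim>item<delim>item<delim>.
    """
    if delimiter is None:
        return content.split()
    wrapped = (
        len(content) >= 2 * len(delimiter)
        and content.startswith(delimiter)
        and content.endswith(delimiter)
    )
    if not wrapped:
        raise ValueError("content must start and end with the delimiter")
    inner = content[len(delimiter):-len(delimiter)]
    return inner.split(delimiter)


print(split_array(" "))            # whitespace-delimited: no items survive
print(split_array("| |", "|"))     # explicit delimiter: the space is one item
print(split_array("|a|b|", "|"))
```

With the explicit delimiter the single space counts as one item, which would agree with size="1".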

It also appears that the parse has failed as there is unparsed material afterwards.

> As the snippet above shows, the key template is "zmatuser":
> ...
> <record repeat="10">{X}</record>
> <record id="stretch" repeat="*" makeArray="true"
> >{I5,n:serial}{A1,n:constr}Stretch {A7,n:name}.*</record>
> ...

> Can anybody (Peter?) see how to solve this parsing problem? We are studying how the parsing works, but perhaps an expert in the NWChem parsing system can easily provide a solution.

Well the expert is me! Hopefully you are becoming experts :-)

P.

> Thanks,
>
> Jorge

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069