UPDATE: Parsers and dictionaries (NWChem, Gaussian)

Peter Murray-Rust

Apr 21, 2011, 5:39:05 AM
to quixote-...@googlegroups.com, DeJong, Wibe A, Rzepa, Henry, Anna K Croft
We had an extremely good skypecon yesterday and we are clearly making good progress both technically and organisationally. The technical side is blocking on me but should be cleared over the weekend if I am able.

PARSERS:
I have written well over 50% of the parser templates required for both NWChem and Gaussian. Essentially they "work" but there are output features that are not supported and I am adding these as I encounter them. I think there are enough to extract all the basic properties.

The average parse for Gaussian takes about 7 seconds, though large files take longer and the parse time may grow quadratically with file size. The longest NWChem parse that completed took ca 45 seconds for a 20 MByte file. Some NWChem parses didn't complete - I am not sure whether they are in an infinite loop or simply taking a long time - I will know soon.
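One way to check whether parse time really is quadratic in file size is to fit a straight line to log(time) against log(size): a slope near 1 suggests linear behaviour, a slope near 2 quadratic. The following is a minimal sketch only; the function name `scaling_exponent` is hypothetical, and the inputs are assumed to be (file size, parse time) measurements collected separately.

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    sizes and times are equal-length sequences of positive numbers,
    e.g. file sizes in MBytes and parse times in seconds.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard simple linear regression: slope = Sxy / Sxx
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With only a handful of timings the estimate will be noisy, so it is better treated as a hint about where to look than as a measurement.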

I have parsed all of Anna's Gaussian files (ca 1030) except for about three 20 MByte ones. The average is about 8 files/minute on my laptop. I am currently doing Henry's files (I have downloaded 1500 and I think there are about 5000 - I hope to download them before I go off air). I am skipping files over 20 MBytes - the average time for the rest is similar to Anna's - 80 files in 10 mins. Given that one calculation took 22 days, a 1-minute parse is not the end of the world. However, it will be valuable to have good performance for Avogadro, and I think we may have a pre-parse to inspect unknown files. Also, because JUMBOParser is declarative we can have self-modifying code - the parser can edit itself in response to the input it finds (there is no point in having planewave templates for a non-planewave calculation).
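The pre-parse idea could look roughly like the sketch below: skip files over the 20 MByte limit, read the first few hundred lines to guess which program wrote the file, and keep only the templates whose trigger phrase actually occurs. This is not JUMBOParser code. All the names (`preparse`, `sniff_program`, `select_templates`) are hypothetical, and the marker strings are assumptions that should be checked against real output files.

```python
import os
from itertools import islice

SIZE_LIMIT_BYTES = 20 * 1024 * 1024  # the 20 MByte cut-off used above

# Assumed marker strings, one per program; verify against real output.
PROGRAM_MARKERS = {
    "gaussian": "Entering Gaussian System",
    "nwchem": "Northwest Computational Chemistry Package",
}

def sniff_program(head_text):
    """Return the first program whose marker appears in head_text, else None."""
    for program, marker in PROGRAM_MARKERS.items():
        if marker in head_text:
            return program
    return None

def select_templates(head_text, templates):
    """Keep only templates whose trigger phrase occurs in the inspected text.

    templates maps a template name to a plain-text trigger phrase.
    """
    return [name for name, trigger in templates.items() if trigger in head_text]

def preparse(path, head_lines=200):
    """Decide whether to parse a file, and as which program's output."""
    if os.path.getsize(path) > SIZE_LIMIT_BYTES:
        return {"path": path, "action": "skip", "program": None}
    with open(path, errors="replace") as handle:
        head = "".join(islice(handle, head_lines))
    program = sniff_program(head)
    action = "parse" if program else "inspect"
    return {"path": path, "action": action, "program": program}
```

Files that match no marker are flagged as "inspect" rather than dropped, so unknown formats still get a human look.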

A number of Henry's calculations have terminated with error conditions. One of the great advantages of JUMBOParser is that it can reveal these messages even when it doesn't know their content. So it may, in a modest way, be able to learn error messages.
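The general idea of revealing unknown messages can be sketched as follows: any non-blank line that no template claims is collected and counted instead of being discarded, so recurring error messages stand out. This illustrates the principle only and is not how JUMBOParser is implemented. The two regular-expression templates and the name `parse_lines` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical templates for illustration; a real set would be much larger.
TEMPLATES = {
    "scf_energy": re.compile(r"SCF Done:\s+E\(\S+\)\s*=\s*(-?\d+\.\d+)"),
    "normal_end": re.compile(r"Normal termination"),
}

def parse_lines(lines):
    """Return (matched, unmatched).

    matched is a list of (template name, captured groups);
    unmatched counts every non-blank line that no template recognised.
    """
    matched = []
    unmatched = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        for name, pattern in TEMPLATES.items():
            found = pattern.search(text)
            if found:
                matched.append((name, found.groups()))
                break
        else:
            # No template claimed this line: keep it for a human to review.
            unmatched[text] += 1
    return matched, unmatched
```

Sorting the unmatched counter by frequency across many files would give a first list of candidate error messages to annotate.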

Lensfield appears to be working well: I have used it with NWChem and Gaussian, and in 3 different directories, without problems. Thanks, Sam.

DICTIONARIES:

The parsers generate new dictionary entries and these will have to be annotated by humans. Before that I want to link the dictionaries to examples of their use, which will help people. Many items in Gaussian are likely to be of little interest, being mainly debug-like output of Fortran-like variable values. The combination of NWChem, Gaussian and Jaguar will give a good indication of what the most fundamental terms are.

More later. I will check this in now.

P.



--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069