Wow, there are a lot more people in the discussion included. :) I think they belong to the Quixote project?
Ok, I think I was confused because I found this list of basis set names in compchemDict.cml, while you are referring to the parameters that describe a force field.
We are converting the PARM files (basically 1-, 2-, 3- and 4-atom parameters) to CML. We are not converting the functional form (that will be a lookup).
The confusion arose from the fact that I think of basis sets as an important part of the model chemistry of QM and of force fields as the comparable element in MD.
When I saw how you build the force field with all its parameters into the CML, while the basis sets are only referenced by name (in the version of CML I know so far), it felt like a mismatch.
But as I see it, you used this dictionary form for basis sets only as a start or an alternative, and in further steps you will include the parameters from EMSL in the CML.
Did you also do it for basis sets?
No, we intend to use EMSL.
For a start, I would introduce keywords for the force fields into our MD dictionaries since, in my opinion, this fits quite well with the current development of compchemDict.cml. However, I think it would be best to develop separate dictionaries for the 3 domains.
I hope that for the next three months the option of specifying force fields by a keyword from a dictionary can coexist with your solution of using a full description of the force field.
I was referring to the compchem.cml dictionary that I downloaded from the xml-cml.org page following Joe's instructions. For example, it contains a list of basis set names that can be used to describe a basis set instead of including a description of the individual factors in the basis set.
Which QM dictionaries?
The mdout is very early - 2 hours old - and it's not yet committed. Possibly tonight. It will be in https://bitbucket.org/wwmm/jumbo-converters/src/78a115ee8f64/jumbo-converters-compchem/src/test/resources/compchem/amber/mdout
I think Quixote will refactor this and perhaps provide normalization for EMSL lookup.
Maybe I could start off with a dictionary of the design we discussed in our project, adapt it to your dictionary schema and use it. I would use this as a solution for the next 3 months, as I get the feeling you plan to fully refactor this part of CML to capture the aforementioned properties of an MD simulation. I can imagine it will take longer than the next few weeks, and we have to design and create something working by the end of May.
Yes - this is very much part of our goal. There are several components to the input:
* coordinates
* parameterisation (e.g. what force fields, methods, basis sets)
* constraints (pressure, dielectric, etc.)
* control (what operations to carry out)
* machine and job dependent quantities (memory, cutoffs, etc)
We have a lot of the high-level design but are still working out the technical implementation.
If you are interested, I would send you the dictionary (or dictionaries) when they are ready, or at least well under way, and keep you informed about our progress. I would stay with the style of compchemDict.cml, although I have the impression you are planning a new or extended approach.
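As a minimal sketch of how the five input components listed above might be held together before being written out as CML, here is a Python dataclass. The class name, field names and types are hypothetical illustrations, not part of CML or of any existing converter.

```python
from dataclasses import dataclass, field


@dataclass
class JobInput:
    """Hypothetical container for the five kinds of input listed above."""

    coordinates: list[tuple[str, float, float, float]]  # (element, x, y, z)
    parameterisation: dict[str, str]  # e.g. force field, method, basis set
    constraints: dict[str, float] = field(default_factory=dict)  # pressure, dielectric, ...
    control: list[str] = field(default_factory=list)  # operations to carry out
    machine: dict[str, str] = field(default_factory=dict)  # memory, cutoffs, ...

    def missing_components(self) -> list[str]:
        """Names of the components that are still empty."""
        names = ("coordinates", "parameterisation", "constraints", "control", "machine")
        return [name for name in names if not getattr(self, name)]


job = JobInput(
    coordinates=[("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0)],
    parameterisation={"forceField": "parm94"},
    control=["minimise", "md"],
)
print(job.missing_components())  # ['constraints', 'machine']
```

Keeping the components separate mirrors the split in the list: the same coordinates and control section could be reused with a different parameterisation.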
Is there a way to keep track of your latest developments? A list where you announce major or even minor changes?
Kind regards,
I will then use an XSLT script to generate an input file for a specific application.
The idea is to use CML as general description that can then be converted to the specific input files for a computational chemistry application.
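The stylesheet step can be mimicked in a few lines of Python to show the idea: terms in a general CML description are looked up in a per-program table, and terms the target program has no equivalent for are skipped. This is a sketch under stated assumptions. The `md:` dictRef names and the `http://example.org/md` URI are hypothetical, the keyword names are modelled on AMBER's `&cntrl` variables, and the `<parameterList>/<parameter>/<scalar>` nesting is assumed rather than taken from a CML convention. In the workflow described here the translation would live in an XSLT stylesheet, not in Python.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hypothetical mapping from MD dictionary terms to the keywords of one target
# program (names modelled on AMBER's &cntrl namelist). In the workflow above
# this table would live in, or be generated into, an XSLT stylesheet.
PROGRAM_KEYWORDS = {
    "md:temperature": "temp0",
    "md:timeStep": "dt",
    "md:numberOfSteps": "nstlim",
}


def to_program_input(cml_text: str, keywords: dict[str, str]) -> str:
    """Turn <parameter dictRef="..."><scalar>value</scalar></parameter> into 'keyword=value' lines."""
    root = ET.fromstring(cml_text)
    lines = []
    for param in root.iter(f"{{{CML_NS}}}parameter"):
        ref = param.get("dictRef")
        if ref not in keywords:
            continue  # the target program has no equivalent, so the term is skipped
        scalar = param.find(f"{{{CML_NS}}}scalar")
        if scalar is None or scalar.text is None:
            continue  # nothing to translate
        lines.append(f"{keywords[ref]}={scalar.text.strip()}")
    return "\n".join(lines)


doc = """<cml xmlns="http://www.xml-cml.org/schema" xmlns:md="http://example.org/md">
  <parameterList>
    <parameter dictRef="md:temperature"><scalar>300.0</scalar></parameter>
    <parameter dictRef="md:timeStep"><scalar>0.002</scalar></parameter>
    <parameter dictRef="md:barostat"><scalar>berendsen</scalar></parameter>
  </parameterList>
</cml>"""
print(to_program_input(doc, PROGRAM_KEYWORDS))
```

Skipping unknown terms is one possible answer to the question of whether the stylesheet decides what it can translate. A stricter converter would instead report them.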
Exactly. And we'd be delighted for you to take a lead in this!
Sebastian
P.
I hope the paragraph above gives you an idea of the workflow we want to implement with CML.
I thought the XSLT stylesheet would decide whether it can translate a given part of the CML for a specific code.
I don't understand :-)
We expect the results of this to go onto the Quixote wiki fairly soon.
P.
Cheers,
Sebastian
I could not find any ff or vdw entries, either in the dictionaries Joe pointed me to or in the XSD. Is there a dictionary that I could extend with certain parameters, such as a force field name, in the same way as the basis sets are defined in compchemDict.cml?
We are at an early stage in the development of CML for force fields and we'd love to have your input.
Or one that I could extend with certain water models, or with coupling algorithms for pressure and temperature?
If the models have unique identifiers these can go in the dictionary (e.g. TIP3 is a reasonable dictionary term).
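Here is a sketch of what such a term could look like, built with Python's standard `xml.etree.ElementTree`. It assumes a CML `<dictionary>` that holds `<entry id=... term=...>` elements, each with a `<definition>` child. The MD dictionary namespace URI and the `md` prefix are placeholders. The helper `add_entry` is hypothetical, and the only rule it enforces is that identifiers are unique within one dictionary.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialise CML elements without a prefix


def add_entry(dictionary: ET.Element, entry_id: str, term: str, definition: str) -> ET.Element:
    """Append an <entry> to a dictionary, refusing an id that is already used."""
    if dictionary.find(f"{{{CML_NS}}}entry[@id='{entry_id}']") is not None:
        raise ValueError(f"duplicate entry id: {entry_id}")
    entry = ET.SubElement(dictionary, f"{{{CML_NS}}}entry", id=entry_id, term=term)
    ET.SubElement(entry, f"{{{CML_NS}}}definition").text = definition
    return entry


# Hypothetical MD dictionary; namespace URI and prefix are placeholders.
md_dict = ET.Element(
    f"{{{CML_NS}}}dictionary",
    namespace="http://www.xml-cml.org/dictionary/md/",
    dictionaryPrefix="md",
)
add_entry(md_dict, "tip3", "TIP3", "Three-site water model (TIP3P).")
print(ET.tostring(md_dict, encoding="unicode"))
```

A document could then refer to the water model as `md:tip3` instead of repeating its description.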
Generally CML deals with:
* objects - very well
* relationships - medium
* processes - not very well
I think we could give you a lot of input, and maybe we could also write and extend some dictionaries that can be integrated into your language stack.
It sounds like we are starting to get a nucleus of interested people - and that's when this starts to take off. I have copied in some of the Quixotans.
Current state:
* have complete parser for AMBER.parm (94 and 99) input
* am looking at AMBER output log files (not trajectories).
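Here is a simplified sketch of parsing one line of the bond section of an AMBER parm file. It assumes the usual layout: two atom types separated by `-` in the first five columns, then the force constant (kcal/mol/Å²) and the equilibrium length (Å), followed by an optional free-text comment. The sample line is illustrative. A real parser would also have to handle the other sections (angles, dihedrals, impropers, non-bonded terms) and the file header.

```python
from typing import NamedTuple


class BondParam(NamedTuple):
    type_a: str
    type_b: str
    force_constant: float  # kcal/mol/Å²
    eq_length: float       # Å


def parse_bond_line(line: str) -> BondParam:
    """Parse one line of the bond section of an AMBER parm file (simplified)."""
    # Atom types occupy columns 1-2 and 4-5, separated by '-' in column 3.
    type_a, type_b = line[0:2].strip(), line[3:5].strip()
    # The next two fields are the force constant and the equilibrium length;
    # anything after them is a free-text comment and is ignored here.
    rk, req = line[5:].split()[:2]
    return BondParam(type_a, type_b, float(rk), float(req))


print(parse_bond_line("C -CA  469.0    1.409       JCC,7,(1986),230; AA"))
```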
Kind regards,
Sebastian
On 03.02.2011 17:36, Peter Murray-Rust wrote:
On Thu, Feb 3, 2011 at 4:13 PM, Sebastian Breuers <breu...@uni-koeln.de> wrote:
On 03.02.2011 16:59, Peter Murray-Rust wrote:
On Thu, Feb 3, 2011 at 3:48 PM, Sebastian Breuers <breu...@uni-koeln.de> wrote:
Hello, Peter,
If we use force fields in the description of a molecular dynamics job, would a description of the parameters be necessary? I think it could be helpful if you want to define a new force field, because then you also have to define the functions to which the values of the atom combinations are applied. But mostly the programs know how to deal with atoms that have a certain atom type and a certain topology.
The particular *programs* may know, but we need things that transfer between programs, and here the semantics have to be independent of the program. IOW we can print them in a paper.
My idea was rather that the XSLT, i.e. the parser, should do the translation between the CML force field name entry and the corresponding entry for the specific program.
You mean that the dictionary contained the force field info and this was translated by the stylesheet. That may be possible. We are discussing dictionaries at the moment.
I am doing a simple example of AMBER input - that will make it clearer what we need to talk about.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
--
Sebastian Breuers
Tel: +49-221-470-4108
EMail: breu...@uni-koeln.de
Universität zu Köln / University of Cologne
Department of Chemistry
Organic Chemistry
Greinstraße 6, Room 325
D-50939 Cologne, Federal Rep. of Germany
The dictionaries are being refactored this weekend. Joe and Sam have given us templates and I will probably go through the existing dictionaries and make them conform and also refactor the content.
So is it also possible to get access to these refactored versions? Do I have to check out CMLlite?
For force fields there are no authorities providing values so it's important to show the explicit values.
Ok, I see. I can understand that we would then have problems addressing particular force fields by a force field name alone.
I'm very interested to know how to address the force fields in your parameterized way.
3 domains? I have QM and FF/MD, so I would have 2 domains. There would also be a general dictionary that is independent of method.
We decided to also introduce and cover Docking as an additional domain for molecular simulation.
If the keyword is accepted by the community then that is probably OK. Unfortunately different versions of the same program and certainly mutants of the program change the force field values but not the name. If FFs had version numbers I would be relaxed. But saying "MM2" or "AMBER94" is a fragile way of defining a set of values.
AMBER reads the values from the file so there has to be some way of translating "AMBER-PARM94" to a set of values. How do we do this so everyone has the same data?
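One way to make the fragility of names visible is to fingerprint the values themselves. Two parameter sets that are both called "AMBER94" but differ in a single value then get different hashes. This is a hypothetical sketch, not a CML or AMBER feature. It assumes the parameters have already been flattened into a name-to-number mapping (the key names are made up), and that everyone producing fingerprints agrees on exactly the same keys and number formatting.

```python
import hashlib
import json


def parameter_fingerprint(params: dict[str, float]) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {"bond:C-CA:k": 469.0, "bond:C-CA:r0": 1.409}
reordered = {"bond:C-CA:r0": 1.409, "bond:C-CA:k": 469.0}
mutant = {"bond:C-CA:k": 470.0, "bond:C-CA:r0": 1.409}  # same name, one changed value

print(parameter_fingerprint(original) == parameter_fingerprint(reordered))  # True
print(parameter_fingerprint(original) == parameter_fingerprint(mutant))     # False
```

A fingerprint does not say which set is "right". It only shows whether two people really have the same data.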
I was not aware that there are differences between force fields with the same name.
Are you storing the force field parameters, converted to an XML/CML dialect, in a separate database file? If I understand you correctly, your approach starts from the output of a simulation that was run with a certain force field and definite parameters. From that you generate a CML file that also holds the information about the force field parameters and the results.
Let's assume you start at some point before the simulation. Then I specify a force field in some way. To keep it simple I would have started with a force field keyword, and maybe also with the program and version, since I have now learned that force fields differ between versions. Then, to compose the CML (to fully describe my job setup), I would do a lookup to find the parameters for my force field.
How would I do this with the parameterized force fields in CML?
Maybe I could start off with a dictionary of the design we discussed in our project, adapt it to your dictionary schema and use it. I would use this as a solution for the next 3 months, as I get the feeling you plan to fully refactor this part of CML to capture the aforementioned properties of an MD simulation. I can imagine it will take longer than the next few weeks, and we have to design and create something working by the end of May.
"we" includes "everyone". All our discussions are public and we share our output. If you have an idea it's up to the community to decide what happens. The only real criterion is that it has to be implementable - we do not build vapourware. Generally the person suggesting something ends up implementing it!
I understand that we have to hold this discussion in public. It is also not my intention to create vapourware. I will do my best to produce a reasonable proposal.
Quixote is a good place and also we are revitalising the CML Blog (Joe?)
Do you mean editing the wiki pages of the Quixote project?
So is it also possible to get access to these refactored versions? Do I have to check out CMLlite?
I am literally hacking the first dictionary as I speak. (ca 200 terms). The checking is quite strict so it takes a little time.
No, you don't have to check out anything - the dictionary is in XML, and there is a stylesheet.
The question was more about where I can get it. To me this was quite abstract, as I could not get a glimpse of your work and I only have compChemDict.cml at hand.
Ok, I see. I can understand that we would then have problems addressing particular force fields by a force field name alone.
I'm very interested to know how to address the force fields in your parameterized way.
You can see the Amber one at https://bitbucket.org/wwmm/jumbo-converters/src/733e85b50c20/jumbo-converters-compchem/src/test/resources/compchem/amber/in/ref/parm94.xml . This is the raw parse, but it uses the terms in the AMBER documentation. I will be talking with Joe and Sam about how to make it more CML-like. But basically it defines atom types and the properties of 1-, 2-, 3- and 4-tuples.
Got it, saw it and understood that, for each program, you are converting the force field configuration files into two files: a program-specific dictionary file
and a program-specific CML database file that refers to entries in the specific dictionary file.
And this combination can then be used in a structure description that in turn refers to the CML database file containing the force field parameters.
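The 2-, 3- and 4-tuple parameters can be pictured as a table keyed by atom types. Here is a minimal sketch, assuming the structure refers to the parameters only through atom types. Because A-B-C and C-B-A denote the same bond, angle or proper dihedral, only one direction is stored, as the canonical key. Improper dihedrals have a different symmetry and would need their own rule. The table contents and function names are hypothetical, and the numbers are illustrative rather than copied from parm94.

```python
def canonical_key(types: tuple[str, ...]) -> tuple[str, ...]:
    """Bonds, angles and proper dihedrals read the same in both directions: keep one."""
    return min(types, tuple(reversed(types)))


# Hypothetical parameter table keyed by atom-type tuples (illustrative values).
PARAMS = {
    canonical_key(("C", "CA")): {"k": 469.0, "r0": 1.409},
    canonical_key(("CA", "C", "O")): {"k": 80.0, "theta0": 120.0},
}


def lookup(types: tuple[str, ...]) -> dict[str, float]:
    """Return the parameters for a 2-, 3- or 4-tuple of atom types."""
    try:
        return PARAMS[canonical_key(types)]
    except KeyError:
        raise KeyError(f"no parameters for {'-'.join(types)}") from None


# A structure refers to the parameter table only through its atom types.
atom_types = {"a1": "CA", "a2": "C", "a3": "O"}
print(lookup((atom_types["a1"], atom_types["a2"])))                     # bond CA-C
print(lookup((atom_types["a3"], atom_types["a2"], atom_types["a1"])))   # angle O-C-CA
```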
We decided to also introduce and cover Docking as an additional domain for molecular simulation.
OK - I shall not be doing this as high priority (it's quite complex and it relies on the other components being solved)
We will try to deal with that issue.
Let's assume you start at some point before the simulation. Then I specify a force field in some way. To keep it simple I would have started with a force field keyword, and maybe also with the program and version, since I have now learned that force fields differ between versions. Then, to compose the CML (to fully describe my job setup), I would do a lookup to find the parameters for my force field.
If the lookup is robust, that's true, but it doesn't normally work that way. To give a QM example, I'm told that "B3LYP" in Gaussian is different from B3LYP in other programs.
How would I do this with the parameterized force fields in CML?
How would you look it up? Someone has to take responsibility. For basis sets it seems to be PNL. For force fields there is no one. So perhaps we shall need to do something in Quixote or Blue Obelisk. (Note that Quixote's current priority is QM, not FF.)
I think I did not express myself clearly enough. I really want to use CML.
This is why I described my setup above. As I am the one responsible for the development and usage, this was the practical question of 'How do I do it?'. I think I got my answer with your example of the parm file, if the summary I wrote of your current work on the amber dictionary is correct (cf. my answer referring to your link).
I still have the question of how to include a dictionary in my CML, meaning the real, actual XML string that refers to the dictionary itself, so that a 'dictRef' in my document points to a real entry in a real file.
A further question is whether the CML dictionaries Joe pointed me to (downloaded directly from xml-cml.org) are all that are available (compchemDict.cml, molecule.cml, property.cml, propertyG03.cml, unitTypeDict.cml).
You are currently refactoring the CML dictionaries. Is there a way to get access to them? To the changed ones, not the ones I already downloaded.
If I come up with suggestions and maybe requirements for CML or the dictionaries, it would be reasonable to base them on the refactored dictionaries and not outdated ones (just to clarify the aim of my questions).
If it is not yet possible to get access to those refactored dictionaries, can you estimate when they will be presentable to the public? On that basis we (MoSGrid) can decide whether we can afford to wait or should follow the old standards and the information I have received and gathered so far.
[Quixote colleagues - it may be useful to abstract some of this discussion for the wiki...]
On Tue, Feb 15, 2011 at 3:44 PM, Sebastian Breuers <breu...@uni-koeln.de> wrote:
The question was more about where I can get it. To me this was quite abstract, as I could not get a glimpse of your work and I only have compChemDict.cml at hand.
So is it also possible to get access to these refactored versions? Do I have to check out CMLlite?
I am doing them now :-).
The problem is that we have been building up experience over the years and although the dictionaries have been "valid CML" they haven't been consistent in their usage of the components. This is what we are now calling "conventions". There is a dictionary convention:
http://www.xml-cml.org/convention/dictionary
which specifies what is allowed and what is not allowed in a dictionary. I'm having to refactor dictionary by dictionary. There are about 20 and they are in different stages of completeness.
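Here is a sketch of the kind of mechanical checking such a convention enables. The rules it encodes are assumptions chosen for illustration: the root is a CML `<dictionary>` with `namespace` and `dictionaryPrefix` attributes, and every `<entry>` has a unique `id`, a `term` and a `<definition>`. The authoritative list of rules is the convention at the URL above.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def check_dictionary(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means every (assumed) rule passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != f"{{{CML_NS}}}dictionary":
        problems.append("root element is not a CML <dictionary>")
    for attr in ("namespace", "dictionaryPrefix"):
        if not root.get(attr):
            problems.append(f"dictionary lacks @{attr}")
    seen = set()
    for entry in root.findall(f"{{{CML_NS}}}entry"):
        entry_id = entry.get("id")
        if not entry_id:
            problems.append("entry without @id")
            continue
        if entry_id in seen:
            problems.append(f"duplicate id: {entry_id}")
        seen.add(entry_id)
        if not entry.get("term"):
            problems.append(f"{entry_id}: missing @term")
        if entry.find(f"{{{CML_NS}}}definition") is None:
            problems.append(f"{entry_id}: missing <definition>")
    return problems


sample = """<dictionary xmlns="http://www.xml-cml.org/schema"
    namespace="http://www.xml-cml.org/dictionary/md/" dictionaryPrefix="md">
  <entry id="tip3" term="TIP3"><definition>Three-site water model.</definition></entry>
  <entry id="tip3" term="TIP3 again"><definition>Duplicate id.</definition></entry>
  <entry id="spc"/>
</dictionary>"""
for problem in check_dictionary(sample):
    print(problem)
```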
I will make the CASTEP dictionary available, through Joe, hopefully this afternoon.
I will be working on the other dictionaries as I fly to Italy tomorrow. It's a good activity for airports as a lot of the stuff is tedious manual hack. Once it's done then it's trivial to convert to other formats.
Got it, saw it and understood that, for each program, you are converting the force field configuration files into two files: a program-specific dictionary file
Yes.
and a program-specific CML database file that refers to entries in the specific dictionary file.
Yes. It needn't be in a database and most are probably littered around the Internet. We'd like to get them all collected in the Blue Obelisk. They aren't big.
And this combination can then be used in a structure description that in turn refers to the CML database file containing the force field parameters.
Yes.
Dictionary + force-field + structure ==> program input
Ok, I don't want to reopen the discussion ;) but I was thinking more of the following
This is why I described my setup above. As I am the one responsible for the development and usage, this was the practical question of 'How do I do it?'. I think I got my answer with your example of the parm file, if the summary I wrote of your current work on the amber dictionary is correct (cf. my answer referring to your link).
Yes. We have tended to do these things on an ad hoc basis - when there is a need we work on the specific subdomain. Since I have a colleague working on Amber, I thought I would see if I could solve that.
I still have the question of how to include a dictionary in my CML, meaning the real, actual XML string that refers to the dictionary itself, so that a 'dictRef' in my document points to a real entry in a real file.
The idea is that
<cml xmlns:amber="http://www.xml-cml.org/dictionary/amber/" ... >
will be both an identifier AND an address. So the name defines the dictionary and the name also acts as an address. This is TimBL's great idea of conflating names and addresses. It works when you have control over the server, fails when you migrate.
<property dictRef="amber:parm94" .../>
will point to a dictionary entry
http://www.xml-cml.org/dictionary/amber#parm94
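To illustrate how a processor could turn a `dictRef` into the entry address, here is a sketch that uses `xml.etree.ElementTree.iterparse`, which reports namespace declarations as `start-ns` events. The rule that builds the address (strip a trailing `/` from the namespace URI, then append `#` and the local name) is an assumption chosen to reproduce the address given above. It is not a documented CML rule. The sketch also treats prefixes as document-wide and ignores redeclarations in nested elements.

```python
import io
import xml.etree.ElementTree as ET


def resolve_dict_refs(cml_text: str) -> dict[str, str]:
    """Map every dictRef ("prefix:localName") in a document to an entry address."""
    prefixes: dict[str, str] = {}
    resolved: dict[str, str] = {}
    source = io.BytesIO(cml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # reported before the element that declares it
            prefixes[prefix] = uri
        elif (ref := item.get("dictRef")) is not None:
            prefix, _, local = ref.partition(":")
            if prefix not in prefixes or not local:
                raise ValueError(f"cannot resolve dictRef {ref!r}")
            # Assumed addressing rule: drop a trailing '/' and append '#localName'.
            resolved[ref] = f"{prefixes[prefix].rstrip('/')}#{local}"
    return resolved


doc = """<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:amber="http://www.xml-cml.org/dictionary/amber/">
  <property dictRef="amber:parm94"/>
</cml>"""
print(resolve_dict_refs(doc))
```

Because the namespace is both name and address, a processor only needs the prefix declaration on the root element to locate the entry. This works only as long as the server hosting the dictionary keeps that address.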
A further question is whether the CML dictionaries Joe pointed me to (downloaded directly from xml-cml.org) are all that are available (compchemDict.cml, molecule.cml, property.cml, propertyG03.cml, unitTypeDict.cml).
You are currently refactoring the CML dictionaries. Is there a way to get access to them? To the changed ones, not the ones I already downloaded.
They will be posted on the xml-cml site very soon. Just the CASTEP one to start with. Expect the others over a few days.
If I come up with suggestions and maybe requirements for CML or the dictionaries, it would be reasonable to base them on the refactored dictionaries and not outdated ones (just to clarify the aim of my questions).
Yes. Assume we are starting from scratch on most dictionaries
If it is not yet possible to get access to those refactored dictionaries, can you estimate when they will be presentable to the public? On that basis we (MoSGrid) can decide whether we can afford to wait or should follow the old standards and the information I have received and gathered so far.
Anything you have gathered on dictionaries will need to be refactored. Think of this as releasing CML Dictionary V0.1 in a week or two.
This is all very exciting. Did you meet Christoph Steinbeck when he was at the Biozentrum in Koeln? He is now at EBI near Cambridge.
P.
On 15.02.2011 18:23, Peter Murray-Rust wrote:
Ok, I don't want to reopen the discussion ;) but I was thinking more of the following
Dictionary + force-field + structure ==> program input
Program-specific force field dictionary + force field parameters in a CML file (that was what I meant by database file) + structure ==> coordinate and topology input
+ MD dictionary | converter (for specific MD program) ==> input for specific MD program
This scheme can be adapted to QM and DC.
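The two stages above can be sketched as two functions. The first combines the force-field dictionary, the parameters and the structure into a topology. The second passes that topology and the MD settings to a program-specific converter. Everything here is hypothetical. The dictionary is reduced to a set of allowed term names, and only bonds are handled, so sorting the two atom types is enough to find the parameters. `toy_converter` writes an invented format rather than any real program's input.

```python
from typing import Callable

Topology = list[dict]


def build_topology(ff_dictionary: set[str],
                   ff_parameters: dict[tuple[str, str], dict[str, float]],
                   structure: dict) -> Topology:
    """Stage 1: force-field dictionary + parameters + structure ==> topology (bonds only)."""
    topology = []
    for a, b in structure["bonds"]:
        key = tuple(sorted((structure["types"][a], structure["types"][b])))
        params = ff_parameters[key]
        undefined = set(params) - ff_dictionary
        if undefined:
            raise ValueError(f"terms not in the force-field dictionary: {sorted(undefined)}")
        topology.append({"atoms": (a, b), **params})
    return topology


def make_program_input(topology: Topology, md_settings: dict,
                       converter: Callable[[Topology, dict], str]) -> str:
    """Stage 2: topology + MD settings, passed through a program-specific converter."""
    return converter(topology, md_settings)


def toy_converter(topology: Topology, md_settings: dict) -> str:
    """Hypothetical converter writing an invented text format."""
    lines = [f"{key} = {value}" for key, value in sorted(md_settings.items())]
    lines += [f"bond {t['atoms'][0]} {t['atoms'][1]} {t['k']} {t['r0']}" for t in topology]
    return "\n".join(lines)


ff_dictionary = {"k", "r0"}
ff_parameters = {("C", "CA"): {"k": 469.0, "r0": 1.409}}
structure = {"types": {"a1": "CA", "a2": "C"}, "bonds": [("a1", "a2")]}
topology = build_topology(ff_dictionary, ff_parameters, structure)
print(make_program_input(topology, {"nsteps": 1000}, toy_converter))
```

Only the converter is program-specific, which is what lets the same scheme be reused for QM and docking with different first-stage inputs.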
Ok. I will then post the ideas we collected for MD and create the proposal (the initial dictionary file based on those ideas).
Yes. Assume we are starting from scratch on most dictionaries
Yes. I took a practical course in bioinformatics in Prof. Schomburg's group 4 or 5 years ago. Christoph led a subgroup there. I had the chance to do a little development on the JChemPaint project. I was there when they started to refactor the JChemPaint application. I am still a member of the SourceForge JChemPaint and CDK projects. :D
P.
On Tue, Feb 15, 2011 at 5:53 PM, Sebastian Breuers <breu...@uni-koeln.de> wrote:
On 15.02.2011 18:23, Peter Murray-Rust wrote:
Ok, I don't want to reopen the discussion ;) but I was thinking more of the following
Dictionary + force-field + structure ==> program input
Program-specific force field dictionary + force field parameters in a CML file (that was what I meant by database file) + structure ==> coordinate and topology input
Yes - I was just being rather fuzzy
+ MD dictionary | converter (for specific MD program) ==> input for specific MD program
This scheme can be adapted to QM and DC.
Yes
There may be other inputs such as machine parameters, memory, timings, etc.
Ok. I will then post the ideas we collected for MD and create the proposal (the initial dictionary file based on those ideas).
Yes. Assume we are starting from scratch on most dictionaries
Good. I think you'll find that in a day or so that you'll be fully in touch
Yes. I took a practical course in bioinformatics in Prof. Schomburg's group 4 or 5 years ago. Christoph led a subgroup there. I had the chance to do a little development on the JChemPaint project. I was there when they started to refactor the JChemPaint application. I am still a member of the SourceForge JChemPaint and CDK projects. :D
P.
Great, I am meeting Christoph for a beer and will pass on your greetings.
Hello,
I've got a new question concerning the definition of force fields:
As I have understood so far, the configuration files of specific programs will be converted into the program-specific dictionary and the database file that references the dictionary and contains the force field parameters.
I was wondering if there is something like a superclass dictionary that contains terminology common to force fields in general, e.g. force constants or harmonic potentials, dihedral specifications or electrostatic parameters of certain atom types?
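Here is a minimal sketch of the layering such a superclass dictionary would allow: program-specific entries point to generic terms, so tools can tell that two entries describe the same concept. All term names in the `ff:` namespace are hypothetical. `RK`/`REQ` are the column names used for bonds in AMBER parm files, and `kb`/`b0` are the symbols used in the GROMACS manual. The mapping only identifies the concept. AMBER writes the bond energy as K(r - r0)² in kcal/mol/Å², while GROMACS uses ½ kb (b - b0)² in kJ/mol/nm², so values still need a conversion step.

```python
# Hypothetical generic force-field terms, shared by all programs.
GENERIC_TERMS = {
    "ff:bondForceConstant": "Force constant of a harmonic bond-stretch term",
    "ff:bondLength": "Equilibrium length of a harmonic bond-stretch term",
    "ff:partialCharge": "Atomic partial charge",
}

# Hypothetical program-specific entries, each pointing to the generic term it specialises.
PROGRAM_TERMS = {
    "amber:RK": "ff:bondForceConstant",
    "amber:REQ": "ff:bondLength",
    "gromacs:kb": "ff:bondForceConstant",
    "gromacs:b0": "ff:bondLength",
}


def generic_term(program_term: str) -> str:
    """Return the generic term that a program-specific entry specialises."""
    generic = PROGRAM_TERMS[program_term]
    if generic not in GENERIC_TERMS:
        raise KeyError(f"{program_term} points to undefined term {generic}")
    return generic


def same_concept(term_a: str, term_b: str) -> bool:
    """True if two program-specific terms map to the same generic term.

    This says nothing about units or prefactors, which still need converting.
    """
    return generic_term(term_a) == generic_term(term_b)


print(same_concept("amber:RK", "gromacs:kb"))   # True
print(same_concept("amber:RK", "amber:REQ"))    # False
```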