New YAML-based input file for Cantera

Ray Speth

Mar 12, 2019, 11:43:37 PM

to Cantera Users' Group

Hi all,

In an effort to resolve some of the shortcomings of Cantera's current input file options and enable a range of new capabilities, I have been working on a new YAML-based input file specification for Cantera.

To get a feel for the new YAML format, you can take a look at the API documentation for the format or several example input files: gri30.yaml, ptcombust.yaml, sofc.yaml, thermo-models.yaml, or RMG_PAH.yaml. You can also try running the ck2yaml and cti2yaml scripts on any mechanisms that you are currently using to see what they will look like in the new format (an xml2yaml script is still to come).

This format is intended to be more flexible and extensible than the current formats. One important concept is that additional fields in the input file that are not used directly by Cantera will be still accessible through Cantera, e.g. via Species and Reaction objects. Another major feature which will be enabled by the YAML format is serialization of many Cantera objects, which will provide many new opportunities for interactions with other software.

At this time, I am looking for feedback from Cantera users regarding the new format, to make sure that it meets the needs of the community both now and as Cantera continues to evolve. Some high-level questions that I would appreciate your thoughts on are the following:

1. Are there problems or limitations with the existing XML and CTI formats that are not addressed by the YAML format?
2. Are you using XML and CTI format mechanisms with other applications (as either input or output) in a way that would make it difficult to transition to a YAML-based mechanism format?
3. Is the structure of the YAML format flexible enough to support any extensions you might consider, either as part of Cantera or another tool built on top of Cantera?

The work-in-progress implementation of the YAML format is currently open as a pull request on GitHub. I have posed some more detail-oriented questions there. Responses are welcome either in this thread or on GitHub.

Regards,

Ray

ischg

Mar 28, 2019, 11:19:58 AM

to Cantera Users' Group

Dear Ray,

Thanks for your continued work on this. I looked at the pull request and wanted to provide feedback to your three questions:

1. Existing XML/CTI: I never considered them as something I felt the urge to edit manually, especially as mechanisms can be flexibly built using the python interface (see https://cantera.org/examples/python/kinetics/extract_submechanism.py.html example). From what I can tell, the YAML input allows for simplified extraction/merging, which is useful, but my intuition would be to construct a mechanism/Solution object within python, i.e.

ct.Solution(thermo='IdealGas', kinetics='GasKinetics', species=species_list, reactions=reaction_list)

and then output the result to CTI/XML/YAML. Unfortunately, this strategy was not intuitive using the existing code (I may have overlooked something?), so I resorted to building things on the fly without ever saving. That said, an export routine that allows saving Solution objects to YAML will be extremely useful, as it would eliminate error-prone manual input / copy paste. There may be some additional manipulations necessary to generate consistent lists (e.g. matching species/element names), but that's a relatively simple task.

2. No (no limitations)

3. From my perspective, the 'human-readable' YAML format has additional advantages for higher-level, repetitive tasks, which can be easily scripted. Whether this is considered useful to the community is obviously very subjective. I can provide more detail (plus code I wrote for that) if there is interest.

-ingmar-

Two comments:

* in lines like {T: 900.0, P: 5 atm, Y: {O2: 0.4, N2: 0.4, AR: 0.2}}, pint (https://pint.readthedocs.io/en/latest/) could be used to allow for alternative units, e.g. {T: 900.0 kelvin, P: 5 bar, Y: {O2: 0.4, N2: 0.4, AR: 0.2}}

* I noticed that PyYAML is somewhat buggy (e.g. it won't parse numbers such as 3.2e5 as expected: yaml.load('a: 3.2e5') returns a string in both version 3.12 and 5.1, even with the additional Loader option, whereas json.loads('{"a":3.2e5}') is more specific and returns the expected output). There are workarounds (https://stackoverflow.com/questions/30458977/yaml-loads-5e-6-as-string-and-not-a-number), but I see this as a potential source of significant frustration ...
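
The string result is consistent with the YAML 1.1 float pattern that PyYAML implements, which expects a decimal point and an explicitly signed exponent. One library-independent guard is to post-process the parsed data and convert strings that look like numbers into floats. The sketch below uses only the standard library; `coerce_numbers` and `_FLOAT_RE` are illustrative names, not part of PyYAML or Cantera, and the `parsed` dictionary merely stands in for what such a parser might return.

```python
import json
import re

# Plain decimal or scientific notation, e.g. "3.2e5", "5e-6", "-1.0E+3".
_FLOAT_RE = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def coerce_numbers(node):
    """Recursively convert number-like strings in parsed data to floats.

    Note: integer-like strings such as "5" also become floats (5.0).
    """
    if isinstance(node, dict):
        return {key: coerce_numbers(value) for key, value in node.items()}
    if isinstance(node, list):
        return [coerce_numbers(item) for item in node]
    if isinstance(node, str) and _FLOAT_RE.fullmatch(node.strip()):
        return float(node)
    return node


# Stand-in for parser output in which the numbers were left as strings.
parsed = {"a": "3.2e5", "rate": {"A": "5e-6", "note": "5 atm"}}
fixed = coerce_numbers(parsed)

# JSON's number grammar accepts the same literal directly.
from_json = json.loads('{"a": 3.2e5}')
```

Strings that carry a unit, such as "5 atm", are left untouched, so a later unit-handling step can still see them.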

Wolfgang Bessler

Apr 1, 2019, 5:08:43 AM

to Cantera Users' Group

Dear Ray,

thank you for working on this. I fully agree that it is a good idea to consolidate the cti and xml input files. After trying the cti2yaml converter on a PEM fuel cell cti file, I realized the following issue (full input files attached): The Reaction ID is not transferred to YAML.

cti Input:

ideal_interface(
    name = "Platinum_surface_anode",
    phases = 'gas_anode Platinum Nafion',
    elements = "Pt H O",
    species = "(PtAn) H(PtAn) H2O(PtAn)",
    site_density = (2.50e-9,'mol/cm2'),
    reactions = ["PtAn-*"]
    )
surface_reaction    ("H2 + (PtAn) + (PtAn) <=> H(PtAn) + H(PtAn)",                        stick(0.046, 0,(0,'kJ/mol')),                     id = 'PtAn-rxn01')    # Tafel
surface_reaction    ("H2O + (PtAn)     <=> H2O(PtAn)",                                        stick(1.0, 0,(0,'kJ/mol')),                         id = 'PtAn-rxn02')
edge_reaction        ("H(PtAn)            <=>    H+[Nafion] + (PtAn) + electron",            [1e38, 0.0, (200.0, 'kJ/mol')],     beta = 0.5, id = 'PtAn-ctr02')    # Volmer
edge_reaction        ("H2 + (PtAn)        <=>    H(PtAn) + H+[Nafion] + electron",        [1e48, 0, (200.0, 'kJ/mol')],        beta = 0.5, id = 'PtAn-ctr01')    # Heyrovski

yaml output:

phases:
- name: Platinum_surface_anode
  thermo: surface
  elements: [Pt, H, O]
  species: [(PtAn), H(PtAn), H2O(PtAn)]
  kinetics: surface
  reactions: [Platinum_surface_anode-reactions]
  site-density: 2.5e-09 mol/cm^2

Platinum_surface_anode-reactions:
- equation: H2 + (PtAn) + (PtAn) <=> H(PtAn) + H(PtAn) # Reaction 1
  sticking-coefficient: {A: 0.046, b: 0, Ea: 0.0 kJ/mol}
- equation: "H2O + (PtAn) \t <=> H2O(PtAn)" # Reaction 2
  sticking-coefficient: {A: 1.0, b: 0, Ea: 0.0 kJ/mol}
- equation: "H(PtAn)\t\t\t<=>\tH+[Nafion] + (PtAn) + electron" # Reaction 3
  rate-constant: {A: 1.0e+38, b: 0.0, Ea: 200.0 kJ/mol}
  beta: 0.5
- equation: "H2 + (PtAn)\t\t<=>\tH(PtAn) + H+[Nafion] + electron" # Reaction 4
  rate-constant: {A: 1.0e+48, b: 0, Ea: 200.0 kJ/mol}
  beta: 0.5

Thus, the individual reaction IDs are lost. However, we use them in our C++ code to access specific reactions (via "interfaceKinetics->reaction(nr)->id"), for example for sensitivity analyses. I would therefore suggest keeping the option of specifying an ID string.
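
The access pattern in question can be sketched without Cantera. The example assumes each reaction entry carries an `id` key mirroring the CTI input; the `reactions` list and the `index_by_id` helper are hypothetical and only illustrate lookup by ID and the wildcard selection used in `reactions = ["PtAn-*"]`.

```python
from fnmatch import fnmatchcase


def index_by_id(reactions):
    """Map each user-specified reaction ID to its position in the list."""
    index = {}
    for position, reaction in enumerate(reactions):
        rxn_id = reaction.get("id")
        if rxn_id is None:
            continue  # reactions without an ID cannot be looked up by name
        if rxn_id in index:
            raise ValueError(f"duplicate reaction id: {rxn_id}")
        index[rxn_id] = position
    return index


# Hypothetical entries, with the IDs taken from the CTI file above.
reactions = [
    {"equation": "H2 + (PtAn) + (PtAn) <=> H(PtAn) + H(PtAn)", "id": "PtAn-rxn01"},
    {"equation": "H2O + (PtAn) <=> H2O(PtAn)", "id": "PtAn-rxn02"},
    {"equation": "H(PtAn) <=> H+[Nafion] + (PtAn) + electron", "id": "PtAn-ctr02"},
    {"equation": "H2 + (PtAn) <=> H(PtAn) + H+[Nafion] + electron", "id": "PtAn-ctr01"},
]

ids = index_by_id(reactions)
volmer = reactions[ids["PtAn-ctr02"]]

# Wildcard selection, e.g. all charge-transfer reactions.
charge_transfer = [r for r in reactions if fnmatchcase(r["id"], "PtAn-ctr*")]
```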

Another minor comment: I like to structure my cti files in a way that each phase definition is followed by the needed species and reaction definitions for that phase, before the next phase is defined. The cti2yaml converter rearranges that structure (first all phases, then all species, then all reactions). Can the user re-order this file again?

Best regards,

Wolfgang

PEFC_elementarykinetics.cti

PEFC_elementarykinetics.yaml

Ray Speth

Apr 10, 2019, 10:55:39 AM

to Cantera Users' Group

Ingmar,

Thanks for taking a look and providing this feedback. Unit conversion functions are already in place. Pint is a nice library, but of course can't be used in C++. The warning on PyYAML is helpful, too. Currently, all YAML parsing in Cantera is done in C++, using the yaml-cpp library. For generating YAML files, we are using ruamel_yaml instead of PyYAML, since it provides more control over the formatting of the output.
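
For readers unfamiliar with the idea, the following rough sketch shows how a value-with-unit string such as `5 atm` can be converted to SI. It is not Cantera's implementation: the `TO_SI` table and the `parse_quantity` function are made up for illustration and cover only a few pressure units.

```python
# Conversion factors to pascal; a deliberately tiny, illustrative table.
TO_SI = {"Pa": 1.0, "kPa": 1.0e3, "bar": 1.0e5, "atm": 101325.0}


def parse_quantity(value, default_unit="Pa"):
    """Convert a bare number or a 'value unit' string to pascal."""
    if isinstance(value, (int, float)):
        # Bare numbers are interpreted in the default unit.
        return float(value) * TO_SI[default_unit]
    parts = value.split()
    if len(parts) == 1:
        number, unit = parts[0], default_unit
    elif len(parts) == 2:
        number, unit = parts
    else:
        raise ValueError(f"cannot parse quantity: {value!r}")
    if unit not in TO_SI:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * TO_SI[unit]


# The state from the example discussed in this thread.
state = {"T": 900.0, "P": "5 atm"}
pressure_pa = parse_quantity(state["P"])
```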

Regards,

Ray

Ray Speth

Apr 10, 2019, 1:10:33 PM

to Cantera Users' Group

Wolfgang,

Thanks for your comments on this. I didn't know that anyone was using the "id" field of reactions for anything other than the ability to specify which reactions belonged to which phase entries in a file. I think it should be easy enough to continue supporting this feature for the purpose you're using it as well.

I did notice this pattern of putting the species and reactions for a particular phase immediately after the definition of the phase, before additional phase definitions in a few input files. However, I'm not quite sure how to easily support this style in the new format. It would require changing the structure to not require all phase definitions to be in the section named 'phases:', and a way of specifying where to read the phase definition from in that case. I'll have to give that some thought.

Regards,

Ray

Torsten Methling

May 13, 2019, 1:55:46 PM

to Cantera Users' Group

Hi Ray,

Thanks to you and the others for the great work on the YAML format and on Cantera in general. We are working on optimisation and for that we had to create our own mechanism format to add a few features. So I just wanted to share some ideas on extensions we considered and that were quite useful for us:

* Flexible uncertainties: instead of adding uncertainties to A, b, E, etc. directly, we assigned uncertainties to k at different temperatures (similar to the work of Tamas Turanyi's group). The lower and upper bounds do not even have to have the same uncertainty (e.g. the upper bound could differ when physical limits of the collision frequency are reached for certain rate coefficients). I guess this would already work with the current format if users were allowed to add user-defined input to, e.g., reactions in the YAML file that would simply be ignored by the Cantera parser.
* Global variables / functionality between different variables: sometimes parameters in chemical kinetic models are required for more than one reaction. Examples are branching ratios for a reaction A+B<=>Products or prompt HCO dissociation as proposed by Labbe et al. For example, HCO prompt dissociation requires every reaction involving HCO to be multiplied by the same set of Arrhenius parameters. To solve this, we defined a global parameter for this Arrhenius set and added the functionality to multiply reaction rates by these global parameters. With this approach we can change, e.g., the prompt dissociation parameters in one place in the mechanism file, and we avoid extra redundancy within the mechanism.
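
As a rough illustration of the second idea, the sketch below multiplies several Arrhenius rate constants by one shared, globally defined Arrhenius factor, so the shared parameters live in a single place. All names (`global_parameters`, `hco-prompt`, the `multiplier` key) and all numerical values are invented placeholders; this is neither the format described above nor anything Cantera provides.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def arrhenius(T, A, b, Ea):
    """Modified Arrhenius expression k = A * T**b * exp(-Ea / (R*T))."""
    return A * T**b * math.exp(-Ea / (GAS_CONSTANT * T))


# One shared parameter set, defined once (placeholder values).
global_parameters = {"hco-prompt": {"A": 0.8, "b": 0.0, "Ea": 0.0}}

# Placeholder reactions; "multiplier" names a shared parameter set.
reactions = [
    {"equation": "reaction 1 involving HCO",
     "rate-constant": {"A": 1.0e10, "b": 0.5, "Ea": 5.0e4},
     "multiplier": "hco-prompt"},
    {"equation": "reaction 2 involving HCO",
     "rate-constant": {"A": 2.0e8, "b": 1.0, "Ea": 2.0e4},
     "multiplier": "hco-prompt"},
    {"equation": "unrelated reaction",
     "rate-constant": {"A": 3.0e12, "b": 0.0, "Ea": 1.0e5}},
]


def rate_constant(reaction, T, shared):
    """Evaluate k(T), applying the shared multiplier if one is referenced."""
    k = arrhenius(T, **reaction["rate-constant"])
    name = reaction.get("multiplier")
    if name is not None:
        k *= arrhenius(T, **shared[name])
    return k


rates = [rate_constant(r, 1500.0, global_parameters) for r in reactions]
```

Changing `global_parameters["hco-prompt"]` affects every reaction that references it, which is the single point of change described above.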

So this is helpful for us. The disadvantage is that the mechanisms / reactions become more bloated, and these functionalities might make the entries more complex and harder for the user to read. So this is just an idea, and I am not sure if it really fits the philosophy behind the YAML mechanism format.

Best regards,

Torsten

Ray Speth

Jun 26, 2019, 10:33:11 AM

to Cantera Users' Group

Hi Wolfgang,

I just wanted to follow up on this, now that the YAML branch has been merged in.

The "id" field is now preserved in CTI to YAML conversions, and accessible from code in the same ways as before. The only change is that we no longer assign numerical IDs to reactions that don't have a user-specified ID.

I had a discussion with the other developers about the placement of the phase definitions within the input file, and our conclusion was that any of the ways we could think of to allow organizing the input file in this way would make the input files harder to parse. This would especially affect third party tools (which we expect more of with the advent of the YAML format) since there wouldn't be a clear way to identify a section as containing phase definitions as opposed to other content. That said, you can have multiple alternating species and reaction sections, it's just the phase definitions which have to all be in one section.

Regards,

Ray

Aditya Savara

Dec 3, 2019, 4:12:17 PM

to Cantera Users' Group

Hi Ray,

I like Torsten's suggestion of allowing us to add in arbitrary fields that are ignored (whether it's in cti or yaml format). Then we could add uncertainties etc. https://mail.google.com/mail/u/0/?pli=1#inbox/FMfcgxwGBwcjvRZLvqpDJnbbDWbmRlSL

Perhaps you can have something at the top of the YAML specification called "customTags" or something like that which the user can fill. Then, your parser will ignore anything that is from the custom tags list if it does not understand it. That way if somebody misspells a tag the parser can give a warning or error, while correctly ignoring custom tags (like if we make up an uncertainty tag or something like that). I suspect that ignoring custom tags is easier to do in YAML.
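
A minimal sketch of this idea, assuming a hypothetical top-level `customTags` list and an invented set of known keys (neither is taken from the actual Cantera YAML specification): keys are sorted into known, custom (ignored), and unrecognized, so that a misspelling can trigger a warning while declared custom tags pass through.

```python
# Keys the (hypothetical) parser understands.
KNOWN_KEYS = {"equation", "rate-constant", "sticking-coefficient", "beta", "id"}


def classify_keys(entry, custom_tags):
    """Split an entry's keys into known, custom (ignored), and unrecognized."""
    known, custom, unrecognized = [], [], []
    for key in entry:
        if key in KNOWN_KEYS:
            known.append(key)
        elif key in custom_tags:
            custom.append(key)
        else:
            unrecognized.append(key)
    return known, custom, unrecognized


# Stand-in for a parsed input file; all values are placeholders.
document = {
    "customTags": ["uncertainty"],
    "reactions": [
        {"equation": "A + B <=> C",
         "rate-constant": {"A": 1.0, "b": 0.0, "Ea": 0.0},
         "uncertainty": {"lower": 0.5, "upper": 2.0}},
        {"equation": "C <=> D",
         "rate-constnat": {"A": 1.0, "b": 0.0, "Ea": 0.0}},  # misspelled on purpose
    ],
}

custom_tags = set(document["customTags"])
report = [classify_keys(r, custom_tags) for r in document["reactions"]]

# Entries with unrecognized keys would produce a warning or an error.
problems = [unrecognized for _, _, unrecognized in report if unrecognized]
```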

Aditya
