Hello Phil
I am interested in the observation that scientists use variable names as identifiers, instead of controlled vocabulary identifiers. I can see how convenient shorthand gets adopted within communities and used extensively in communication and code. However, I question how generic this is across communities.
My concern comes in part from Iris processes such as merge, which inspect metadata and evaluate metadata equivalence. The current NetCDF loader does not preserve variable names; it treats them as format reference labels with no semantics attached. If this were changed, two CF data variables which currently merge may well no longer merge, because their variable names may not be consistent.
I recognise that many communities use consistent output variable names, e.g. enforced by model source code, where the variable names are controlled within their working scope, but I am not sure this scales to the general case of CF NetCDF datasets.
It seems to me that the functionality you are suggesting is to enable convenient access and is based on the assumption that variable names provide unique identifiers for datasets. The uniqueness criterion is enforced within a single NetCDF file but is not stable across multiple files.
I am tempted to consider this as specific functionality, useful to specific communities but tricky to generalise.
My approach to delivering this in my own code would be:
- to add a callback function to my loader to include the NetCDF variable name as metadata on each loaded cube;
- then retrieve this element from each cube in the cube list returned from the load process, which includes the merge process;
- then check for uniqueness of the labels in my cube list;
- and finally convert my list to an ordered dictionary, using the unique labels as keys.
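As a rough sketch of those steps: the attribute key `netcdf_variable_name` and the function names are made up for illustration, and I am assuming that the NetCDF loader passes its CF variable object to the callback as `field` and that this object exposes the variable name as `cf_name`.

```python
from collections import Counter, OrderedDict

# Hypothetical attribute key; any name not already in use on the cubes would do.
VAR_NAME_KEY = "netcdf_variable_name"


def record_variable_name(cube, field, filename):
    """Load callback: copy the NetCDF variable name onto the cube.

    Assumption: `field` is the loader's CF variable object and carries
    the variable name in its `cf_name` attribute.
    """
    cube.attributes[VAR_NAME_KEY] = field.cf_name


def cubes_by_variable_name(cubes):
    """Return an ordered dict of cubes keyed by the recorded variable name."""
    labels = [cube.attributes[VAR_NAME_KEY] for cube in cubes]
    # Names are unique within one file, but not necessarily across files.
    duplicates = sorted(name for name, n in Counter(labels).items() if n > 1)
    if duplicates:
        raise ValueError(
            "variable names are not unique: {}".format(", ".join(duplicates))
        )
    return OrderedDict(zip(labels, cubes))


def load_by_variable_name(filenames):
    """Load and merge with Iris, then key the resulting cubes by variable name."""
    import iris  # imported here so the helpers above carry no Iris dependency

    cubes = iris.load(filenames, callback=record_variable_name)
    return cubes_by_variable_name(cubes)
```

Note that storing the name as a cube attribute makes it part of the metadata that merge compares, so cubes whose variable names differ would be expected to stay separate in this sketch.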
I think this is a small amount of code which is better kept separate from the generic functionality which loads and merges data from CF NetCDF files based on CF metadata.