Selection of cubes by netcdf variable name.


pbentley

Nov 5, 2012, 9:11:17 AM
to scitoo...@googlegroups.com
Hi,

When loading data from netCDF files it would be extremely useful to be able to select cubes by specifying their original variable name (which typically is quite short), and not just by their CF standard name (which is often very long and therefore cumbersome to enter).

By way of context, a number of recent climate model intercomparison projects (e.g. CMIP5) have adopted the practice of assigning a community-agreed short name to each of the geophysical parameters which will be produced. These short names (a.k.a. MIP names) are then used to name the corresponding variables in netCDF files. For example, the variable used to store surface air temperature is usually called 'tas'. This makes it much more convenient to code something like:

var = cubes['ccb']

(assuming the variable is so-named), as opposed to the standard name equivalent:

var = cubes['air_pressure_at_convective_cloud_base']

A possible solution (at least for netCDF-based datasets) would be to add 'short_name' as a new cube attribute. This attribute would take the value of the like-named metadata attribute if it was present in the input netCDF file; otherwise it would take the name of the variable itself.
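
As a rough sketch of that fallback rule only: the function name and the plain dict of attributes below are hypothetical stand-ins for illustration, not part of Iris or of any netCDF library.

```python
def resolve_short_name(var_name, attributes):
    """Return the 'short_name' attribute if present, else the variable name."""
    # `attributes` is a plain dict standing in for the netCDF variable's
    # metadata attributes; this is an assumption of the sketch.
    return attributes.get("short_name", var_name)


# No 'short_name' attribute in the file: fall back to the variable name.
print(resolve_short_name("tas", {"standard_name": "air_temperature"}))

# A 'short_name' attribute is present: it takes precedence.
print(resolve_short_name("var1", {"short_name": "ccb"}))
```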

Our experience - at least on the aforementioned climate modelling projects - is that most scientists favour the short variable names over CF standard names in day-to-day conversation, because the standard names tend to be too unwieldy.

Regards,
Phil

marqh

Nov 6, 2012, 6:04:02 AM
to scitoo...@googlegroups.com
Hello Phil

I am interested in the observation that variable names are used by scientists as identifiers, instead of controlled vocabulary identifiers. I can see how convenient shorthand gets adopted within communities, and used extensively in communication and code. However, I question how generic this is across communities.

My concern comes in part from Iris processes such as merge, which inspect metadata and evaluate metadata equivalence. The current NetCDF loader does not preserve variable names; it treats them as format reference labels with no semantics attached. If this were changed, then two CF data variables which currently merge may well not merge, because their variable names may not be consistent.

I recognise that many communities use consistent output variable names, e.g. enforced by model source code, where the variable names are controlled within their working scope, but I am not sure this scales to the general case of CF NetCDF datasets.

It seems to me that the functionality you are suggesting is to enable convenient access and is based on the assumption that variable names provide unique identifiers for datasets. The uniqueness criterion is enforced within a single NetCDF file but is not stable across multiple files.

I am tempted to consider this as specific functionality, useful to specific communities but tricky to generalise.

My approach to delivering this in my own code would be:
  • to add a 'call back' function to my loader to include the NetCDF variable name as metadata on each loaded cube;
  • then retrieve this element from each cube in the cube list returned from the load process, which includes the merge process;
  • then check for uniqueness of the labels in my cube list;
  • and finally convert my list to an ordered dictionary, using the unique labels as keys

I think this is a small amount of code which is better kept separate from the generic functionality which loads and merges data from CF NetCDF files based on CF metadata.
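
The last three steps above can be sketched roughly as follows. This is only an illustration: it assumes each loaded cube already carries its NetCDF variable name in a `var_name` attribute (the callback step is left out), and `index_by_var_name` is a hypothetical helper name, not an Iris function. Stand-in objects are used so the sketch runs without Iris.

```python
from collections import Counter, OrderedDict
from types import SimpleNamespace


def index_by_var_name(cubes):
    """Map each cube's variable name to the cube, preserving list order."""
    # Retrieve the label from every cube in the list.
    names = [cube.var_name for cube in cubes]

    # Check that the labels are unique before using them as keys.
    duplicates = [name for name, count in Counter(names).items() if count > 1]
    if duplicates:
        raise ValueError("variable names are not unique: %r" % duplicates)

    # Convert the list to an ordered dictionary keyed by the labels.
    return OrderedDict(zip(names, cubes))


# Stand-in "cubes": any object with a var_name attribute will do here.
fake_cubes = [SimpleNamespace(var_name="tas"), SimpleNamespace(var_name="pr")]
by_name = index_by_var_name(fake_cubes)
print(list(by_name))  # ['tas', 'pr']
```

Raising an error on duplicate labels is one possible choice; a caller might equally prefer to group cubes that share a name.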

RHattersley

Nov 6, 2012, 6:07:28 AM
to scitoo...@googlegroups.com
Hi Phil,

Thanks for taking the time to post.

Your comments tally with what we've seen elsewhere - i.e. it's not part of the CF conventions, but in "real life" the variable name is a very convenient way to select data. As such, the corresponding GitHub issue (https://github.com/SciTools/iris/issues/69) is currently scheduled for the next release. Please feel welcome to join in that discussion (or even submit a pull request).

Thanks,
Richard

pbentley

Nov 6, 2012, 6:27:40 AM
to scitoo...@googlegroups.com
Hi Mark,

"...used by scientists as identifiers, instead of controlled vocabulary identifiers". Au contraire! The chosen short names are absolutely controlled vocabularies. Those names, like 'tas', are then used consistently across (in the case of CMIP5) thousands or even millions of individual data files to name variables. They have as much currency and practical use as standard names. Indeed, it would make good sense, IMHO, for the short names to be incorporated into the CF standard name table, though that's a separate issue.

As Richard mentions in his response, this is a common real-world use-case that we ought to expedite. Personally I don't believe it is reasonable to expect Iris users to concoct custom code - even if it is fairly minimal - to ingest and store an intrinsic (and fundamental) property of an input dataset.

Just my $0.02 worth :-)

Cheers,
Phil

rsignell

Jul 25, 2013, 9:19:25 AM
to scitoo...@googlegroups.com
Iris Gang,

I still can't figure out the recommended way to read a cube by netcdf variable name, despite having read the recommended GitHub issue (https://github.com/SciTools/iris/issues/69), the pull request that resolved it (https://github.com/SciTools/iris/pull/317), and the Iris documentation on constrained cube loading (http://scitools.org.uk/iris/docs/latest/userguide/loading_iris_cubes.html#constrained-loading).

Perhaps I just need another cup of coffee, but if someone could just give an example it would be greatly appreciated!

I doubt this is the best way to load the netCDF variable called "temp", right?

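import iris
from numpy import where

url = 'http://geoport.whoi.edu/thredds/dodsC/examples/bora_feb.nc'
cubes = iris.load(url)
# where() returns a tuple of index arrays; take the first matching index
temp = cubes[where([cube.var_name == 'temp' for cube in cubes])[0][0]]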

Thanks,
Rich

Phil Elson

Jul 25, 2013, 9:30:09 AM
to scitoo...@googlegroups.com
Hey Rich,

Good to meet up at SciPy'13 this year - I'm only just getting back to a degree of normality after taking such a long break but the conference was well worth it I think.

Anyway, it looks like you need to know that you can pass an arbitrary cube-level function to an iris.Constraint. That function operates on a single cube and must return either True or False. So, with your dataset:

>>> var_name_temp = iris.Constraint(cube_func=lambda cube: cube.var_name == 'temp')
>>> temp, = cubes.extract(var_name_temp)

Hopefully that does the trick. As for contents of the user guide - pull requests welcomed ;-)

Cheers,



Andrew Dawson

Jul 25, 2013, 9:30:51 AM
to scitoo...@googlegroups.com
Looks like you are going for a list comprehension style statement, so I'll use one too:

temp = [cube for cube in cubes if cube.var_name == 'temp']

Alternatively, if you want to use the extract method of the CubeList (or even constrain at load time) you would define and use the constraint:

temp_constraint = iris.Constraint(cube_func=lambda c: c.var_name == 'temp')
temp = cubes.extract(temp_constraint)
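
For the load-time variant, a minimal sketch might look like the following. The helper name `var_name_is` is made up for illustration, and the Iris call is left as a comment because it needs Iris and access to the data; stand-in objects show what the predicate does.

```python
from types import SimpleNamespace


def var_name_is(name):
    """Build a cube-level predicate that matches on the netCDF variable name."""
    def matches(cube):
        return cube.var_name == name
    return matches


# Stand-in objects so the predicate can be tried without Iris or a data file.
fake_cubes = [SimpleNamespace(var_name="temp"), SimpleNamespace(var_name="salt")]
is_temp = var_name_is("temp")
print([c.var_name for c in fake_cubes if is_temp(c)])  # ['temp']

# With Iris installed, the same predicate could be handed to a constraint
# and applied while loading (url as in Rich's example):
#   import iris
#   temp = iris.load_cube(url, iris.Constraint(cube_func=var_name_is("temp")))
```

Note that iris.load_cube expects exactly one cube to match, whereas iris.load with the same constraint returns a CubeList.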

Hope that helps.

Andrew Dawson

Jul 25, 2013, 9:32:55 AM
to scitoo...@googlegroups.com
You beat me to it Phil, good to see we have almost identical code though!