Time series data across multiple NetCDF files with different time units


marqh

Nov 7, 2012, 3:49:48 AM
to scitoo...@googlegroups.com
A collection of NetCDF files may contain time series data, where each file sets the time coordinate's 'units' attribute to a different value.

A single logical dataset exists, but the Iris merge process will not combine the data, because it identifies each time coordinate as being defined differently. Careful metadata manipulation by users is required to merge the loaded Cubes into a single Cube.

Is the definition of a datetime hampering the interpretation of this data?
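One possible workaround, sketched below, is to rewrite every time coordinate onto a single common epoch at load time. This is a minimal sketch only: it assumes an Iris version where coordinates expose a convert_units method, the filename pattern and target epoch are hypothetical, and the conversion requires the files to share a calendar.

import iris

def unify_time(cube, field, filename):
    # Convert each file's time coordinate to one agreed epoch so that
    # the coordinates of the loaded cubes compare equal at merge time.
    # 'days since 1970-01-01' is a hypothetical choice; any common
    # epoch with a matching calendar would do.
    cube.coord('time').convert_units('days since 1970-01-01')

cubes = iris.load('series_*.nc', callback=unify_time)

# Newer Iris versions also provide iris.util.unify_time_units(cubes),
# which performs the same conversion across a whole CubeList in place.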

pbentley

Nov 13, 2012, 10:51:06 AM
to scitoo...@googlegroups.com
Hi Mark,

Slightly different from, though related to, this issue, I think, is the case where you have two or more netCDF input files which contain a contiguous time series for a single variable. The files differ only in the time coordinate(s) and the actual data for the variable in question. In particular, the reference datum for the time coordinate (i.e. "time units since ref_time") is identical for each input file.

When I load a collection of such files into Iris I end up with a separate cube for each file, rather than the single cube I might have expected. The following example is for three consecutive months' worth of temperature data. All the files have time:units = "days since 1859-12-01".

>>> cubes = iris.load('tas_*_2006.nc')
>>> print cubes
0: air_temperature                     (time: 1; latitude: 145; longitude: 192)
1: air_temperature                     (time: 1; latitude: 145; longitude: 192)
2: air_temperature                     (time: 1; latitude: 145; longitude: 192)

I suspect there is some not-yet-learned trick that I can use (e.g. a callback) to achieve the desired outcome :-)

Regards,
Phil

bjlittle

Dec 13, 2012, 6:04:26 AM
to scitoo...@googlegroups.com
Hi Phil,

To achieve the desired result, you need to do the following:

import iris
import iris.cube

def callback(cube, field, filename):
    # Remove the 'history' attribute, which differs between the files
    # and would otherwise prevent the cubes from merging.
    cube.attributes.pop('history', None)

cubes = iris.load('tas_*_2006.nc', callback=callback)

cube_list = iris.cube.CubeList()
for cube in cubes:
    # Slice away the length-one time dimension, demoting the time
    # coordinate to a scalar coordinate.
    cube_list.append(cube[0])

# Merge will now rebuild the time dimension from the scalar coordinates.
cube = cube_list.merge()[0]

There are a few issues here:
  1. Each Cube contains a different history value within the Cube attributes dictionary. Cube merge is strict and unforgiving in this respect, so this difference causes it to consider each of the three Cubes as separate and unmergeable. Using the callback mechanism to remove the history attribute from each Cube resolves this.
  2. Each of the three Cubes passed to Cube merge is a 3d (t, y, x) Cube. The single point (and bounds) value associated with each Cube's time dimension coordinate is different. Because the time coordinate is a vector coordinate (albeit single-valued) and not a scalar coordinate, Cube merge considers the three Cubes separate and unmergeable.
  3. Currently, Cube merge only creates new dimensions from common scalar coordinates; it cannot (at the moment) merge along a common existing dimension. Slicing each Cube over its single-valued time dimension therefore collapses that dimension and demotes the time dimension coordinate to an auxiliary scalar coordinate. Once the time coordinate is scalar, Cube merge can build a new time dimension from the scalar time coordinates of the individual Cubes, as the sketch after this list demonstrates.
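To make point 3 concrete, here is a minimal check of the demotion, re-using the cubes loaded above (the dim_coords keyword of Cube.coords is assumed to be available in your Iris version):

cube3d = cubes[0]        # shape (1, 145, 192)
sliced = cube3d[0]       # shape (145, 192)

# The time coordinate survives the slice, but only as a length-one
# auxiliary (scalar) coordinate, no longer a dimension coordinate.
assert sliced.coord('time').shape == (1,)
assert not sliced.coords('time', dim_coords=True)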

The resulting Cube is now as expected:

>>> print cube
air_temperature                     (time: 3; latitude: 145; longitude: 192)
     Dimension coordinates:
          time                           x            -               -
          latitude                       -            x               -
          longitude                      -            -               x
     Scalar coordinates:
          height: 1.5 m
     Attributes:
          NCO: 4.1.0
          original_name: mo: m01s03i236
     Cell methods:
          mean: time

Clearly this example exposes an area where Cube merge needs to be extended, i.e. merging similar Cubes over a common, already existing dimension.

My expectation is that users from the netCDF community would benefit most once this capability is available.
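As a sketch of what that might look like in use: newer Iris releases provide a CubeList.concatenate method that joins cubes along a common, already existing dimension. Assuming a version new enough to have it, the slice-to-scalar step above becomes unnecessary:

cubes = iris.load('tas_*_2006.nc', callback=callback)

# concatenate joins along the existing time dimension, so each cube
# keeps its length-one time dimension coordinate.
cube = cubes.concatenate()[0]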

Hope this helps.

Regards,

-Bill


pbentley

Dec 19, 2012, 8:37:39 AM
to scitoo...@googlegroups.com
Hi Bill,

Thanks for chasing down this workaround. As per our offline dialogue, here are some follow-up thoughts on this particular issue (and within the specific context of netcdf input sources):

Automatically inheriting global attributes is, I think, likely to give rise to cube merge issues in the majority of cases, since there's usually at least one attribute (often the history attribute) that differs across a set of input files. I presume that global attributes are attached to cubes because cubes, as free-standing objects, don't have a higher-level container object to which global atts might be attached. Is that right?

In which case, what scope is there, I wonder, for maintaining global attributes in some other namespace/container, rather than assigning them to individual cubes? If that's not possible then, on a per-cube basis, one could separate out global attributes into a dedicated instance attribute, thus:

cube.attributes   # local attributes, as at present
cube.globals      # or cube.global_attributes

With the attributes so distinguished, merge operations could then determine genuine cube differences based upon their local attributes, without pollution or interference from global attributes which might have no relevance to a particular merge (or merge-type) operation.

Alternatively, or as an adjunct to the above ideas, one might consider introducing some kind of 'lax' keyword to the merge operation interface. The default would still be to follow the current 'strict' rules; requesting the 'lax' option would be an instruction to be more flexible (TBD) about merge constraints.
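As a stopgap with the existing API, a lax-ish merge can be approximated by stripping any attributes whose values differ across the cubes before merging. A minimal sketch, assuming an Iris version that ships iris.util.equalise_attributes (earlier releases carried it as iris.experimental.equalise_cubes.equalise_attributes):

import iris
import iris.cube
import iris.util

cubes = iris.load('tas_*_2006.nc')

# Remove every attribute whose value is not identical across all of
# the cubes (history, NCO, and so on). This handles the attribute half
# of the problem; the slice-to-scalar step from Bill's post is still
# needed before the merge.
iris.util.equalise_attributes(cubes)

cube_list = iris.cube.CubeList(cube[0] for cube in cubes)
cube = cube_list.merge()[0]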

Phil

pbentley

Dec 19, 2012, 11:07:05 AM
to scitoo...@googlegroups.com
Sheesh, how many typos can I get in one post? Must do better! P.

RHattersley

Jan 2, 2013, 5:18:02 AM
to scitoo...@googlegroups.com
Back when we were designing the current iteration of Iris, our reading of the CF and netCDF documentation led us to conclude that the CF conventions intend for global attributes to be treated as equivalent to local attributes. In other words, global attributes can be used to improve readability and reduce duplication, but they denote exactly the same meaning. I note that David Hassell seems to have made the same interpretation in his cf-python module.

This point is also covered in the ongoing CF data model discussion at https://cf-pcmdi.llnl.gov/trac/ticket/68. Perhaps real-world usage is suggesting that the two are not exactly equivalent? Would you consider posting on that ticket to suggest a distinction needs to be included in the model?

Richard

pbentley

Jan 7, 2013, 8:20:49 AM
to scitoo...@googlegroups.com
If the received wisdom is for cubes to inherit global metadata by default then I wouldn't wish to argue against that (for a change :-).

But in that case, what would be useful, I believe, is to support an additional keyword in the load_cube/s functions to disable this behaviour if so desired. For example:

cubes = iris.load_cubes(filenames, callback, inherit_global_metadata=False)

With such a mechanism the user then has some degree of control over the metadata content of a cube. It would certainly be of utility in the current scenario wherein one has to delete (some) global attributes in order for merge operations to succeed.
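Until something like that keyword exists, a rough equivalent with the current API is a load callback that drops a caller-chosen set of attribute names (the names below are hypothetical examples):

import iris

DROP_ATTRS = {'history', 'NCO'}  # hypothetical per-dataset choice

def drop_attrs(cube, field, filename):
    # Discard attributes we never want inherited onto the cubes.
    for name in DROP_ATTRS:
        cube.attributes.pop(name, None)

cubes = iris.load(filenames, callback=drop_attrs)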

Just a thought.

Phil

Martin Dix

Jan 9, 2013, 12:58:24 AM
to scitoo...@googlegroups.com
From a user perspective I'd certainly be confused if merging didn't work because of something as essentially irrelevant as a time stamp in a history attribute.

However, it's not just a problem with the global metadata. CMIP5 netCDF files have a history attribute with a time stamp on the variable itself, as well as a global history attribute.
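If I understand the inheritance behaviour described earlier in the thread correctly, both kinds end up in the same cube.attributes dictionary, so a single removal in the load callback should cover either source (an assumption worth checking against your own files):

def callback(cube, field, filename):
    # Whether 'history' came from the file's global attributes or from
    # the variable itself, it lands in cube.attributes, so one pop
    # removes it.
    cube.attributes.pop('history', None)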

Martin Dix

Jan 9, 2013, 1:19:21 AM
to scitoo...@googlegroups.com
Bill's scheme for merging works for me when there's only a single time in each file, but fails when the files have multiple times (e.g. merging files containing a month of daily values). A cut-down example uses two files, each with two times.

# Re-using the history-removing callback from Bill's earlier post.
cubes = iris.load(['tas_12.nc', 'tas_34.nc'], callback=callback)

cube_list = iris.cube.CubeList()
for cube in cubes:
    # Slice every time step to a scalar coordinate, not just the first.
    for k in range(cube.shape[0]):
        cube_list.append(cube[k])

cube = cube_list.merge()[0]


The resulting cube has only the first time value from each file, although cube_list has the expected values:

>>> for c in cube_list:
...   print c.coords('time')
...
[DimCoord(array([ 730119.5]), bounds=array([[ 730119.,  730120.]]), standard_name=u'time', units=Unit('days since 0001-01-01', calendar='proleptic_gregorian'), long_name=u'time')]
[DimCoord(array([ 730120.5]), bounds=array([[ 730120.,  730121.]]), ...
[DimCoord(array([ 730121.5]), bounds=array([[ 730121.,  730122.]]), ...
[DimCoord(array([ 730122.5]), bounds=array([[ 730122.,  730123.]]), ...

>>> print cube.coords('time')
[DimCoord(array([ 730119.5,  730121.5]), bounds=array([[ 730119.,  730120.],
       [ 730121.,  730122.]]), standard_name=u'time', units=Unit('days since 0001-01-01', calendar='proleptic_gregorian'), long_name=u'time')]
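For what it's worth, the multiple-times-per-file case is exactly the existing-dimension merge Bill mentioned. A sketch assuming an Iris release new enough to provide CubeList.concatenate:

cubes = iris.load(['tas_12.nc', 'tas_34.nc'], callback=callback)

# concatenate joins the cubes along their existing time dimension
# instead of rebuilding it from scalar slices, so every time step
# is preserved.
cube = cubes.concatenate()[0]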
