Writing a cube to netcdf


Rutger Dankers

Nov 9, 2012, 11:27:39 AM
to scitoo...@googlegroups.com
Hello

Writing an Iris cube to netCDF seems to work fine for 3-D cubes. However, if you calculate (for example) a time average and want to write the resulting 2-D field to file, you will find that the resulting netCDF is probably not what you expect, and it may cause errors in other applications. For example:

>>> mycube.shape
(365, 360, 720)
>>> mycube_avg = mycube.collapsed('time', iris.analysis.MEAN)
>>> mycube_avg.shape
(360, 720)
>>> iris.io.save(mycube, "mycube.nc", netcdf_format="NETCDF3_CLASSIC")
>>> iris.io.save(mycube_avg, "mycube_avg.nc", netcdf_format="NETCDF3_CLASSIC")


The first file is absolutely fine. The second file will give errors, for example when used with CDO. This is because, in the absence of a time dimension, latitude has become the unlimited dimension:

$ ncdump -h mycube.nc
netcdf mycube {
dimensions:
        time = UNLIMITED ; // (365 currently)
        latitude = 360 ;
        longitude = 720 ;

$ ncdump -h mycube_avg.nc
netcdf mycube_avg {
dimensions:
        latitude = UNLIMITED ; // (360 currently)
        longitude = 720 ;
        bnds = 2 ;


Note that mycube_avg still has time information, but as a scalar coordinate (instead of a dimension coordinate).

Carwyn Pelley pointed out that the outer dimension of a cube written to netCDF is given an unlimited size (as expected). This allows the concatenation of files.

But how can you reshape mycube_avg so that it has time as the outer dimension (with length 1)? It turns out you need to construct a new cube with reshaped data, as the data within a cube is immutable. Fortunately you can (apparently) simply copy the dimension coordinates and attributes of the original cube:

time_coord = mycube_avg.coord('time')
lat_coord = mycube_avg.coord('latitude')
lon_coord = mycube_avg.coord('longitude')

newcube = iris.cube.Cube(ma.reshape(mycube_avg.data, [1, 360, 720]), long_name='my long name')
newcube.add_dim_coord(time_coord, 0)
newcube.add_dim_coord(lat_coord, 1)
newcube.add_dim_coord(lon_coord, 2)
newcube.attributes = mycube_avg.attributes
iris.io.save(newcube, "newcube.nc", netcdf_format="NETCDF3_CLASSIC")

This will result in a netcdf file as expected, with an unlimited dimension "time" of length 1.

Nevertheless, one would expect this to be a bit more intuitive...

Cheers
Rutger

Carwyn Pelley

Nov 13, 2012, 11:40:44 AM
to scitoo...@googlegroups.com
It might be preferable to have control over the outer dimension.
Though it is perfectly CF-compliant to have time as a scalar coordinate (and thus latitude as the outer dimension), would this not be a suitable place for a keyword for choosing whether your collapsed coordinate should remain a dimension coordinate (of length 1)?
It's far more likely that concatenation will occur between netCDF files of different times than of different latitudes.

RHattersley

Nov 13, 2012, 12:11:42 PM
to scitoo...@googlegroups.com
> data within a cube is immutable.

The exception is cube.transpose() which allows one to shuffle data dimensions.

Similarly, we could conceivably have cube.add_dimension() which would allow one to add a new dimension of length one, and optionally to promote one or more scalar coordinates to that dimension. In your example, that would allow you to do:

mycube_avg.add_dimension(0, 'time')

The opposite direction could also be supported.
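At the data level, the proposed `add_dimension` is just a reshape. A minimal numpy-only sketch of what it would do to the payload (the function name is hypothetical, taken from the suggestion above; a real implementation would also have to promote the scalar coordinate to a dimension coordinate):

```python
import numpy as np

def add_leading_dimension(data):
    """Return a view of `data` with a new leading axis of length one.

    This mimics the data-handling part of a hypothetical
    cube.add_dimension(0, 'time'): the values are untouched, only
    the shape gains a new outer dimension.
    """
    return data[np.newaxis, ...]

avg = np.zeros((360, 720))           # a collapsed (latitude, longitude) field
restored = add_leading_dimension(avg)
print(restored.shape)                # (1, 360, 720)
```

Because `np.newaxis` indexing returns a view, no data is copied; only the shape metadata changes.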

marqh

Nov 14, 2012, 10:25:38 AM
to scitoo...@googlegroups.com
I think this would be a useful feature

Andrew Dawson

Nov 14, 2012, 11:00:40 AM
to scitoo...@googlegroups.com
I also think this would be an extremely good feature. I would get a lot of mileage out of it.

rsignell

Jun 13, 2013, 5:57:01 PM
to scitoo...@googlegroups.com
I'm trying to write a 4D cube shape=(1, 40, 24, 28) to netcdf, but having problems:

http://nbviewer.ipython.org/5777643

Am I having the same problem discussed here?
I didn't think so.  What am I missing?

What I would like to do is create a new NetCDF file with unlimited time, and then keep adding additional time steps to it.   I need to loop through a lot of remote GRIB files served via OPeNDAP (and subsetting each one for the Gulf of Maine region).

Thanks,
Rich

bjlittle

Jun 25, 2013, 7:24:19 PM
to scitoo...@googlegroups.com
An extension to the experimental cube concatenate functionality requires the ability to promote a scalar coordinate to be a dimension coordinate.

Ed Campbell has some neat cleverness in a development branch that does just this.

There seems to be a need for this kind of functionality ... I'll give him a poke; perhaps he can fold this into a PR.

esc24

Jun 26, 2013, 5:09:27 AM
to scitoo...@googlegroups.com
I have written a function that attempts to infer which dimensions have been dropped and returns a cube with the length one dimensions restored, reshaping the data and the relevant coordinates. It needs some tests before I could submit a PR. I'm also not sure where it might go or whether we'd be better off giving an option not to drop the dimensions in the first place. It's called add_missing_dims (not a good name - I'd welcome suggestions). Signature and docstring below:

def add_missing_dims(collapsed_cube, cube):
    """
    Return a cube that has length one dimensions for those that have been
    collapsed, aggregated or sliced out.

    Args:

    * collapsed_cube:
        An instance of :class:`iris.cube.Cube`.

    * cube:
        The :class:`iris.cube.Cube` from which the `collapsed_cube`
        originated.

    Returns:
        An instance of :class:`iris.cube.Cube` with the same dimensionality
        as `cube` but with the data and coordinates from `collapsed_cube`
        suitably reshaped to fit.

    """

pbentley

Jul 30, 2013, 10:51:08 AM
to scitoo...@googlegroups.com
Hi,

I hadn't realised that Iris sets the outermost dimension as unlimited by default when it writes cubes to a netCDF file. I've just run up against this while saving a 2D field - dimensioned (lat, lon) - to a netCDF file. The lat dimension is set to be unlimited, which is not what I want.

I'm not convinced this is desirable default behaviour. Personally, I think it would be preferable if there was a keyword argument (or somesuch) which allowed the user/caller to request this behaviour explicitly.

However, that's not an argument I'm going to win, so instead I'll ask this question:

How can I save my cube(s) to a netcdf file and have all the dimensions set to be fixed size rather than unlimited?

Phil

rsignell

Jul 30, 2013, 11:12:46 AM
to scitoo...@googlegroups.com
Phil,

One way would be to add a singleton time dimension, right?

-Rich

pbentley

Jul 31, 2013, 5:24:17 AM
to scitoo...@googlegroups.com
Hi Rich,

Maybe so. But I don't think that will resolve the underlying issue, which is that when you introduce an unlimited dimension (and at the same time neglect to set correct chunking parameters for key variables - as is the case with Iris at present, I believe) the netCDF file size can end up significantly larger than it needs to be. My own limited and unscientific tests appear to show that the size difference can be O(2-3x).

There was an internal (to UKMO) newsgroup discussion thread about this recently. It's not possible to link to that particular thread from this forum, but a similar discussion can be viewed here (I expect google would uncover other examples). In both cases, though, the recommendation was to convert unlimited dimensions to fixed size.

Bottom line, IMHO, is that unless you have a specific need to have an unlimited dimension, then it's preferable - as a default starting position - to use fixed-size dimensions. It's easy enough to make a dimension unlimited when you need to (e.g. using ncks).

In the case of familiar (and uber-abundant) 2D datasets based on (lat,long) you'd typically want the dimensions to be fixed-size and the coordinate variables to use contiguous rather than chunked storage. I think!

Phil

rsignell

Jul 31, 2013, 6:51:14 AM
to scitoo...@googlegroups.com
 Phil,

Excellent points. For folks interested in chunking issues, I thought Russ Rew's blog posts were great:

http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters
http://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_choosing_shapes

and if you are really into it, I found this study fascinating, comparing compression used in NetCDF4, GRIB and HDF5:

http://www.ral.ucar.edu/~steves/compStudy.pdf

A few interesting things I learned there:
- the standard deflate method in NetCDF4 is LZ77, which actually doesn't even know about numbers.  It just treats floating point numbers as individual bytes, like arbitrary text.  
- for numbers, SZIP is superior to LZ77 in every way (size, speed of compression, speed of decompression), but it's not available for writing in NetCDF4 due to license issues.
- if you don't need the full precision of float, limit the number of significant digits, which really helps deflation.
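Rich's last point is easy to demonstrate with plain numpy and zlib. This is a rough sketch of decimal significant-digit rounding (not the bit-grooming approach the NCO tools use): with only a few significant digits, far fewer distinct byte patterns remain for the deflater to match.

```python
import numpy as np
import zlib

def round_significant(a, digits):
    """Round each value to `digits` significant decimal digits."""
    a = np.asarray(a, dtype=np.float64)
    # Decimal exponent of each value (zeros handled separately to avoid log10(0)).
    exponent = np.floor(np.log10(np.abs(np.where(a == 0, 1.0, a))))
    scale = 10.0 ** (digits - 1 - exponent)
    return np.round(a * scale) / scale

rng = np.random.default_rng(0)
data = rng.random(100_000)              # full-precision float64 values
rounded = round_significant(data, 3)    # keep 3 significant digits

raw = len(zlib.compress(data.tobytes()))
small = len(zlib.compress(rounded.tobytes()))
print(raw, small)   # the rounded array deflates noticeably smaller
```

The rounded array contains only a few thousand distinct 8-byte patterns, so the LZ77 stage finds abundant repeats; the raw random data is essentially incompressible.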

-Rich

pbentley

Jul 31, 2013, 9:40:39 AM
to scitoo...@googlegroups.com
Rich,

Apropos SZIP, DKRZ have developed a plug-compatible open source alternative called libaec. It's on my ever-expanding list of things to investigate and test out ;-)

For those who are interested, the project page is at https://www.dkrz.de/redmine/projects/aec/wiki

Phil

Phil Elson

Aug 2, 2013, 4:05:40 AM
to scitoo...@googlegroups.com
FWIW, I'm +1 for having a more sensible approach to specifying unlimited dimension(s?) - I wonder how often an unlimited dimension is really required? A keyword doesn't sound like a horrendous idea to me.

Luke Abraham

Oct 9, 2013, 11:38:42 AM
to scitoo...@googlegroups.com
Hi,

I'm wondering if any further progress has been made on this issue since it was originally discussed above? I'm coming across the same problem in iris 1.4. Has the functionality mentioned by bjlittle been implemented in a later version, or is the add_missing_dims function now part of iris proper? I also think that the point by pbentley about the unlimited dimension (also mentioned below) is a bug and needs to be fixed (if it hasn't already). I also have a couple of points/questions/suggestions at the end of my post (sorry for the length!).

The essence of my problem is that if I load multiple pp-format files using iris I get a cube which looks like

mass_fraction_of_ozone_in_air / 1   (time: 12; model_level_number: 60; latitude: 73; longitude: 96)
     Dimension coordinates:
          time                           x       -       -       -
          model_level_number             -       x       -       -
          latitude                       -       -       x       -
          longitude                      -       -       -       x
     Auxiliary coordinates:
          forecast_period                x       -       -       -
          level_height                   -       x       -       -
          sigma                          -       x       -       -
     Scalar coordinates:
          forecast_reference_time: 2160-12-01 00:00:00
     Attributes:
          STASH: m01s34i001
          conversion_factor: 1.657
          source: Data from Met Office Unified Model 7.03
     Cell methods:
          mean: time (1 hour)




When saving this field to netCDF I get a file which looks like:

netcdf ozone {
dimensions:
        time = UNLIMITED ; // (12 currently)
        model_level_number = 60 ;
        latitude = 73 ;
        longitude = 96 ;
        bnds = 2 ;
variables:
        float mass_fraction_of_ozone_in_air(time, model_level_number, latitude, longitude) ;
                mass_fraction_of_ozone_in_air:standard_name = "mass_fraction_of_ozone_in_air" ;
                mass_fraction_of_ozone_in_air:long_name = "O3 MMR in kg(O3)/kg(air)" ;
                mass_fraction_of_ozone_in_air:units = "1" ;
                mass_fraction_of_ozone_in_air:ukmo__um_stash_source = "m01s34i001" ;
                mass_fraction_of_ozone_in_air:conversion_factor = 1.657 ;
                mass_fraction_of_ozone_in_air:source = "Data from Met Office Unified Model 7.03" ;
                mass_fraction_of_ozone_in_air:cell_methods = "time: mean (interval: 1 hour)" ;
                mass_fraction_of_ozone_in_air:grid_mapping = "latitude_longitude" ;
                mass_fraction_of_ozone_in_air:coordinates = "forecast_period forecast_reference_time level_height sigma" ;
        int latitude_longitude ;
                latitude_longitude:grid_mapping_name = "latitude_longitude" ;
                latitude_longitude:longitude_of_prime_meridian = 0. ;
                latitude_longitude:semi_major_axis = 6371229. ;
                latitude_longitude:semi_minor_axis = 6371229. ;
        double time(time) ;
                time:axis = "T" ;
                time:bounds = "time_bnds" ;
                time:units = "hours since 1970-01-01 00:00:00" ;
                time:standard_name = "time" ;
                time:calendar = "360_day" ;
        double time_bnds(time, bnds) ;
        int model_level_number(model_level_number) ;
                model_level_number:axis = "Z" ;
                model_level_number:units = "1" ;
                model_level_number:standard_name = "model_level_number" ;
                model_level_number:positive = "up" ;
        float latitude(latitude) ;
                latitude:axis = "Y" ;
                latitude:units = "degrees_north" ;
                latitude:standard_name = "latitude" ;
        float longitude(longitude) ;
                longitude:axis = "X" ;
                longitude:units = "degrees_east" ;
                longitude:standard_name = "longitude" ;
        int forecast_period(time) ;
                forecast_period:units = "hours" ;
                forecast_period:standard_name = "forecast_period" ;
        double forecast_reference_time ;
                forecast_reference_time:units = "hours since 1970-01-01 00:00:00" ;
                forecast_reference_time:standard_name = "forecast_reference_time" ;
                forecast_reference_time:calendar = "360_day" ;
        float level_height(model_level_number) ;
                level_height:bounds = "level_height_bnds" ;
                level_height:units = "m" ;
                level_height:long_name = "level_height" ;
                level_height:positive = "up" ;
        float level_height_bnds(model_level_number, bnds) ;
        float sigma(model_level_number) ;
                sigma:bounds = "sigma_bnds" ;
                sigma:units = "1" ;
                sigma:long_name = "sigma" ;
        float sigma_bnds(model_level_number, bnds) ;

// global attributes:
                :Conventions = "CF-1.5" ;
}


which is all fine, and as expected.

However, when loading a single file, the time dimension is not included as a 'Dimension' - it now appears as a 'Scalar coordinate':

mass_fraction_of_ozone_in_air / 1   (model_level_number: 60; latitude: 73; longitude: 96)
     Dimension coordinates:
          model_level_number             x       -       -
          latitude                       -       x       -
          longitude                      -       -       x
     Auxiliary coordinates:
          level_height                   x       -       -
          sigma                          x       -       -
     Scalar coordinates:
          forecast_period: 6074640 hours
          forecast_reference_time: 2160-12-01 00:00:00
          time: 2863-12-16 00:00:00, bound=(2863-12-01 00:00:00, 2864-01-01 00:00:00)
     Attributes:
          STASH: m01s34i001
          conversion_factor: 1.657
          source: Data from Met Office Unified Model 7.03
     Cell methods:
          mean: time (1 hour)




Perhaps this is as expected. I have noticed that 2D fields such as surface temperature are not given a model_level_number. However, it does have knock-on consequences for saving this field to netCDF:

netcdf ozone {
dimensions:
        model_level_number = UNLIMITED ; // (60 currently)
        latitude = 73 ;
        longitude = 96 ;
        bnds = 2 ;
variables:
        float mass_fraction_of_ozone_in_air(model_level_number, latitude, longitude) ;
                mass_fraction_of_ozone_in_air:standard_name = "mass_fraction_of_ozone_in_air" ;
                mass_fraction_of_ozone_in_air:long_name = "O3 MMR in kg(O3)/kg(air)" ;
                mass_fraction_of_ozone_in_air:units = "1" ;
                mass_fraction_of_ozone_in_air:ukmo__um_stash_source = "m01s34i001" ;
                mass_fraction_of_ozone_in_air:conversion_factor = 1.657 ;
                mass_fraction_of_ozone_in_air:source = "Data from Met Office Unified Model 7.03" ;
                mass_fraction_of_ozone_in_air:cell_methods = "time: mean (interval: 1 hour)" ;
                mass_fraction_of_ozone_in_air:grid_mapping = "latitude_longitude" ;
                mass_fraction_of_ozone_in_air:coordinates = "forecast_period forecast_reference_time level_height sigma time" ;
        int latitude_longitude ;
                latitude_longitude:grid_mapping_name = "latitude_longitude" ;
                latitude_longitude:longitude_of_prime_meridian = 0. ;
                latitude_longitude:semi_major_axis = 6371229. ;
                latitude_longitude:semi_minor_axis = 6371229. ;
        int model_level_number(model_level_number) ;
                model_level_number:axis = "Z" ;
                model_level_number:units = "1" ;
                model_level_number:standard_name = "model_level_number" ;
                model_level_number:positive = "up" ;
        float latitude(latitude) ;
                latitude:axis = "Y" ;
                latitude:units = "degrees_north" ;
                latitude:standard_name = "latitude" ;
        float longitude(longitude) ;
                longitude:axis = "X" ;
                longitude:units = "degrees_east" ;
                longitude:standard_name = "longitude" ;
        int forecast_period ;
                forecast_period:units = "hours" ;
                forecast_period:standard_name = "forecast_period" ;
        double forecast_reference_time ;
                forecast_reference_time:units = "hours since 1970-01-01 00:00:00" ;
                forecast_reference_time:standard_name = "forecast_reference_time" ;
                forecast_reference_time:calendar = "360_day" ;
        float level_height(model_level_number) ;
                level_height:bounds = "level_height_bnds" ;
                level_height:units = "m" ;
                level_height:long_name = "level_height" ;
                level_height:positive = "up" ;
        float level_height_bnds(model_level_number, bnds) ;
        float sigma(model_level_number) ;
                sigma:bounds = "sigma_bnds" ;
                sigma:units = "1" ;
                sigma:long_name = "sigma" ;
        float sigma_bnds(model_level_number, bnds) ;
        double time ;
                time:bounds = "time_bnds" ;
                time:units = "hours since 1970-01-01 00:00:00" ;
                time:standard_name = "time" ;
                time:calendar = "360_day" ;
        double time_bnds(bnds) ;

// global attributes:
                :Conventions = "CF-1.5" ;
}



When writing out the netCDF file, Iris is assuming that model_level_number is the record dimension - I guess it does that because the netCDF writing routines assume dimension 0 to always be time, but as Iris will read a single file without giving it a time dimension, this then breaks on iris.save. I'm also guessing that this is a bug and not as intended.

This then leads to the following questions:

  1. Is it possible to add in a time dimension on a single file read, either:
    1.  after the cube has been loaded, or
    2. during the iris.load command itself?
  2. If the answer to 1 is "no", is it possible to add the time dimension on the iris.save command?
  3. Extending the above questions beyond just the record dimension, is it possible to force a field to be 4-dimensional, e.g. for a surface field, make the number of model levels 1 (rather than omitting the dimension), or for a zonal-mean field set the number of longitude points to 1, etc.? This would make it easier to write code to analyse and manipulate fields, as they could all be forced to have the dimensions (time, model_level_number, latitude, longitude), rather than this differing between fields. I know that there could be issues with having a degenerate dimension (or dimensions).

The lack of a (correct) record dimension is quite serious, as it means that files cannot be concatenated later. Unlimited dimensions are certainly required for the time (record) dimension, as without one, files cannot be concatenated by e.g. ncrcat or cdo mergetime.

Perhaps it would be possible to have a keyword on the iris.load call, e.g. force_dim=, which could take the following values: None would mean as it is now (the default), time would add a single time dimension, and 4d would add in any other dimensions, e.g. time, height, or longitude.
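Outside Iris, this kind of "4d" normalisation is just a reshape. A bare-numpy sketch (the canonical dimension list and function name are my own invention, for illustration only):

```python
import numpy as np

CANONICAL = ('time', 'model_level_number', 'latitude', 'longitude')

def force_4d(data, dims):
    """Expand `data`, whose axes are named by `dims` (a subset of
    CANONICAL, in the same order), to a full 4-D array with length-1
    axes inserted for any missing dimensions."""
    sizes = iter(data.shape)
    new_shape = tuple(next(sizes) if name in dims else 1 for name in CANONICAL)
    return data.reshape(new_shape)

surface = np.zeros((73, 96))                 # e.g. a surface temperature field
print(force_4d(surface, ('latitude', 'longitude')).shape)   # (1, 1, 73, 96)

zonal = np.zeros((12, 60, 73))               # e.g. a zonal-mean field
print(force_4d(zonal, ('time', 'model_level_number', 'latitude')).shape)   # (12, 60, 73, 1)
```

A real Iris version would also need to attach or promote the corresponding length-1 coordinates so the saved netCDF stays CF-compliant.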

Many thanks,
Luke



Richard Hattersley

Oct 14, 2013, 9:07:06 AM
to scitoo...@googlegroups.com
Hi Luke,

I've been pondering this and related issues, and am currently of the opinion that we need to do several things:
 - Promote the use of format-specific save functions, with the ability to pass format-specific arguments (such as the coordinate(s) to create as "unlimited").
   - For a Cube with a scalar "time" coordinate, passing "time" as an unlimited-dimension could ensure the scalar coordinate is promoted to a normal coordinate with length one.
 - Stop the netCDF saver from assuming the outermost dimension should always be unlimited.
 - Simplify iris.save() so that it only has two arguments: cube(s) and a filename. (Alternatively we could just deprecate it.) More sophisticated requirements can use the format-specific save functions.

Martin Dix

Oct 15, 2013, 6:36:54 PM
to scitoo...@googlegroups.com
If there's a format specific save function, it would be useful to have a way to set the netCDF4 packing and chunking options.

Luke Abraham

Oct 16, 2013, 4:11:40 AM
to scitoo...@googlegroups.com
Hi Richard,

Those all sound like very sensible suggestions. I suspect that reading in PP and writing out netCDF will be a major part of how I use Iris (I am already planning to use it for our CCMI submission in the coming weeks/months), and probably how others use it as well. It would be good to have the netCDF writing as flexible and bug-free as possible.

Do you know when these possible developments might be ready for use/release? I am happy to try testing any new routines.

I think that so long as the netCDF write from iris.save() only makes any time dimension unlimited, it might be OK to keep it, at least for backwards-compatibility purposes. Of course, it would mean more work to maintain, and so moving to format-specific saves might be better.

Many thanks,
Luke