collapsing a cube returns no data

Andy Hartley

unread,

Nov 12, 2015, 6:49:46 AM11/12/15

to Iris

Hi all,

I'm having a bit of trouble collapsing a cube over 2 dimensions (latitude and longitude). I just want to retain the time coordinate from the following cube:

print cube
Gridbox net primary productivity / (kg m-2 s-1) (time: 4; latitude: 87; longitude: 180)
     Dimension coordinates:
          time                                       x            -              -
          latitude                                   -            x              -
          longitude                                  -            -              x

The cube is part of larger cubelist that is saved to disk, and the data is a masked array.

I would like to do an area weighted global mean for each timestep, so that I can plot a timeseries for each cube in the cubelist on the same plot. Here's the code I'm using ...

cube = cubelist[0]
wghts = iris.analysis.cartography.area_weights(cube, normalize=False)
coords = ('longitude', 'latitude')
global_mean = cube.collapsed(coords, iris.analysis.MEAN, weights=wghts, returned=False)

print global_mean
Gridbox net primary productivity / (kg m-2 s-1) (time: 4)
     Dimension coordinates:
          time                                       x
     Scalar coordinates:
          latitude: -3.0 degrees, bound=(-90.0, 84.0) degrees
          longitude: 180.0 degrees, bound=(0.0, 360.0) degrees
     Cell methods:
          mean: longitude, latitude

print global_mean.data
[-- -- -- --]

print global_mean.data.mask
[ True  True  True  True]

So, the collapsing seems to work, but it returns a cube full of masked data. Incidentally, numpy does seem to get it correct (but this is not the answer I want, because I would like to do the area weighting):

np.nanmean(cube.data, axis=(1,2))

masked_array(data = [2.1867845660158735e-09 2.7183398537014987e-09 3.133103298674133e-09
 3.0635058163135963e-09],
             mask = [False False False False],
       fill_value = 1e+20)

Does anybody have any ideas on what I'm doing wrong?

Cheers,
Andy

Andrew Dawson

unread,

Nov 12, 2015, 8:02:34 AM11/12/15

to Iris

I suspect that the cube you are collapsing contains NaN values, rather than (or perhaps in addition to) being a masked array. How was the cube constructed originally? The MEAN aggregator using numpy.ma.average under-the-hood, which will produce this behaviour when NaN values are present in the input. Fore example

import numpy as np
import numpy.ma as ma

a1 = np.array([[1., 2., np.nan, 4., 5.]])
a2 = ma.array([[1., 2., 3., 4., 5.]], mask=[[0, 0, 1, 0, 0]])
print ma.average(a1, axis=1)
print ma.average(a2, axis=1)

which will print:

[--]
[3.0]

This seems consistent with your problem.

Since Iris pretty much assumes you are using masked arrays rather than NaN values to indicate missing values, my suggestion is that you first verify whether or not the data you are collapsing are contained in a masked array. Iris should always use masked arrays for data loaded from file (e.g. netCDF, GRIB) but it is possible to set the data of a cube manually to use a plain numpy including NaN values, which may occur in some user written analysis scripts. If you actually have a plain numpy array with NaNs you can convert it to a masked array with

import numpy.ma as ma
cube.data = ma.masked_invalid(cube.data)

There is a third option, which is a little more unpleasant to imagine, that is you have a masked array where some of the unmasked elements are actually NaN values... I'm not sure how this could arise but numpy does allow it. In this case you'd need to use masked_invalid also.

I think in general you should avoid NaN values when setting values contained in cubes, although there isn't a direct problem with this, a lot of the supplementary functionality assumes that missing values are indicated by masks and will break if NaN values are used instead.

Let us know how you get on.

Andy Hartley

unread,

Nov 12, 2015, 11:13:21 AM11/12/15

to Iris

Thanks for your help Andrew,

On reading your reply, I thought I had properly accounted for nan values, but it turns out that I didn't. I was using the jules package (based on iris) to load land points only netcdf data. Here's what I had (which produces the error):

cubes = jules.load(ifiles, constraint=constr, missingdata=-9999)
cube  = cubes.concatenate()[0]
cube.data = np.ma.masked_equal(cube.data, -9999)

But, if I remove the missingdata=-9999 from the load statement, and use your ma.masked_invalid suggestion instead, it works! Here's the solution ...

cubes = jules.load(ifiles, constraint=constr)
cube  = cubes.concatenate()[0]
cube.data = np.ma.masked_invalid(cube.data)



global_mean = cube.collapsed(coords, iris.analysis.MEAN, weights=wghts, returned=False)

print global_mean.data

[2.9504317198304564e-09 3.654033811403742e-09 4.2969998459284205e-09
 4.145463176483709e-09]

Thanks for the help. It's a bit of an NaN minefield out there!
Andy

Andrew Dawson

unread,

Nov 12, 2015, 11:23:42 AM11/12/15

to Iris

I don't know anything about the jules package, but perhaps you might want to give whoever maintains that a nudge to look into this, it might be a simple fix there so you don't have to apply any fixes after load.

Reply all

Reply to author

Forward