subtracting cubes with repeating dimensions e.g. month of year


Niall Robinson

Mar 25, 2013, 7:39:22 AM
to scitoo...@googlegroups.com
Hello iris,

I'm currently working out the best way to apply a climatological average to some data. If I, say, aggregate a dataset by month of year to create a climatological average, how can I then subtract this from the relevant months of a data set? I've tried adding a coord categorisation to the data set so that it is also classified by month. I also removed any extraneous coordinates from the climatology, i.e. time and forecast_period, to try and avoid unnecessary conflicts with the data set cube. However, when I try to subtract these two cubes I get this error:
>>> clim
precipitation_flux / kg m-2 s-1     (*ANONYMOUS*: 12; latitude: 145; longitude: 192)
     Dimension coordinates:
          latitude                           -             x               -
          longitude                          -             -               x
     Auxiliary coordinates:
          month                              x             -               -
     Scalar coordinates:
          forecast_reference_time: 1959-09-01 00:00:00
     Attributes:
          STASH: m01s05i216
          history: Mean of precipitation_flux aggregated over month
          source: Data from Met Office Unified Model 7.07
     Cell methods:
          mean: time (1 hour)
          mean: month

>>> data
precipitation_flux / kg m-2 s-1     (time: 510; latitude: 145; longitude: 192)
     Dimension coordinates:
          time                               x             -               -
          latitude                           -             x               -
          longitude                          -             -               x
     Auxiliary coordinates:
          forecast_period                    x             -               -
          month                              x             -               -
     Scalar coordinates:
          forecast_reference_time: 1959-09-01 00:00:00
     Attributes:
          STASH: m01s05i216
          source: Data from Met Office Unified Model 7.07
     Cell methods:
          mean: time (1 hour)

>>> data -= clim

Traceback (most recent call last):
  File "/opt/eclipse/3.7/dropins/pydev/plugins/org.python.pydev.debug_2.5.0.2012040618/pysrc/pydevd_comm.py", line 755, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "/opt/eclipse/3.7/dropins/pydev/plugins/org.python.pydev.debug_2.5.0.2012040618/pysrc/pydevd_vars.py", line 382, in evaluateExpression
    Exec(expression, updated_globals, frame.f_locals)
  File "/opt/eclipse/3.7/dropins/pydev/plugins/org.python.pydev.debug_2.5.0.2012040618/pysrc/pydevd_exec.py", line 2, in Exec
    exec exp in global_vars, local_vars
  File "<string>", line 1, in <module>
  File "/project/avd/iris/live/testing/iris/cube.py", line 2090, in __sub__
    return iris.analysis.maths.subtract(self, other, ignore=True)
  File "/project/avd/iris/live/testing/iris/analysis/maths.py", line 189, in subtract
    cube, other, dim=dim, ignore=ignore, update_history=update_history, in_place=in_place)
  File "/project/avd/iris/live/testing/iris/analysis/maths.py", line 289, in _add_subtract_common
    'which cannot be ignored.' % ', '.join({coord_grp.name() for coord_grp in bad_coord_grps}))
ValueError: This operation cannot be performed as there are differing coordinates (forecast_period, month, time) remaining which cannot be ignored.
What is the best way to subtract data like this?
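
For reference, the two cubes above were produced along these lines (just a minimal sketch: the filename is a placeholder and the exact categorisation call is an assumption, not my actual script):

import iris
import iris.analysis
import iris.coord_categorisation

# 'precip.pp' is a placeholder for the real source file.
data = iris.load_cube('precip.pp')

# Categorise the time coordinate so both cubes carry a 'month' coord.
iris.coord_categorisation.add_month(data, 'time', name='month')

# Climatological monthly mean.
clim = data.aggregated_by('month', iris.analysis.MEAN)

# Strip the now-redundant coords from the climatology, as described above.
clim.remove_coord('time')
clim.remove_coord('forecast_period')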

Thanks
Niall

pp-mo

Mar 25, 2013, 1:58:28 PM
to scitoo...@googlegroups.com
Interesting.  I can't get Iris to subtract a mean (without the time coordinate) from a cube which has a time coordinate.

You can use numpy to perform the required calculation on the data arrays, of course.
I did this ...
import iris.coord_categorisation

data_by_months = data.copy()
iris.coord_categorisation.add_month_number(data_by_months, 'time')
# add_month_number creates a 'month_number' coord; -1 because month numbers are 1..12
i_months = data_by_months.coord('month_number').points - 1
monthly_anoms = data.copy()
monthly_anoms.data -= clim.data[i_months]

This uses the numpy array-of-indexes strategy.  
It works, but I'm then not quite sure what the history ought to look like.

When subtracting the arrays, the indexing step simply expands the 'clim' array along the extra (time) dimension of 'data', so the shapes match.
But in Iris the equivalent cubes are not compatible, because one has an extra dimension associated with the 'time' coordinate.
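
A bare numpy illustration of that expansion (toy shapes; the real arrays are (510, 145, 192) and (12, 145, 192)):

import numpy as np

clim_arr = np.arange(12 * 2 * 3).reshape(12, 2, 3)   # (month, lat, lon) with toy sizes
i_months = np.array([8, 9, 10, 11, 0, 1])            # month_number - 1 for each time point

expanded = clim_arr[i_months]    # one climatology field per time point
print(expanded.shape)            # (6, 2, 3)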

However, you can remove the 'time' coord and simply put it back again afterwards...

monthly_anoms = data.copy()
data_time_coord = monthly_anoms.coord('time')
monthly_anoms.remove_coord(data_time_coord)
monthly_anoms -= clim[i_months]
monthly_anoms.add_dim_coord(data_time_coord, 0)

Amazingly, this works, but I have to say it seems a slightly crazy approach.
I'm not yet quite sure of the logic to this, or whether it could be fixed up.
I hope to send more when I've looked into it.

pp-mo

Mar 25, 2013, 2:01:01 PM
to scitoo...@googlegroups.com
This uses the numpy array-of-indexes strategy.  
Correction: I should have said 'list-of-indexes'.

Niall Robinson

Mar 26, 2013, 5:24:52 AM
to scitoo...@googlegroups.com
This is similar to the approach I ended up coding yesterday (below), although I'm sure using a list of indices will be faster. The other problem/feature I ran into was this: if you want to generalise this approach and take a slice of the cube that is defined at runtime by an index (i.e. when iterating over a dimension by index number to change one "slice" at a time), then you can't use a Python slice() object on a cube; you have to use it on the data attribute. That may be a feature, but it seems like it wouldn't do any harm to let it operate on a cube too. This goes back to something I mentioned before on this forum about how iterating over a cube returns a copy of each slice rather than the slice itself, which makes doing stuff like the below messier than it could be. But I guess that is slightly off topic currently.
res = "day_of_year"

dimNo
= cube.coord_dims(self.cube.coord(res))[0]
dimLen
= cube.shape[dimNo]            

dataIndex
= [slice(None) for _ in range(len(cube.shape))]
climIndex
= [slice(None) for _ in range(len(cube.shape))]

for i in range(dimLen):
            dataIndex
[dimNo] = slice(i, i+1)
            idx
= np.where(climmc.points == datamc.points[i])[0][0]
            climIndex
[dimNo] = slice(idx, idx+1)  
            cube
.data[dataIndex] -= climCube.data[climIndex]        
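
For comparison, a vectorised list-of-indices version of the same thing would presumably look something like this (just a sketch: it assumes the climatology's 'day_of_year' points are sorted and cover every value in the data, and it reuses dimNo from above):

import numpy as np

# One climatology index per data time point, computed in a single vectorised step.
data_points = cube.coord("day_of_year").points
clim_points = climCube.coord("day_of_year").points
idx = np.searchsorted(clim_points, data_points)

# Pull out the matching climatology field for every time point and subtract in one go.
cube.data -= np.take(climCube.data, idx, axis=dimNo)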


Niall Robinson

Mar 26, 2013, 5:26:02 AM
to scitoo...@googlegroups.com
P.S. I was too busy thinking about what you said and forgot to say: thanks for having a look at this!

pp-mo

Mar 27, 2013, 8:54:59 AM
to scitoo...@googlegroups.com
I've had another thought about this now (and a chat with @esc24), and I'm changing my mind on what this is actually about.
It is true that cubes are not 'writeable', i.e.
.. they don't have a __setitem__ method
.. so you can't write 'cube[i,j] = newdata'

But then, you might ask, what would you set part of a cube to?

Logically, slicing or indexing a cube yields another cube, which is 'part of' the parent, containing a section of its data and its coordinates (plus all other metadata).
As part of a cube is a cube, the only sensible thing you could write into this would be another cube.
However, it makes no general sense to overwrite 'parts' of coordinates, as there are tight validity constraints on the values.
So basically, it would make no sense to assign a cube into part of another cube, unless it had the same (part-)coordinates, which basically means it could not change the coordinate values.
(Similarly, writing one cube into 'part' of another would only make sense if the units are the same.)

So it seems that the only thing about a cube that can sensibly be 'partly changed' is the data.
So .. you can simply assign to the data instead.
There is really nothing wrong with this.  The data is a mutable part of the cube which you have full access to.
The only catch is that you must not change its shape. Unfortunately we have not managed to come up with a clean way of preventing that, so this remains an unenforced "convention".
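
For instance (a sketch, with 'cube' standing for any 3-D cube with a leading time dimension, like the ones above):

# Writes into the cube's own data array, so the cube itself is changed.
cube.data[0, :, :] = 0.0

# By contrast, this changes only an independent copy, so 'cube' is untouched:
cube[0].data[:, :] = 0.0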

I think this also shows why when you do index or slice cubes, the result has to be a new, independent cube with its own, separate coordinates :  It can't be a 'view' on the old one, because then any changes to its metadata could make nonsense of the parent.
I'm still wondering, though, whether we could preserve a view on the original cube's data, allowing you to manipulate data in an iteration over slices.
At present, as I wrote in an earlier, offline conversation:
When subindexing, extraction or slicing cubes, all data is actually copied, so that the resulting subcubes are always entirely separate from the original.
This is much simpler, and quite intentional.
It is true that in principle it would be possible to yield a 'sub view' of the data, at least for simple slicing.  That is what NumPy does (at least for simple indexing), so you might expect the same of cubes.
But we haven't implemented that because it was felt that copying is more straightforward, and having different cubes sharing data was potentially just too dangerous.
It does mean that some things that work on arrays will not work with cubes.  For example, code like "subarea = cube[i:j, 2]; subarea[zero_indices] = 0".
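
The working equivalent goes through the data array instead (here i, j and zero_indices are reused from the example above as placeholders):

subarea = cube.data[i:j, 2]     # basic slicing of a numpy array returns a view ...
subarea[zero_indices] = 0       # ... so this writes straight into cube.data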

Niall Robinson

Mar 27, 2013, 10:59:45 AM
to scitoo...@googlegroups.com
Ah - I thought we'd had that conversation before but I couldn't find it on the boards! That actually sounds pretty logical. It is a feature I've found myself having to work around a few times lately, though. Playing devil's advocate for a second...

I guess the problem is that the metadata doesn't come with the data as it stands, right? I understand it's a problem that, if someone takes a "view" and then changes the metadata, it causes strife. But currently if I take a view of the data attribute, and I want to know about the associated metadata (e.g. for comparison to another cube's data, as in the above example), then I have to mess around with indexing all the coords. If you can trust people not to alter the data shape, then perhaps you can trust people not to break the metadata? I understand that perhaps the answer to that last bit might be "no"...

However, with regard to the question about repeating coords: would it be desirable for a cube with a time dimension (which has a coord categorisation of months) to be compatible with a similar cube that has no time dimension but a similar coord categorisation (albeit of different length), i.e. as in the example in the original post? I hoped I would be able to just do
data - clim
as I wrote above. I appreciate that this kind of behaviour might become complicated. For instance,
data + clim
is probably pretty easily defined, but what about
clim - data
? This kind of thing is something we would be doing very often, so it would be a shame to have to hack around with a messy loop/list of indices if there is potentially a more graceful way of implementing it.

Thanks as always AVD

Niall

Paul Halloran

Jan 19, 2014, 8:45:05 AM
to scitoo...@googlegroups.com
Hi,

This thread suggests (at least to me) that you can assign data to part of a cube's data array, but I've never managed to do this, and I've spent a ridiculous amount of time trying.

If I (for example) try cube1[0].data = cube2[0].data, where both cubes have the same coordinates, this will not work. Is there a trick I am missing?

Thanks!
Paul

Paul Halloran

Jan 19, 2014, 8:45:36 AM
to scitoo...@googlegroups.com
Sorry - same dimensions rather than coordinates.

Andrew Dawson

Jan 19, 2014, 3:19:21 PM
to scitoo...@googlegroups.com
In short, you currently cannot do what you are trying to do, at least not the way you are doing it. The reason is that when you do cube1[0].data, this slices the cube, which currently results in a copy rather than a reference to the original cube. If you instead try:

cube1.data[0] = cube2.data[0]

or

cube1.data[0] = cube2[0].data

then it would work, since in these cases you are not slicing the cube on the left-hand side but the data array it contains. The data array is a numpy array, and slicing a numpy array returns a view of the original rather than a copy. Let me know if that doesn't make sense!
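
To see the difference concretely (a sketch, assuming cube1 and cube2 have the same shape):

import numpy as np

cube1.data[0] = cube2.data[0]                        # writes into cube1's own array
print(np.array_equal(cube1.data[0], cube2.data[0]))  # True

cube1[0].data = cube2.data[0] + 1                    # only modifies a temporary copy
print(np.array_equal(cube1.data[0], cube2.data[0]))  # still True: cube1 is unchanged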

For what it's worth, this behaviour in Iris is the subject of a current discussion on a GitHub pull request (https://github.com/SciTools/iris/pull/939), which implements consistent slicing behaviour for Iris cubes (consistent with numpy). Feel free to join the discussion.

Hope that helps.