Constraining cube loading with regards to time


Ivan P.

Jul 10, 2014, 9:10:38 AM
to scitoo...@googlegroups.com
Hello,

Here's what I'm trying to do:

I have data from Euro4 hindcast output for the period 1979-2014, which totals about 400 GB. The data has just one phenomenon, "surface_net_downward_shortwave_flux", and three dimensions (time, lat and lon). Here's what it looks like (for a subset):


surface_net_downward_shortwave_flux / (W m-2) (time: 24; grid_latitude: 1000; grid_longitude: 1100)
     Dimension coordinates:
          time                                     x                  -                     -
          grid_latitude                            -                  x                     -
          grid_longitude                           -                  -                     x
     Auxiliary coordinates:
          forecast_period                          x                  -                     -
          forecast_reference_time                  x                  -                     -
     Attributes:
          STASH: m01s01i201
          source: Data from Met Office Unified Model 8.02

All files have the format YYYYMMDD18_EURO4_s.pp

What I want is the ability to load an arbitrary selection of years and months, for example Jan-Mar for 2000-2005. I want to load only that particular subset of the full span of data when creating my cube; I want to avoid creating a really big cube first and then extracting what I need from it, as that would really slow my program down.

What I currently have is:


import iris
import datetime

iris.FUTURE.cell_datetime_objects = True

path = '/data/local/myname/euro4_retrieved/*.pp'
time_constraint = lambda c: c >= datetime.datetime(2000, 1, 2, 0, 0, 0) and c <= datetime.datetime(2005, 12, 31, 23, 2, 0)
cube = iris.load(path, iris.Constraint('surface_net_downward_shortwave_flux', iris.Constraint(time=time_constraint)))  # this fails


However, the above fails, seemingly because "time" does not exist, since I've not actually created a cube yet. Has anyone else managed to achieve what I'm trying to do?

Thanks!
 

bjlittle

Jul 10, 2014, 9:56:19 AM
to scitoo...@googlegroups.com
Hi Ivan,

I'm assuming you got a traceback along the lines of ...

TypeError: cube_func must be None or callable, got Constraint(coord_values={'time': <function <lambda> at 0x3cf3320>})

Try expressing your constraint as follows ...

cube = iris.load(path, 'surface_net_downward_shortwave_flux' & iris.Constraint(time=time_constraint))
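
For what it's worth, the bare string there is just shorthand for a name constraint, and the '&' combines the two conditions into one. Spelled out fully, it should be equivalent to something like this (a sketch, untested against your data):

import iris

# The quoted name is shorthand for a name constraint; '&' ANDs it
# together with the time constraint.
name_con = iris.Constraint('surface_net_downward_shortwave_flux')
cube = iris.load(path, name_con & iris.Constraint(time=time_constraint))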

Does this work?

Regards
-Bill

Ivan P.

Jul 10, 2014, 10:15:32 AM
to scitoo...@googlegroups.com
Hi Bill,

Yes, that works! I'm still a bit of an Iris newbie, trying not to trip up on the syntax, but obviously not doing as well as I'd hoped :).

Cheers!

Ivan

bjlittle

Jul 10, 2014, 10:38:37 AM
to scitoo...@googlegroups.com
Excellent!

Good to hear you're off and running again!

In case you're unaware, be sure to check out the Iris User Guide at http://scitools.org.uk/iris/docs/latest/userguide/index.html; there are also some handy resources at https://github.com/SciTools/courses on GitHub.

Cheers,
-Bill

Ivan P.

Jul 10, 2014, 12:39:47 PM
to scitoo...@googlegroups.com
Hi again Bill,

The above now works but I've realized it doesn't actually do what I want it to do.

What I'm trying to achieve is a way to load a subset of months for a given subset of years from a range of data, for example loading just January-March for all years between 2000 and 2005. I've tried various ways of specifying my constraint, but I've not managed to get my program to do what I want. I'm currently attempting this via the cube's "time" coordinate, using the datetime module and Iris's own PartialDateTime, but with zero success so far.

Since I've not had success with that approach, I'm considering using the os module or something along those lines to filter out what I need based on the filenames instead (all of the format YYYYMMDD18_EURO4_s.pp), but this doesn't seem as elegant, as it's not immediately transferable to data with differently formatted filenames.
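
Roughly, I imagine it would look something like this (an untested sketch; select_files is just a name I've made up):

import glob
import os

def select_files(directory, years, months):
    # Filenames look like YYYYMMDD18_EURO4_s.pp, so the first eight
    # characters of the basename encode the date.
    selected = []
    for fname in sorted(glob.glob(os.path.join(directory, '*_EURO4_s.pp'))):
        stamp = os.path.basename(fname)[:8]
        year, month = int(stamp[:4]), int(stamp[4:6])
        if year in years and month in months:
            selected.append(fname)
    return selected

# e.g. January-March for 2000-2005
files = select_files('/data/local/myname/euro4_retrieved', range(2000, 2006), (1, 2, 3))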

Do you have any ideas about this, or perhaps know if Iris supports this functionality natively? I'm aware of the resources you've posted; I've been going through them extensively over the past few days!

Cheers!

Ivan

Andrew Dawson

Jul 11, 2014, 3:23:15 AM
to scitoo...@googlegroups.com
Hi Ivan
 
There are possibly a number of ways to get what you want. I've used something like the following, which should work for you, though there may be other or better ways to achieve the same:
 
import iris
from datetime import datetime
from iris.time import PartialDateTime


def time_constraint(cell):
    with iris.FUTURE.context(cell_datetime_objects=True):
        c1 = datetime(2000, 1, 1) <= cell.point <= datetime(2005, 3, 31)
        c2 = PartialDateTime(month=1) <= cell.point <= PartialDateTime(month=3)
    return c1 and c2


cube = iris.load(path, 'surface_net_downward_shortwave_flux' & iris.Constraint(time=time_constraint))
 
The time constraint is written as a normal function rather than a lambda, mostly for clarity. The first condition (c1) ensures that the resulting times fall within the interval 2000-01-01 to 2005-03-31, but it is True for every time in that interval (i.e. all months of the year). The second condition (c2) makes sure that you only get times where the month is in the range 1-3. Combined, they give what you want.
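
If you want to sanity-check the result after loading, one quick way (just a suggestion) is to convert the time points back to dates and look at the months:

times = cube.coord('time')
dates = times.units.num2date(times.points)
# Should print [1, 2, 3] if the constraint did its job.
print(sorted(set(d.month for d in dates)))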
 

bjlittle

Jul 11, 2014, 6:16:08 AM
to scitoo...@googlegroups.com
Nice ...

You could also choose not to use PartialDateTime and do the following:


from datetime import datetime
import iris


def constraint(cell):
    with iris.FUTURE.context(cell_datetime_objects=True):
        result = (datetime(2000, 1, 1) <= cell.point <= datetime(2005, 3, 31)) and cell.point.month in [1, 2, 3]
    return result


cubes = iris.load(path, 'surface_net_downward_shortwave_flux' & iris.Constraint(time=constraint))

You might also find that just loading your dataset and then performing an extract is quicker, given that you have a large dataset. (Note that Iris will not actually load all your data into memory: there is a deferred-loading mechanism, so the data payload is only read when you need the data.)

So, something along the lines of ...

cubes = iris.load(path)

result = cubes.extract('surface_net_downward_shortwave_flux' & iris.Constraint(time=constraint))

... might prove more optimal, particularly if your dataset is PP or FieldsFiles.

HTH

Ivan P.

Jul 11, 2014, 4:27:20 PM
to scitoo...@googlegroups.com
Thanks guys, I realize now that this is actually not that hard to do (must've just been my brain giving up at the end of the workday :) )! 

Bill, I've learned about Iris's lazy loading, so I'll definitely follow your advice. However, correct me if I'm wrong, but Iris would still have to go through the whole range of files I'm loading in order to read the metadata, which could still take a while when loading many files (I have files of daily data for 35 years)? Would you know of a way to get around this?


bjlittle

Jul 14, 2014, 7:56:57 AM
to scitoo...@googlegroups.com
Hi Ivan,

Okay, so you've got 35 years of data totalling about 400 GB ... that's one big dataset!

Iris doesn't support deferred data saving, i.e. even if you load the full dataset under the guise of deferred loading, when attempting to save the dataset Iris will have to realise the full dataset in memory before saving ... and that's just not going to happen (not yet). We will get around to deferred saving at some point, but I don't know where that fits in the development pipeline.

So my personal approach would be to do the following. It may not be ideal, but it may suffice ... others from the community or other core developers may take a different angle of attack!

I'd take the hit on the chin and load the full dataset (that's the slow part), then pickle the resultant cubes, which we can reap the benefit from later with a speedy load ...

So something along the lines of the following will load your full dataset (not the data, just the metadata) and then pickle the result ...

import cPickle
import os

import iris


# Go make a cuppa coffee or five, read the paper (cover to cover),
# and perhaps surf the entire internet ...
cubes = iris.load(path)

# Set the target pickle filename ...
pickle_file = os.path.join(os.path.dirname(path), 'pickle.pkl')

# Save the cubes as a pickle to file ...
with open(pickle_file, 'wb') as fh:
    cPickle.dump(cubes, fh, cPickle.HIGHEST_PROTOCOL)

Given that you've pickled your dataset, load it, then perform a constrained extract to get the data subset that you're interested in for your analysis ...

import cPickle
from datetime import datetime

import iris


def constraint(cell):
    with iris.FUTURE.context(cell_datetime_objects=True):
        result = (datetime(2000, 1, 1) <= cell.point <= datetime(2005, 3, 31)) and cell.point.month in [1, 2, 3]
    return result


# Load the pickle in a heart-beat ...
with open('pickle.pkl', 'rb') as fh:
    cubes = cPickle.load(fh)

# Now perform the extract ...
cube = cubes.extract('surface_net_downward_shortwave_flux' & iris.Constraint(time=constraint))

If your dataset is static and doesn't change (and I'm assuming here that's the case, since it's so massive), then it's safe to pickle. The benefit is that loading the pickle is fast, in the knowledge that you've paid a one-off time penalty for the initial load, reading of metadata and merging into cubes.

Now you can extract to constrain your dataset as required and perform your analysis.

The downside of this approach is that you're somewhat tied to a specific release of Iris. Your pickle file may not be valid for future versions of Iris as the code base changes, possibly invalidating the pickled cube dataset.
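
One simple way to guard against that (just a suggestion; the filename is arbitrary) is to stash the Iris version next to the pickle, so a stale pickle can be spotted before you trust it:

import iris

# Record the Iris version alongside the pickle; compare it against
# iris.__version__ on load to detect a stale pickle.
with open('iris_version.txt', 'w') as fh:
    fh.write(iris.__version__)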

HTH

-Bill 

LeonH

Feb 16, 2015, 5:46:43 AM
to scitoo...@googlegroups.com
Is there an error in the user guide? I've had two people come to me now asking about the example in section 2.2.1, where data is extracted by time for st_swithuns_daterange: http://scitools.org.uk/iris/docs/latest/userguide/loading_iris_cubes.html

In that example the inequality in the constraint is written against "cell", which seems to raise a TypeError. If it is replaced with "cell.point" (which is also what the examples in this topic use), then it works.
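
For reference, this is the shape of the two variants (the dates here are illustrative, not the guide's exact values):

import datetime
import iris

# Raises a TypeError when the constraint is applied:
bad = iris.Constraint(time=lambda cell: datetime.datetime(2011, 7, 15) <= cell)

# Works once the comparison is made against cell.point:
good = iris.Constraint(time=lambda cell: datetime.datetime(2011, 7, 15) <= cell.point)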

Andrew Dawson

Feb 16, 2015, 8:59:29 AM
to scitoo...@googlegroups.com
Which version of Iris are you using? This works on both versions 1.7 and 1.8/master. The docs for version 1.6 also contain this example, suggesting it should work for that version but I haven't tested it.