What is the most effective way to manage memory when applying operations to a CubeList?


Patrick Breach

Nov 24, 2014, 11:59:02 AM
to scitoo...@googlegroups.com

I know the question is slightly vague, but I think this issue will come up in many different ways so I want to be as general as possible.
I have a cube list loaded by:

cubes = iris.load('netCDF file for data at year*.nc')

There is a file for each year from 1950-2013, giving a total size of ~100GB. All dimensions are the same except for time, which is an unlimited dimension. I am pretty impressed with the ability to extract data and collapse dimensions for each cube in the list without loading all cubes; however, I am wondering how I can apply other operations while iterating through each cube in the list without loading all of them into memory.
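
Something like the following is roughly what I mean. It is only an illustrative sketch (the constraint and coordinate names are made up, and I repeat the load from above for completeness), but extracting and collapsing one cube at a time only ever seems to need that year's ~1.5GB:

import iris
import iris.analysis

cubes = iris.load('netCDF file for data at year*.nc')

# Each cube's data stays on disk until something forces it to load.
print(cubes[0].has_lazy_data())   # True

# Subsetting and collapsing a single cube at a time only touches that
# cube's data, not the whole ~100GB at once.
lat_band = iris.Constraint(latitude=lambda cell: 40 <= cell <= 50)
time_means = [cube.extract(lat_band).collapsed('time', iris.analysis.MEAN)
              for cube in cubes]
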
For example, if I wanted to interpolate a set of lat/lon points in each cube and put the results into another CubeList, how could I do this without loading all cubes? I've tried:

interpolated_cubes = iris.cube.CubeList(cube.interpolate([('latitude',latpoints),('longitude', lonpoints)], iris.analysis.Linear()) for cube in cubes)

but I think this is loading all cubes into memory (I stopped it before my computer crashed). What would be the proper way to do this to reduce memory usage? I can handle each cube individually (~1.5GB), but not all of them at once.

As a side note, I think it would be really nice to have a section in the User Guide dealing specifically with how to take advantage of biggus in Iris for handling large datasets. I'm sure a lot of people can relate or have similar questions in this area. I also think a lot of people are interested in this functionality, and it would be good to highlight it as it becomes more mature in Iris.

Phil Elson

Jan 9, 2015, 4:44:03 AM
to Patrick Breach, scitoo...@googlegroups.com
Slightly old question, but hopefully I can still be of help,

Right now, there is no "lazy" interpolation in Iris. This means that if you want interpolated cubes, they have to have real data. That isn't such a big problem if your source data is huge but the interpolated data isn't: you just loop through each cube and take a copy before realising its data, so that once the copied cube has loaded its data, the real data doesn't remain resident in the original cube. In short:

>>> cubes[0].has_lazy_data()
True
>>> # Trigger the data to be loaded in all cubes.
>>> [cube.data for cube in cubes]
>>> cubes[0].has_lazy_data()
False

Whereas if you take a copy first:

>>> cubes[0].has_lazy_data()
True
>>> # Trigger the data to be loaded on temporary copies only.
>>> [cube.copy().data for cube in cubes]
>>> cubes[0].has_lazy_data()
True

With this pattern, provided you can fit the resultant cubes in memory, you can process data of arbitrary size.
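
Applied to your interpolation example, that might look something like this (an untested sketch, with latpoints / lonpoints as defined in your first post):

import iris.analysis
import iris.cube

interpolated_cubes = iris.cube.CubeList()
for cube in cubes:
    # Interpolate a copy, so the realised source data lives only on the
    # temporary copy and the cube in `cubes` keeps its lazy data.
    result = cube.copy().interpolate(
        [('latitude', latpoints), ('longitude', lonpoints)],
        iris.analysis.Linear())
    interpolated_cubes.append(result)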

HTH



