Luke Abraham
Jan 27, 2022, 8:34:29 AM
to SciTools (iris, cartopy, cf_units, etc.) - https://github.com/scitools
Hello,
Is there any way to reduce the read time of pp files? The eventual aim of this work is to post-process files during a UM simulation as part of the postproc task on the batch system, so we want to be able to read files in as quickly as possible.
Taking one of our typical daily pp files as an example: it contains 97 separate fields, each with 24 time values (one per hour), on a number of different vertical levels, both pressure-based and model levels.
I've been comparing the speed of reading this pp file with Iris against cf-python, using two different tests: reading in the whole file, and selecting a single field by its STASH code (a sketch of the timing harness is below). I'm using Iris 3.1.0 and cf-python 3.12.0, both installed with conda. I think both Iris and cf load data lazily, and I believe cf does its UM file reading in C rather than Python.
When reading in the whole file:
- `iris.load` completed in 446.3398 s
- `cf.read` completed in 10.0120 s
When selecting a single field from the file:
- `iris.load` using `iris.AttributeConstraint(STASH='...')` completed in 46.7967 s
- `cf.read` using `select='stash_code=...'` completed in 0.1097 s
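For reference, the timings above came from a simple harness along these lines (a minimal sketch; the file path and STASH codes are placeholders, not the real ones):

```python
import time
import cf
import iris

FILENAME = "daily.pp"  # placeholder path to one of our daily pp files
STASH = "m01s16i222"   # placeholder STASH code in Iris's mXXsYYiZZZ form

def timed(label, func):
    """Run func(), print the wall-clock time taken, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result

# Test 1: read the whole file. Both libraries load data lazily, so this
# should mostly be measuring header/metadata parsing.
cubes = timed("iris.load", lambda: iris.load(FILENAME))
fields = timed("cf.read", lambda: cf.read(FILENAME))

# Test 2: select a single field by STASH code.
constraint = iris.AttributeConstraint(STASH=STASH)
cube = timed("iris.load + STASH constraint",
             lambda: iris.load(FILENAME, constraint))
field = timed("cf.read + select",
              lambda: cf.read(FILENAME, select="stash_code=16222"))
```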
Obviously there is a very big difference here. Are there any tricks I can play to make Iris any faster? If not, is there a function to convert a cf data structure into an Iris cube?
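If no direct converter exists, the only fallback I can think of is round-tripping through netCDF, which adds a write step; a minimal sketch (again with placeholder path, STASH code, and temporary filename):

```python
import cf
import iris

# Hypothetical fallback: select the field quickly with cf, write it out
# as netCDF, then read that file back in with Iris.
fields = cf.read("daily.pp", select="stash_code=16222")
cf.write(fields, "tmp_field.nc")
cubes = iris.load("tmp_field.nc")  # a CubeList; one cube per field written
```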
Many thanks and best wishes,
Luke