I'd like to start a discussion on the performance of Iris, specifically the speed of loading.
This is driven by a number of optimisation requests from Iris users; at its current speed, Iris is not a viable replacement for the tools they currently use.
NetCDF files (NEMO data):
The script reads in data from 4 model runs and does an area-weighted sum of one of the variables. To demonstrate the issue I've reduced it to 5 files per model run; normally there could be in excess of 1000 files.
In the code you will see that I've tried doing this in two ways: first, reading the variable I need into a masked array using the netCDF4 module; second, using iris.load_strict (it makes little difference whether I use iris.load or load_strict). I have also added timings to the code:
- Using the netCDF4 module takes ~0.27 seconds to loop over the 4 model runs (5 files each) and produce the timeseries of area-weighted sums.
- Using Iris takes over 17 seconds to do the same thing (about 60 times as long).
Using the netCDF4 module, only the data has been loaded, whereas with Iris, interpretation (metadata translation) has also occurred, producing a format-agnostic cube, which has overwhelming benefits. However, a discussion should take place here about the common uses of Iris and how its speed may or may not stop users from adopting it in place of what they are currently using.