Comparing cubes with observations


Duncan Watson-Parris

Feb 1, 2017, 12:00:11 PM
to Iris-dev
I was interested to see your recent work on resampling unstructured data and wondered what your roadmap for it is?

One of the main use cases for this is presumably comparing model output with observations, which is something CIS (http://cistools.net) was designed to make easy. I’ve talked to people at the Met Office about it, but you might not be aware of it: it uses Iris cubes for model data and a bespoke type for observational data, and provides a number of flexible ways of doing collocations and comparisons.

In fact, I will shortly be working with the UKCA team to move their evaluation suite over to the updated AutoAssess framework, using CIS to bring in satellite and other observational datasets. Hopefully this will provide a template for other groups to use CIS + AutoAssess (Iris) for evaluating model fields against collocated observational datasets, with the UK-ESM team also interested.
 
With that in mind it would be great to discuss the ways CIS and Iris could complement each other going forward, primarily with respect to model evaluation, but also more generally.

marqh

Feb 3, 2017, 9:32:20 AM
to Iris-dev
Hello Duncan

thank you for getting in touch


> I’m interested to see your recent work on resampling unstructured data and wondered what your roadmap for that was?

We are developing capabilities to make it easier to resample where the source or target cube does not have a regular or heavily structured setup.

The capability to encode cubes from satellite swaths, point observations, trajectories and the like has been in the Iris code base for a long time, but some of the useful cube processing functions are less capable of dealing with them in some cases.

We are incrementally building resampling capabilities to address this limitation.

We have spoken quite recently about CIS. Following this communication, I have been trying to work my way through some of these clues and make links into the code base for the CIS tools.

I am currently struggling to get my head around the approach that is taken for
'a bespoke type for observational data'

Please could you provide some pointers into your code base that show how this bespoke type is implemented?
Please may you also provide some pointers into your code base illustrating how you conduct comparisons between instances of this type and well structured model data sets?

Perhaps you could also share a thought or two on the key features that this type provides that Iris cubes do not?

all the best
mark

Duncan Watson-Parris

Feb 3, 2017, 11:30:42 AM
to Iris-dev
Hi Mark,

Good to hear from you.

> The capability to encode cubes from satellite swaths, point observations, trajectories and the like has been in the Iris code base for a long time, but some of the useful cube processing functions are less capable of dealing with them in some cases.
>
> We are incrementally building resampling capabilities to address this limitation.

That's interesting. I know you can create Cubes with auxiliary coordinates to store e.g. trajectories and swaths, but I've found that many of the routines for working with cubes assume, or work only over, dimension coordinates. I guess this is what you have set out to address?


> I am currently struggling to get my head around the approach that is taken for
> 'a bespoke type for observational data'

No worries, the architecture isn't clearly documented on the website as it's primarily aimed at users. It's described in detail in the paper (http://www.geosci-model-dev.net/9/3093/2016/) in sections 3 and 4, but basically we cater for two broad types of data: Gridded (structured) and Ungridded (unstructured). For gridded data we use a subclass of Iris cubes (GriddedData), and for ungridded we use our own type (UngriddedData). Both conform to a (CommonData) interface. The data model for UngriddedData is essentially a list of independent points in space and time - the Point data described in Appendix H of the CF-conventions (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/aph.html). This simplicity gives us a lot of flexibility when comparing datasets. We deliberately followed the Iris Cube interface as closely as possible in the implementation so each object has a set of Coords (essentially Aux Coords), data and metadata associated with it. In many respects it's probably very similar to a Cube with one Dim Coord and other Aux coords.
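To make the shape of that point-data model concrete, here is a minimal sketch of such a container (the class and attribute names here are invented for illustration, not CIS's actual implementation): every coordinate is just a flat array with one entry per observation, and operations like subsetting reduce to filtering those parallel arrays.

```python
from dataclasses import dataclass

@dataclass
class PointData:
    """Toy point-data container: parallel lists, one entry per
    observation, mirroring the CF 'point data' representation."""
    lons: list    # longitude of each observation (degrees)
    lats: list    # latitude of each observation (degrees)
    times: list   # time of each observation
    values: list  # the observed quantity

    def subset(self, lon_min, lon_max):
        """Keep only observations whose longitude lies in [lon_min, lon_max]."""
        keep = [i for i, lon in enumerate(self.lons)
                if lon_min <= lon <= lon_max]
        return PointData([self.lons[i] for i in keep],
                         [self.lats[i] for i in keep],
                         [self.times[i] for i in keep],
                         [self.values[i] for i in keep])

obs = PointData([0.0, 10.0, 20.0], [50.0, 51.0, 52.0], [0, 1, 2], [1.5, 2.5, 3.5])
print(obs.subset(5.0, 25.0).values)  # -> [2.5, 3.5]
```

The point is just that, with no dimension structure to preserve, every cube-like operation becomes a filter or reduction over the observation axis.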
 
> Please could you provide some pointers into your code base that show how this bespoke type is implemented?
> Please may you also provide some pointers into your code base illustrating how you conduct comparisons between instances of this type and well structured model data sets?
 
See the Collocator classes in https://github.com/cedadev/cis/blob/master/cis/collocation/col_implementations.py. There is one class for resampling each of: ungridded -> ungridded, gridded -> ungridded, gridded -> gridded and ungridded -> gridded data.
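The four directions amount to a small dispatch on whether source and target are gridded; a hypothetical sketch (class names invented here, not the ones in col_implementations.py):

```python
# Hypothetical dispatch: choose a collocator from whether the source
# and target datasets are gridded. Names are illustrative only.
class UngriddedToUngridded: ...
class GriddedToUngridded: ...
class GriddedToGridded: ...
class UngriddedToGridded: ...

COLLOCATORS = {
    # (source_is_gridded, target_is_gridded) -> collocator class
    (False, False): UngriddedToUngridded,
    (True, False): GriddedToUngridded,
    (True, True): GriddedToGridded,
    (False, True): UngriddedToGridded,
}

def pick_collocator(source_is_gridded, target_is_gridded):
    return COLLOCATORS[(source_is_gridded, target_is_gridded)]()

print(type(pick_collocator(False, True)).__name__)  # -> UngriddedToGridded
```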

One of the key features of CIS is the flexible ungridded -> ungridded resampling: it allows users to do this using either nearest neighbour, or to find the average (or perform any other operation) over the points within a given distance (in any dimension) of the sample points. I will also be spending some time soon optimising this further. Note also that gridded -> gridded is essentially an Iris regrid, although we also allow resampling in the vertical. The ungridded -> gridded case is what we refer to as aggregation.
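As a toy illustration of those two modes (not CIS's actual implementation; the function names are made up), nearest-neighbour and within-radius collocation onto a set of sample points might look like this:

```python
import math

def _dist(p, q):
    # Plain Euclidean distance; a real tool would use great-circle
    # distance and could weight time/altitude separations too.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def collocate_nearest(samples, obs_points, obs_values):
    """For each sample point, take the value of the closest observation."""
    return [obs_values[min(range(len(obs_points)),
                           key=lambda i: _dist(obs_points[i], s))]
            for s in samples]

def collocate_radius(samples, obs_points, obs_values, radius,
                     op=lambda vals: sum(vals) / len(vals)):
    """For each sample point, apply `op` (default: mean) over all
    observations within `radius`; None where no observation is close enough."""
    out = []
    for s in samples:
        nearby = [v for p, v in zip(obs_points, obs_values)
                  if _dist(p, s) <= radius]
        out.append(op(nearby) if nearby else None)
    return out

obs_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
obs_values = [10.0, 20.0, 99.0]
samples = [(0.4, 0.0), (4.9, 5.0)]
print(collocate_nearest(samples, obs_points, obs_values))    # -> [10.0, 99.0]
print(collocate_radius(samples, obs_points, obs_values, 2))  # -> [15.0, 99.0]
```

The "any other operation" flexibility described above corresponds to swapping in a different `op`; a production version would also use a spatial index rather than the brute-force scan shown here.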
 
> Perhaps you could also share a thought or two on the key features that this type provides that Iris cubes do not?

To be honest the type itself doesn't really provide any features that a cube doesn't. Its main benefit is that it allows users to perform cube-like operations (plot, extract [what we call subset], resample [collocate], and collapse [aggregate]) in a way that's more natural when working with unstructured data, and allows us to optimise those operations. We felt (5 years ago!) that the different data models should have different classes; I still think that's useful, though I could be convinced otherwise. Originally CIS was designed as a command line tool, so some of the name clashes above are the result of trying to come up with sensible names for operations. Now that we have a proper Python interface, it's on my list to bring them closer to the Iris names.

On a related note I've recently been thinking about extending (probably subclassing) our implementation to explicitly deal with the other data types described in the CF-conventions. Each one presents opportunities for optimising some of the operations we perform.
 
I hope this is helpful. Once you've had a chance to look at our implementations perhaps we can have a telecon to discuss it further.

Cheers,

Duncan

Duncan Watson-Parris

Feb 3, 2017, 1:47:27 PM
to Iris-dev
Sorry, just to add (to an already lengthy post!) that CIS also provides a mechanism for easily reading a wide range of non-CF compliant files, including ASCII and HDF(4/5) files.