Management of CF NetCDF global attributes within Iris ...


bjlittle

Jun 27, 2013, 7:58:18 PM
to scitoo...@googlegroups.com
Hi,

Okay, so I'd like to start a discussion regarding what Iris should do with respect to handling NetCDF global attributes; naturally, this conversation is all within the context of the CF Metadata Conventions, see http://cf-pcmdi.llnl.gov/.

"Simple rule of engagement ..."


I'm really keen not to discuss implementation detail here. Let's try to keep it to what, not how.

"Other than caffeine and cake, what do I want? ..."

The goal (hopefully) is to reach a group consensus at the data model (hand-wavy-conceptual) level. This might involve some interaction with the CF community ... I don't know, let's just see how it unfolds ...

"So, what's my beef, dude? ..."

The basic problem is that Iris doesn't do a very good job at dealing with NetCDF global attributes.

For example, at the moment when Iris creates a cube from NetCDF, the attributes from the NetCDF data variable associated with the cube and the NetCDF global attributes of the file are all stored together within the attribute dictionary of a cube. Due to this lack of separation, the notion of global and local (i.e. data variable) attribute identity is lost. In addition to this, global attribute metadata can be thrown away, as a local (i.e. data variable) attribute takes precedence over a similarly named global attribute.

As a result, when a cube loaded from NetCDF is saved back to NetCDF, all the original input NetCDF global attributes are now associated with the output NetCDF data variable (assuming that they've not already been clobbered by similarly named local attributes).
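To make the loss concrete, here is a minimal plain-Python model of the load/save round trip described above. It uses ordinary dicts, not Iris objects; the function names and attribute values are hypothetical and only mimic the behaviour as described in this post.

```python
def load_attributes(global_attrs, variable_attrs):
    """Flatten file and variable attributes into one dict, as described above."""
    combined = dict(global_attrs)    # start from the file-level attributes
    combined.update(variable_attrs)  # a variable attribute wins on a name clash
    return combined


def save_attributes(cube_attrs):
    """Return (global, variable) dicts as the post says they are written."""
    return {}, dict(cube_attrs)      # everything lands on the data variable


original_globals = {"title": "Example run", "source": "model run A"}
original_variable = {"source": "post-processed"}

cube_attrs = load_attributes(original_globals, original_variable)
new_globals, new_variable = save_attributes(cube_attrs)
print(new_globals)   # → {}
print(new_variable)  # → {'title': 'Example run', 'source': 'post-processed'}
```

The global `source` value is gone, and `title` has moved from the file level to the data variable.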

"In my opinion ..."

Local and global attributes need to persist within separate name-spaces ... and I'll even go as far as saying that these name-spaces should both live within a cube.
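As a conceptual sketch of the "what" rather than the "how", a cube holding two separate name-spaces can round-trip both sets of attributes unchanged. The class and function names below are hypothetical and are not part of Iris.

```python
from dataclasses import dataclass, field


@dataclass
class TwoNamespaceCube:
    """Hypothetical cube keeping local and global attributes apart."""
    local_attributes: dict = field(default_factory=dict)   # from the data variable
    global_attributes: dict = field(default_factory=dict)  # from the file


def load_two_namespaces(global_attrs, variable_attrs):
    # Copy each name-space separately: nothing is overridden or relabelled.
    return TwoNamespaceCube(dict(variable_attrs), dict(global_attrs))


def save_two_namespaces(cube):
    # Each name-space is written back to where it came from.
    return dict(cube.global_attributes), dict(cube.local_attributes)


globals_in = {"title": "Example run", "source": "model run A"}
variable_in = {"source": "post-processed"}
globals_out, variable_out = save_two_namespaces(load_two_namespaces(globals_in, variable_in))
print(globals_out == globals_in and variable_out == variable_in)  # → True
```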

"The question, dear Watson is ..."
  1. Primarily, do you agree with this proposal?
  2. What should Iris do (or not) when saving two cubes to NetCDF with different and conflicting global attributes?
  3. Are the global attributes associated with a cube even valid outside the scope of the NetCDF file?
  4. Are the global attributes associated with a cube still valid when the contents of a cube change as the result of applying an operation, e.g. taking the mean of the data?
  5. Can two cubes with different global attributes merge? Can they concatenate?

"And discuss ..."

Operating within the scope of the CF Conventions, what is the best approach for Iris to adopt here?

All points of view, use case scenarios, and opinions welcomed!

LeonH

Jul 2, 2013, 12:09:59 PM
to scitoo...@googlegroups.com
Well, as no-one else has ventured, I will put up a straw-man for people to shoot down. In answer to the numbered questions, this is what I expect to happen:

1. I agree, though the global name-space could live in the CubeList. If the user has chosen load_cube(), then global attributes are lost.
2. Two cubes with conflicting attributes should not be put in the same netCDF file. How could they?
3. Yes, they might still be interesting/useful
4. Yes, most should still be valid
5. No they can't (with the exception of dates in the history attributes, see separate discussion)

Putting the global attributes in the CubeList solves a few of the above problems, but causes others. If Iris continues being strict and refusing to merge because global attributes differ between cubes, then it would be really useful to have the option at load time not to load global attributes!! A lot of my callback functions consist of lines deleting attributes so that the cubes will merge.
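A callback of the kind Leon describes might look something like the sketch below. Iris passes a load callback the arguments `(cube, field, filename)`; the particular attribute names removed here are assumptions for illustration, and a `SimpleNamespace` stands in for a real cube so the sketch runs without Iris installed.

```python
from types import SimpleNamespace

# Names assumed (for illustration only) to be file-level attributes that
# differ between files and so block merging; adjust to suit your own data.
GLOBAL_ATTRIBUTE_NAMES = ("history", "title", "institution", "source")


def drop_global_attributes(cube, field, filename):
    """Load callback that discards selected attributes from each cube.

    Only the cube is used; its attributes mapping is modified in place.
    """
    for name in GLOBAL_ATTRIBUTE_NAMES:
        cube.attributes.pop(name, None)  # ignore names that are absent


# With Iris and real files this would be passed as, e.g.:
#     cubes = iris.load("data/*.nc", callback=drop_global_attributes)

# Stand-in for a cube, just to show the effect.
fake_cube = SimpleNamespace(
    attributes={"history": "created 2013-06-27", "title": "Run A", "comment": "kept"}
)
drop_global_attributes(fake_cube, None, "example.nc")
print(fake_cube.attributes)  # → {'comment': 'kept'}
```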

Leon

bjlittle

Jul 3, 2013, 4:17:44 AM
to scitoo...@googlegroups.com
Hi Leon!

Thanks for contributing and kick-starting the discussion ...

I may have made the fatal mistake of asking too many questions at once, so let's just focus on the first primary question.

It's an interesting concept to associate the global attributes name-space with a CubeList. This is more akin to the current relationship within the actual NetCDF file i.e. one set of global attributes to many data variables.

So given such a scenario, would you expect one CubeList to extend another? i.e. can the cubes from one CubeList be added to another CubeList with different global attributes? (apologies for playing devil's advocate here) ... what should happen here?

Also, could you explain your reasoning behind clearing global attributes when using iris.load_cube()?

Many thanks
-Bill




marqh

Jul 4, 2013, 3:42:53 AM
to scitoo...@googlegroups.com
I've started investigating this question with the CF community.  An interesting point of view is provided by Bryan:

https://cf-pcmdi.llnl.gov/trac/ticket/95#comment:64

LeonH

Jul 4, 2013, 7:00:55 AM
to scitoo...@googlegroups.com
Is marqh's link safe to access? It apparently uses a self-signed certificate that expired 3 years ago and is not really valid for that site anyway...

Leon

marqh

Jul 4, 2013, 8:03:01 AM
to scitoo...@googlegroups.com
Hi Leon

I have taken the view that this is only a Trac site with local log-in control, and any sort of certificate is better than a plain http connection, so I have just accepted the cert. You have to make your own decision on this question, but it is where the CF community maintain their change discussions, so I feel it's essential viewing for me, whatever the certificate status.

As to whether it should be better, I'm afraid that is really a question for the CF community who maintain the site.
They have a mailing list which you could ask such questions on, linked from:
http://cf-pcmdi.llnl.gov/
but I haven't posted a question on this topic to them; I don't have the energy.

Sorry if this answer isn't as helpful as you hoped.
m

Carwyn Pelley

Jul 5, 2013, 5:46:26 AM
to scitoo...@googlegroups.com
I also think that's an amazingly original idea, associating the global attributes with the CubeList rather than with the cube itself. However, I am very cautious about this approach, as I am worried that our data interoperability may be in danger.
In my opinion, we have contradictory statements here. It cannot be said that conflicting attributes can be disregarded when writing to a single netCDF file while, at the same time, holding the opinion that 'Yes, most should still be valid'.
Sorry @LeonH if I am taking you out of context (feel free to clarify).

The word 'most' is the issue here: the last thing we want is to write incorrect information to the resulting file and/or cause unnecessary conflicts. The only feasible approach is to make no assumptions. If the cube has changed, you cannot keep the global attributes without explicit intervention from the user.
(You can never assume that this information is still valid. This is the very reason why the standard name and unit are lost in such a situation! It is up to the user to specify that the standard name is xxx, etc.)


The only truly contentious issue for me here is what to do when saving cubes with conflicting global attributes. We cannot answer this without making an assumption, unless we refuse to save at all because of the conflict.
I'm hesitant about that as a solution, as a number of people may have something running for hours or days, save to a file at the end and...... fall over unexpectedly just because of a global attribute conflict (which many don't care about anyway).
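The two options weighed above (refuse to save, or make an assumption) can be sketched in plain Python. The function names, the `strict` flag, and the lenient policy of writing only the attributes every cube agrees on are all hypothetical illustrations, not existing Iris behaviour.

```python
_MISSING = object()  # sentinel for "this cube lacks the attribute"


def conflicting_global_attributes(attribute_sets):
    """Return the sorted names whose values are not identical across all dicts.

    A name missing from some dicts also counts as a conflict, because it
    cannot be written as one file-level value that holds for every cube.
    """
    all_names = set().union(*attribute_sets)
    conflicts = []
    for name in sorted(all_names):
        values = [attrs.get(name, _MISSING) for attrs in attribute_sets]
        if any(v is _MISSING for v in values) or any(v != values[0] for v in values):
            conflicts.append(name)
    return conflicts


def global_attributes_for_file(attribute_sets, strict=False):
    """Decide which global attributes to write for several cubes in one file.

    strict=True refuses to save on any conflict; strict=False (a hypothetical
    lenient policy) writes only the agreed attributes and reports the rest.
    """
    if not attribute_sets:
        return {}, []
    conflicts = conflicting_global_attributes(attribute_sets)
    if strict and conflicts:
        raise ValueError(f"conflicting global attributes: {conflicts}")
    agreed = {k: v for k, v in attribute_sets[0].items() if k not in conflicts}
    return agreed, conflicts


a = {"Conventions": "CF-1.5", "history": "run 1"}
b = {"Conventions": "CF-1.5", "history": "run 2", "title": "Run B"}
print(global_attributes_for_file([a, b]))
# → ({'Conventions': 'CF-1.5'}, ['history', 'title'])
```

The lenient mode avoids the "fall over after a long run" scenario, at the cost of silently dropping the disputed attributes from the file level.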

---------------
I propose the following approach (never mind the variable names, they don't matter):
Have two containers on the cube: global attribute in and global attribute out.

global_attribute_in: has the global attributes from the file the cube came from (not used for merging and not used for saving).
global_attribute_out: initially empty, requires the user to specifically populate it, (using the contents of the global_attribute_in dictionary if they wish, but importantly through their choosing).

This gives us the benefit of retaining information while, at the same time, not making any assumptions.
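Carwyn's two-container proposal can be sketched as below. The class, field and function names are hypothetical stand-ins, not Iris API; the point is only that `global_attributes_in` is never used for saving or merging, while `global_attributes_out` starts empty and is filled by an explicit user choice.

```python
from dataclasses import dataclass, field


@dataclass
class CubeWithGlobals:
    """Hypothetical cube carrying the two proposed global-attribute containers."""
    attributes: dict = field(default_factory=dict)             # data-variable attributes
    global_attributes_in: dict = field(default_factory=dict)   # as read from the source file
    global_attributes_out: dict = field(default_factory=dict)  # empty until the user fills it


def globals_to_write(cube):
    # Only the user-populated container is used on save; the "in" one is ignored.
    return dict(cube.global_attributes_out)


def merge_key(cube):
    # Neither global container influences whether two cubes can merge.
    return tuple(sorted(cube.attributes.items()))


cube = CubeWithGlobals(
    attributes={"comment": "demo"},
    global_attributes_in={"institution": "Example Institute", "history": "run A"},
)
print(globals_to_write(cube))  # → {}

# The user explicitly chooses what to carry over to the output file.
cube.global_attributes_out["institution"] = cube.global_attributes_in["institution"]
print(globals_to_write(cube))  # → {'institution': 'Example Institute'}

other = CubeWithGlobals(attributes={"comment": "demo"}, global_attributes_in={"history": "run B"})
print(merge_key(cube) == merge_key(other))  # → True
```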

LeonH

Jul 5, 2013, 10:11:19 AM
to scitoo...@googlegroups.com
For those loath to follow marqh's link, this is what it says:

The Data Model needs to differentiate between the encoding of attributes and attributes of model constructs. The encoding states how a field construct's attributes can be encoded and decoded into global and/or variable attributes. These rules would include how variable attributes override global attributes.

Then the rest of the data model can ignore the distinction between global and variable attributes. Everything can be expressed in attributes of constructs.

I hope that is helpful for some people here, because it may only be comprehensible to someone in the field, me excluded.
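Bryan's point separates the in-memory model (one flat set of attributes per field) from the rules for encoding it in a file. The following is a rough plain-Python sketch of such a pair of rules. The decode rule (variable attributes override global ones) comes from the quoted comment; the encode rule (attributes shared with equal values by every field become global) is an assumption chosen for illustration, and the function names are hypothetical.

```python
def decode(global_attrs, variable_attrs):
    """Encoding rule from the comment: a variable attribute overrides a global one."""
    attrs = dict(global_attrs)
    attrs.update(variable_attrs)
    return attrs


def encode(field_attrs):
    """Split per-field attribute dicts into (global, [per-variable]) dicts.

    Assumed rule: an attribute held with the same value by every field is
    written once as a global attribute; everything else stays per variable.
    """
    if not field_attrs:
        return {}, []
    shared = {
        name: value
        for name, value in field_attrs[0].items()
        if all(name in attrs and attrs[name] == value for attrs in field_attrs[1:])
    }
    per_variable = [
        {k: v for k, v in attrs.items() if k not in shared} for attrs in field_attrs
    ]
    return shared, per_variable


fields = [
    {"Conventions": "CF-1.5", "source": "run A"},
    {"Conventions": "CF-1.5", "source": "run B"},
]
shared, per_variable = encode(fields)
print(shared)        # → {'Conventions': 'CF-1.5'}
print(per_variable)  # → [{'source': 'run A'}, {'source': 'run B'}]
```

Decoding each variable with the shared globals recovers the original per-field attributes, so the data model itself never needs to know which attributes were global.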

I like Carwyn's idea about having containers, but we have to strike a balance between keeping every single attribute and avoiding unnecessary complexity. In the end, if the cube structure is too complicated and that structure is adhered to too rigidly, the user will just switch off. Most of the time I spend debugging with Iris goes on getting CubeLists to merge. Sometimes I get so frustrated that I just delete all attributes and all cell_methods to get everything to merge. Surely this is not a situation we want? Good metadata is being lost because rubbish metadata is being slavishly followed.

Leon

bblay

Jul 10, 2013, 6:00:46 AM
to scitoo...@googlegroups.com

Hi,

There is no beef.

Global attributes are supposed to be overridden when loading from NetCDF.
They are not part of the CF data model so they don't need to be in Iris.
Cube attributes suffice, do they not?

> Sometimes I get so frustrated that I just delete all attributes and all cell_methods to get it all to merge. Surely this is not a situation we want to happen?
Quite right, and I'm sorry to hear it can be frustrating, Leon.
Please would you be able to package up an example?
I'm really keen to look at this, and I'm sure other developers are too.

LeonH

Jul 17, 2013, 8:36:28 AM
to scitoo...@googlegroups.com
I'm happy to give examples of merging issues. There is one on the met-office.comp.avd.general newsgroup right now that I can post here if it seems interesting, and I will post another here as well (though under a different topic, as I don't want to derail this interesting discussion on global metadata).
Leon
