LXML interface library for NetCDF4 Python as NcML


David Stuebe

Jul 16, 2013, 4:35:14 PM
to netcdf4...@googlegroups.com



Hey NetCDF4 Python folks

I have some working code extending NetCDF4 Python into an LXML interface, and I am not sure where it should live or what to call it. I thought I would run it by this group to see whether you have suggestions for a larger context it might fit into.

Goals:
I am working on a project for IOOS to help sort out their metadata issues. I want to use XPATH queries against netCDF python Dataset objects and then set and get the queried properties. I chose xpath because it also works against ISO 19115 XML, SWE XML, etc.

I have no idea what to call this beast - so I let github name it for me. Better suggestions are welcome.

netcdf_etree provides a parser method which returns an lxml element for the root of the netcdf dataset. It also contains a reference to the actual open dataset so that changes to the xml also change the NC file.
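
As a rough illustration of the write-through idea described above (this is not the actual netcdf_etree code), the sketch below keeps an NcML-style tree and a backing store in sync. Everything in it is an assumption made for illustration: a plain dict stands in for the open netCDF4 Dataset, the helpers build_root and set_attribute are hypothetical, and the standard library's xml.etree.ElementTree is swapped in for lxml so the example runs without extra dependencies (its XPath support is only a subset of what lxml offers).

```python
import xml.etree.ElementTree as ET

# NcML namespace as published by Unidata.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}

# Hypothetical stand-in for the global attributes of an open netCDF dataset.
dataset_attrs = {"title": "test file", "institution": "IOOS"}


def build_root(attrs):
    """Build an NcML-style <netcdf> element mirroring the global attributes."""
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    for name, value in attrs.items():
        ET.SubElement(root, f"{{{NCML_NS}}}attribute", name=name, value=str(value))
    return root


def set_attribute(root, attrs, name, value):
    """Find the attribute node by path, update it, and write through to the store."""
    node = root.find(f"nc:attribute[@name='{name}']", NS)
    if node is None:
        raise KeyError(name)
    node.set("value", str(value))  # change the XML view
    attrs[name] = value            # change the backing "dataset" as well


root = build_root(dataset_attrs)
set_attribute(root, dataset_attrs, "title", "renamed title")
print(dataset_attrs["title"])
```

In the real library the second assignment would presumably be a call on the open Dataset object rather than a dict update.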

netcdf2ncml provides a couple static strings and some methods to create an NCML representation of a NetCDF file using NetCDF4 python. 
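
A minimal sketch of what generating NcML from a dataset can look like, under stated assumptions: the function dataset_to_ncml and the dict layout of description are hypothetical and only stand in for walking a real netCDF4 Dataset; the element and attribute names (netcdf, dimension, variable, attribute, name, length, shape, type, value) follow the NcML schema. It is not the netcdf2ncml implementation.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Serialize NcML elements without a prefix (default namespace).
ET.register_namespace("", NCML_NS)


def _tag(name):
    """Return the namespace-qualified tag for an NcML element name."""
    return f"{{{NCML_NS}}}{name}"


def dataset_to_ncml(description):
    """Render a dict describing a dataset (hypothetical layout) as an NcML string."""
    root = ET.Element(_tag("netcdf"))
    for name, length in description["dimensions"].items():
        ET.SubElement(root, _tag("dimension"), name=name, length=str(length))
    for name, value in description["attributes"].items():
        ET.SubElement(root, _tag("attribute"), name=name, value=str(value))
    for name, var in description["variables"].items():
        node = ET.SubElement(
            root, _tag("variable"),
            name=name, shape=" ".join(var["dims"]), type=var["type"],
        )
        for attr_name, attr_value in var.get("attributes", {}).items():
            ET.SubElement(node, _tag("attribute"), name=attr_name, value=str(attr_value))
    return ET.tostring(root, encoding="unicode")


# Illustrative input; a real version would read these from the Dataset object.
description = {
    "dimensions": {"time": 10},
    "attributes": {"title": "test file"},
    "variables": {
        "temp": {"dims": ["time"], "type": "float", "attributes": {"units": "degC"}},
    },
}
ncml_text = dataset_to_ncml(description)
print(ncml_text)
```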

There is also a module which creates a crazy test file...

So I intend to use parse_nc_dataset_as_etree in another library which does some metadata magic against XML or NetCDF files, but I am not sure what to do with this potentially useful start toward a more complete NCML library for python.

Does anyone have a suggestion about where this code should live or what it should be called?

Thanks

David

Jeffrey Whitaker

Jul 16, 2013, 6:47:40 PM
to David Stuebe, netcdf4...@googlegroups.com
David:  That's great!  I must confess that I don't know much about NcML or what it's used for, so I don't have any concrete suggestions right now.  I've looked at

http://www.unidata.ucar.edu/software/netcdf/ncml/

and it looks like what you have implemented is "NCML at output", which I can see could be quite useful.  What intrigues me even more (for my application domain) is creating netcdf from NcML and "advanced NcML" (particularly aggregation).  These features are currently available in the Java netcdf library, but not in netcdf4-python. 

As far as where your project should live, I would say that for now it makes sense to keep it a separate project, but if some of the more advanced features are added (like aggregation or creating netcdf files from ncml) it will probably make sense to merge them into netcdf4-python.

Thanks for your contribution.

Regards, Jeff
--
Jeffrey S. Whitaker  
NOAA/OAR/PSD  R/PSD1
325 Broadway, Boulder, CO, 80305-3328  
Phone: (303)497-6313
FAX: (303)497-6449

David Stuebe

Jul 17, 2013, 9:23:08 AM
to Jeffrey Whitaker, netcdf4...@googlegroups.com

Hi Jeff

Thanks for having a look.

What I have created is not NCML for python. It is a new LXML API for netcdf in python using the NCML data model.

I keep hoping NCML aggregation etc. will be implemented in C, rather than at the python level. Do you know of any efforts to do that?

If you wanted to move the code to create NCML from a dataset object into NetCDF4 Python, that certainly doesn't require any new dependencies. You are welcome to insert it as you see fit in the NetCDF4-Python library. I am not sure whether it makes sense to include the LXML API in NetCDF4 Python - it is really a separate thing.

One issue that came up while I was implementing the new interface is that methods for renaming attributes and groups are not present in the NetCDF4 Python interface. It would be great to have the rename method exist for all four objects, not just for dimensions and variables.

David 



Jeffrey Whitaker

Jul 17, 2013, 4:00:55 PM
to netcdf4...@googlegroups.com
David:  Can you give an example use case? That would help me wrap my head around what you're doing.

I went ahead and added a 'renameAttribute' method to Dataset, Group and Variable in svn.  Unfortunately, the C library apparently does not provide a way to rename a Group.

I've been hoping NCML aggregation would be ported from the Java lib to the C lib by Unidata for years now.  I don't think it's going to happen.

-Jeff



David Stuebe

Jul 17, 2013, 4:29:18 PM
to netcdf4...@googlegroups.com
Hi Jeff, NetCDF4-Python

Comments inline...

On Wed, Jul 17, 2013 at 4:00 PM, Jeffrey Whitaker <whitaker...@gmail.com> wrote:
> David:  Can you give an example use case? That would help me wrap my head around what you're doing.


So here is what we are currently after...

IOOS has a set of metadata concepts such as:

service_provider_name
service_provider_contact_info
west_bounding_longitude

These map to elements in ISO 19115 datasets and to attributes/variables in NetCDF CF, as well as in other conventions and formats.

I have created a library which dynamically creates objects with getters and setters for each of the IOOS concepts.

foo.service_provider_name
foo.service_provider_name = 5

By providing an xpath expression for the underlying convention/format you can operate on any dataset with one API. 
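
To make the pattern concrete, here is a small sketch of generating getters and setters from a table of XPath expressions. It is an illustration under assumptions, not the library described above: the factory functions make_property and make_metadata_class are hypothetical, the concept-to-attribute mappings are illustrative guesses, the NcML fragment omits the namespace for brevity, and the standard library's xml.etree.ElementTree (a limited XPath subset) stands in for lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from IOOS concept names to path expressions on NcML.
# The attribute names on the right are illustrative guesses only.
CONCEPT_XPATHS = {
    "service_provider_name": "./attribute[@name='publisher_name']",
    "west_bounding_longitude": "./attribute[@name='geospatial_lon_min']",
}


def make_property(xpath):
    """Create a property that reads and writes the 'value' of the matched node."""
    def getter(self):
        node = self._root.find(xpath)
        return None if node is None else node.get("value")

    def setter(self, value):
        node = self._root.find(xpath)
        if node is None:
            raise AttributeError(f"no node matches {xpath!r}")
        node.set("value", str(value))

    return property(getter, setter)


def make_metadata_class(concept_xpaths):
    """Build a class with one property per concept, created at run time."""
    def __init__(self, root):
        self._root = root

    namespace = {name: make_property(xp) for name, xp in concept_xpaths.items()}
    namespace["__init__"] = __init__
    return type("Metadata", (object,), namespace)


# Namespace-free NcML-like fragment, kept minimal for the example.
ncml = """<netcdf>
  <attribute name="publisher_name" value="Example Provider"/>
  <attribute name="geospatial_lon_min" value="-71.5"/>
</netcdf>"""

Metadata = make_metadata_class(CONCEPT_XPATHS)
foo = Metadata(ET.fromstring(ncml))
foo.service_provider_name = "New Provider"
print(foo.service_provider_name, foo.west_bounding_longitude)
```

Swapping in a different mapping table (for example, one with expressions written against ISO 19115 XML) would give the same attribute names over a different format, which is the point of the single API.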

This will make it easy to write software to create and validate metadata for IOOS.

By implementing the LXML interface to NetCDF in python I can now use NetCDF files in this way with XPATH expressions. It seems that XPATH expressions on NCML are the easiest way to specify programmatically which CF attribute should be set for service_provider_name.

The implementation is certainly not the most efficient - but after only a few days it seems to be working.
 
> I went ahead and added a 'renameAttribute' method to Dataset, Group and Variable in svn.  Unfortunately, the C library apparently does not provide a way to rename a Group.


Thanks for adding the attribute method. Bummer about groups!?!?!
 
> I've been hoping NCML aggregation would be ported from the Java lib to the C lib by Unidata for years now.  I don't think it's going to happen.

Ugh - I don't think it is in scope for us either right now.

David

Jeffrey Whitaker

Jul 18, 2013, 1:21:15 PM
to netcdf4...@googlegroups.com


David:  Thanks - that helps a lot. Regarding renaming groups, I posted a message to the netcdf mailing list and got this response from the lead developer:  -Jeff

> I don't see any way in the C library to rename a group.  I see routines
> to rename dimensions, variables and attributes, but not groups.  Am I
> missing something, or is there some fundamental reason why groups cannot
> be renamed?

It's on our list to do, but hasn't been scheduled yet:

  https://bugtracking.unidata.ucar.edu/browse/NCF-204

There's no fundamental obstacle, it just requires adding equivalent
functions for group renaming to all supported APIs, as well as new test
code and documentation.

Lately, fixing bugs and portability problems has taken precedence over
adding new features and functions, but if the ability to rename groups
is important, we'll up its priority.

--Russ
 

David Stuebe

Jul 18, 2013, 1:25:41 PM
to Jeffrey Whitaker, netcdf4...@googlegroups.com

Glad I could explain what I am after. I think it will help the community a good deal if it works.

As for adding group rename - it is more of a completeness issue. I don't think any IOOS datasets actually use groups at this point. So by all means Russ should focus on the bugs and portability over renaming groups, but someday it would be nice when all the bugs are fixed ;-)

David