Copying a Dataset and then renaming a variable in order to use the original layout for the output?

James Adams

Jan 7, 2016, 7:58:32 PM
to xray
I would like to create a Dataset by copying an existing Dataset that I've read from NetCDF, rename the original variable, modify the values, and then write it back out to NetCDF using a different name. How would I go about this?

Since this is my first foray into using xray I may be (probably am) going about things in the wrong way, if so please set me straight. Here's my algorithm:

with xray.open_dataset(precip_file) as precip_dataset:

    # make copies of the original, input dataset to use as output datasets for each scale
    for scale in [1, 3, 6]:
        
        # make a copy of the input dataset to use as output
        out_dataset = copy.deepcopy(precip_dataset)
        
        # FIXME how to do this?
        # rename out_dataset.precip_var to out_dataset.var_scale

        # also I would like to update the attributes of the output Dataset in order
        # to properly describe the data since it's no longer precipitation data
        
        # loop over all lon/lat points
        # (perhaps this looping can be replaced by groupby() and/or apply() calls)
        for lon in range(precip_dataset.lon.size):
            for lat in range(precip_dataset.lat.size): 

                # perform a computation on a lon/lat slice from the input dataset, returning an equivalent 
                # sized slice which is assigned to the corresponding slice of the output dataset
                out_dataset.var_scale.data[:, lon, lat] = my_function(precip_dataset.precip_var.data[:, lon, lat])

        # write the dataset for this scale as NetCDF
        out_dataset.to_netcdf(base_filename + str(scale) + '.nc')
       
Thanks in advance for your help.

James Adams

Jan 8, 2016, 3:58:08 PM
to xray
This is working well for me so far (much easier than using the netCDF4 module!), with two exceptions:

1) I'm unable to rename the variable of the deep copied Dataset using the rename() function. I call the rename() function and no error is raised, however the resulting variable in the output NetCDF still has the original variable's name. Also when I look at the variable in a debugger after the function has been called the old name is still shown. For example the original variable name is 'prcp', and I attempt to rename the variable in the copy (output_dataset) like so:

    output_dataset.rename({'prcp': variable_name})

No error is raised. However once I open the output NetCDF the variable is still named 'prcp'. So the rename() seems to have silently failed.


2) I can't update the '_FillValue' attribute of the variable, and when I try to do so I get a hard error which crashes my program. If I update the attributes without including '_FillValue' then all is well. Below is the stack trace:

ERROR:__main__:Failed to complete
Traceback (most recent call last):
  File "C:\home\eclipse_workspaces\default\climate_indicators\src\monthly_nclimgrid.py", line 75, in <module>
    dataset.to_netcdf(output_file)
  File "C:\Anaconda\lib\site-packages\xray\core\dataset.py", line 881, in to_netcdf
    engine=engine, encoding=encoding)
  File "C:\Anaconda\lib\site-packages\xray\backends\api.py", line 352, in to_netcdf
    dataset.dump_to_store(store, sync=sync, encoding=encoding)
  File "C:\Anaconda\lib\site-packages\xray\core\dataset.py", line 827, in dump_to_store
    store.store(variables, attrs, check_encoding)
  File "C:\Anaconda\lib\site-packages\xray\backends\common.py", line 226, in store
    cf_variables, cf_attrs = cf_encoder(variables, attributes)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 1022, in cf_encoder
    for k, v in iteritems(variables))
  File "C:\Anaconda\lib\collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "C:\Anaconda\lib\_abcoll.py", line 571, in update
    for key, value in other:
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 1022, in <genexpr>
    for k, v in iteritems(variables))
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 680, in encode_cf_variable
    var, needs_copy = maybe_encode_fill_value(var, needs_copy)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 580, in maybe_encode_fill_value
    fill_value = pop_to(encoding, attrs, '_FillValue')
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 534, in pop_to
    safe_setitem(dest, key, value)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 522, in safe_setitem
    raise ValueError('Failed hard to prevent overwriting key %r' % key)
ValueError: Failed hard to prevent overwriting key '_FillValue'


--James

Stephan Hoyer

Jan 8, 2016, 4:17:14 PM
to xray
Hi James,

My answers are inline below:

On Fri, Jan 8, 2016 at 12:58 PM, James Adams <mono...@gmail.com> wrote:
> This is working well for me so far (much easier than using the netCDF4 module!), with two exceptions:
>
> 1) I'm unable to rename the variable of the deep copied Dataset using the rename() function. I call the rename() function and no error is raised, however the resulting variable in the output NetCDF still has the original variable's name. Also when I look at the variable in a debugger after the function has been called the old name is still shown. For example the original variable name is 'prcp', and I attempt to rename the variable in the copy (output_dataset) like so:
>
>     output_dataset.rename({'prcp': variable_name})
>
> No error is raised. However once I open the output NetCDF the variable is still named 'prcp'. So the rename() seems to have silently failed.

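xray encourages creating *new* dataset objects rather than modifying existing ones in place, so rename() returns a new xray.Dataset and leaves the original untouched. Assign the result to a name (for example, output_dataset = output_dataset.rename({'prcp': variable_name})) and write that object out.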

To quote the docs again:
"With xray, there is no performance penalty for creating new datasets, even if variables are lazily loaded from a file on disk. Creating new objects instead of mutating existing objects often results in easier to understand code, so we encourage using this approach."

Likewise, modifying an xray dataset never changes the netCDF file from which it was loaded. You'll need to save any changes explicitly to a new netCDF file on disk. 
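As a minimal sketch of the rename behaviour described above: the example imports the package under its later name, xarray (xray was renamed after this thread), and the tiny in-memory dataset and the name "spi_03" are made up for illustration.

```python
import numpy as np
import xarray as xr  # xray was later renamed xarray

# Hypothetical stand-in for the dataset read from the precipitation file.
ds = xr.Dataset({"prcp": (("time", "lat", "lon"), np.zeros((4, 2, 3)))})
variable_name = "spi_03"  # hypothetical output variable name

# Calling rename() and discarding the result leaves ds unchanged.
ds.rename({"prcp": variable_name})
assert "prcp" in ds

# Keeping the returned object gives the renamed dataset.
renamed = ds.rename({"prcp": variable_name})
print(list(renamed.data_vars))  # ['spi_03']
```

Writing `renamed` (not `ds`) with to_netcdf() is what puts the new variable name in the output file.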

> 2) I can't update the '_FillValue' attribute of the variable, and when I try to do so I get a hard error which crashes my program. If I update the attributes without including '_FillValue' then all is well.

Try setting "_FillValue" as the encoding instead:

xray uses _FillValue to control how it writes data back to disk, so you have to be a little more careful to make this work properly. In this case the variable already carries a "_FillValue" encoding saved from the original file, and xray raises an error rather than let that saved encoding overwrite the attribute you added.
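A rough sketch of what setting the fill value through the encoding could look like, again using the later package name xarray. The dataset, the fill value -9999.0, and the long_name text are illustrative assumptions, not values from James's files.

```python
import numpy as np
import xarray as xr  # xray was later renamed xarray

# Hypothetical stand-in for the copied output dataset.
ds = xr.Dataset({"prcp": (("time",), np.array([1.0, np.nan, 3.0]))})

# Ordinary descriptive metadata goes in attrs.
ds["prcp"].attrs["long_name"] = "example index values"  # illustrative text

# _FillValue is an on-disk encoding detail, so it goes in encoding,
# and any copy of it in attrs is dropped to avoid the conflict.
ds["prcp"].attrs.pop("_FillValue", None)
ds["prcp"].encoding["_FillValue"] = -9999.0  # assumed fill value

print(ds["prcp"].encoding)
```

With the fill value kept only in the encoding, a later to_netcdf() call should no longer find the same key in both places.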

On Thursday, January 7, 2016 at 7:58:32 PM UTC-5, James Adams wrote:

>         # loop over all lon/lat points
>         # (perhaps this looping can be replaced by groupby() and/or apply() calls)
>         for lon in range(precip_dataset.lon.size):
>             for lat in range(precip_dataset.lat.size):
>
>                 # perform a computation on a lon/lat slice from the input dataset, returning an equivalent
>                 # sized slice which is assigned to the corresponding slice of the output dataset
>                 out_dataset.var_scale.data[:, lon, lat] = my_function(precip_dataset.precip_var.data[:, lon, lat])


Yes, in principle you should be able to use something like ds.groupby(['lon', 'lat']).apply(my_function) to do this. Unfortunately, we don't have multi-dimensional grouped operations working yet. See here for some discussion:

For now, I think your approach is the right one, though you could also write something like ds.groupby('lon').apply(lambda x: x.groupby('lat').apply(my_function)) -- which is quite messy.
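One possible alternative to the explicit double loop, not suggested in the thread itself, is NumPy's apply_along_axis, which calls a function on every 1-D slice along one axis. The sketch below assumes time is the first dimension, uses a made-up my_function, and builds a fresh output dataset instead of mutating a deep copy.

```python
import numpy as np
import xarray as xr  # xray was later renamed xarray

def my_function(series):
    """Hypothetical per-gridpoint computation: anomaly from the series mean."""
    return series - series.mean()

# Hypothetical input with dimensions (time, lon, lat), as in the loop above.
precip = xr.Dataset(
    {"prcp": (("time", "lon", "lat"), np.arange(24.0).reshape(4, 3, 2))},
    coords={"lon": [0.0, 1.0, 2.0], "lat": [10.0, 20.0]},
)

# Apply my_function to the time series at every lon/lat point (axis 0 = time).
values = np.apply_along_axis(my_function, 0, precip["prcp"].values)

# Build a new dataset under the new variable name, reusing the coordinates.
out = xr.Dataset({"var_scale": (precip["prcp"].dims, values)}, coords=precip.coords)
out["var_scale"].attrs["long_name"] = "example scaled index"  # illustrative text
```

This still calls my_function once per grid point, so it is mainly a readability change rather than a speed-up.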

Hope that helps!

Cheers,
Stephan