Copying a Dataset and then renaming a variable in order to use the original layout for the output?

James Adams

Jan 7, 2016, 7:58:32 PM

to xray

I would like to create a Dataset by copying an existing Dataset that I've read from NetCDF, rename the original variable, modify the values, and then write it back out to NetCDF using a different name. How would I go about this?

Since this is my first foray into using xray I may be (probably am) going about things in the wrong way, if so please set me straight. Here's my algorithm:

with xray.open_dataset(precip_file) as precip_dataset:

    # make copies of the original, input dataset to use as output datasets for each scale
    for scale in [1, 3, 6]:

        # make a copy of the input dataset to use as output
        out_dataset = copy.deepcopy(precip_dataset)

        # FIXME how to do this?
        # rename out_dataset.precip_var to out_dataset.var_scale
        # also I would like to update the attributes of the output Dataset in order
        # to properly describe the data since it's no longer precipitation data

        # loop over all lon/lat points
        # (perhaps this looping can be replaced by groupby() and/or apply() calls)
        for lon in range(precip_dataset.lon.size):
            for lat in range(precip_dataset.lat.size):

                # perform a computation on a lon/lat slice from the input dataset, returning an equivalent
                # sized slice which is assigned to the corresponding slice of the output dataset
                out_dataset.var_scale.data[:, lon, lat] = my_function(precip_dataset.precip_var.data[:, lon, lat])

        # write the dataset for this scale as NetCDF
        out_dataset.to_netcdf(base_filename + str(scale) + '.nc')

Thanks in advance for your help.

James Adams

Jan 8, 2016, 3:58:08 PM

to xray

This is working well for me so far (much easier than using the netCDF4 module!), with two exceptions:

1) I'm unable to rename the variable of the deep copied Dataset using the rename() function. I call the rename() function and no error is raised, however the resulting variable in the output NetCDF still has the original variable's name. Also when I look at the variable in a debugger after the function has been called the old name is still shown. For example the original variable name is 'prcp', and I attempt to rename the variable in the copy (output_dataset) like so:

output_dataset.rename({'prcp': variable_name})

No error is raised. However once I open the output NetCDF the variable is still named 'prcp'. So the rename() seems to have silently failed.

2) I can't update the '_FillValue' attribute of the variable, and when I try to do so I get a hard error which crashes my program. If I update the attributes without including '_FillValue' then all is well. Below is the stack trace:

ERROR:__main__:Failed to complete
Traceback (most recent call last):
  File "C:\home\eclipse_workspaces\default\climate_indicators\src\monthly_nclimgrid.py", line 75, in <module>
    dataset.to_netcdf(output_file)
  File "C:\Anaconda\lib\site-packages\xray\core\dataset.py", line 881, in to_netcdf
    engine=engine, encoding=encoding)
  File "C:\Anaconda\lib\site-packages\xray\backends\api.py", line 352, in to_netcdf
    dataset.dump_to_store(store, sync=sync, encoding=encoding)
  File "C:\Anaconda\lib\site-packages\xray\core\dataset.py", line 827, in dump_to_store
    store.store(variables, attrs, check_encoding)
  File "C:\Anaconda\lib\site-packages\xray\backends\common.py", line 226, in store
    cf_variables, cf_attrs = cf_encoder(variables, attributes)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 1022, in cf_encoder
    for k, v in iteritems(variables))
  File "C:\Anaconda\lib\collections.py", line 57, in __init__
    self.__update(*args, **kwds)
  File "C:\Anaconda\lib\_abcoll.py", line 571, in update
    for key, value in other:
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 1022, in <genexpr>
    for k, v in iteritems(variables))
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 680, in encode_cf_variable
    var, needs_copy = maybe_encode_fill_value(var, needs_copy)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 580, in maybe_encode_fill_value
    fill_value = pop_to(encoding, attrs, '_FillValue')
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 534, in pop_to
    safe_setitem(dest, key, value)
  File "C:\Anaconda\lib\site-packages\xray\conventions.py", line 522, in safe_setitem
    raise ValueError('Failed hard to prevent overwriting key %r' % key)
ValueError: Failed hard to prevent overwriting key '_FillValue'

--James

Stephan Hoyer

Jan 8, 2016, 4:17:14 PM

to xray

Hi James,

My answers are inline below:

On Fri, Jan 8, 2016 at 12:58 PM, James Adams <mono...@gmail.com> wrote:

> This is working well for me so far (much easier than using the netCDF4 module!), with two exceptions:
>
> 1) I'm unable to rename the variable of the deep copied Dataset using the rename() function. I call the rename() function and no error is raised, however the resulting variable in the output NetCDF still has the original variable's name. Also when I look at the variable in a debugger after the function has been called the old name is still shown. For example the original variable name is 'prcp', and I attempt to rename the variable in the copy (output_dataset) like so:
>
> output_dataset.rename({'prcp': variable_name})
>
> No error is raised. However once I open the output NetCDF the variable is still named 'prcp'. So the rename() seems to have silently failed.

xray encourages creating *new* dataset objects rather than modifying existing ones in place. So rename() returns a new xray.Dataset and leaves the original unchanged; you need to assign the return value to a name and use that.

To quote the docs again:

"With xray, there is no performance penalty for creating new datasets, even if variables are lazily loaded from a file on disk. Creating new objects instead of mutating existing objects often results in easier to understand code, so we encourage using this approach."

http://xray.readthedocs.org/en/stable/data-structures.html#transforming-datasets

Likewise, modifying an xray dataset never changes the netCDF file from which it was loaded. You'll need to save any changes explicitly to a new netCDF file on disk.
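A minimal sketch of the point about rename(), assuming the package is imported under its later name xarray (xray was subsequently renamed) and using a small in-memory dataset in place of the NetCDF file. The name 'spi_03' and the attribute values are hypothetical, chosen only for illustration:

```python
import numpy as np
import xarray as xr  # the package formerly published as xray

# Small in-memory stand-in for the dataset read from NetCDF (values are made up).
precip_dataset = xr.Dataset(
    {"prcp": (("time", "lon", "lat"), np.ones((2, 3, 4)))},
    attrs={"title": "precipitation"},
)

variable_name = "spi_03"  # hypothetical output variable name

# rename() returns a new Dataset, so the return value has to be kept.
output_dataset = precip_dataset.rename({"prcp": variable_name})

# Attributes can then be updated on the renamed dataset and its variable.
output_dataset.attrs["title"] = "example index derived from precipitation"
output_dataset[variable_name].attrs["long_name"] = "example index"

print(list(precip_dataset.data_vars))  # the original still holds 'prcp'
print(list(output_dataset.data_vars))  # the new dataset holds 'spi_03'
```

Calling output_dataset.rename(...) without assigning the result builds the renamed dataset and then discards it, which matches the symptom described in the question.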

> 2) I can't update the '_FillValue' attribute of the variable, and when I try to do so I get a hard error which crashes my program. If I update the attributes without including '_FillValue' then all is well.

Try setting "_FillValue" as the encoding instead:

http://xray.readthedocs.org/en/stable/io.html#writing-encoded-data

xray uses _FillValue to control how it writes data back to disk, so you have to be a little more careful to make this work properly. In this case, a "_FillValue" encoding is already saved on the variable from when the file was read, and xray raises an error rather than let that saved encoding overwrite the attribute you added.
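As a rough illustration of setting the fill value through the encoding rather than the attributes, here is a sketch that again assumes the later package name xarray. The variable name 'spi_03', the fill value -9999.0, and the file name in the comment are hypothetical:

```python
import numpy as np
import xarray as xr  # the package formerly published as xray

# In-memory stand-in for the output dataset (values are made up).
output_dataset = xr.Dataset(
    {"spi_03": (("time", "lon", "lat"), np.zeros((2, 3, 4)))}
)
variable = output_dataset["spi_03"]

# Keep '_FillValue' out of attrs so it cannot collide with the encoding.
variable.attrs.pop("_FillValue", None)

# Put the fill value in the encoding, which is consulted when writing.
variable.encoding["_FillValue"] = -9999.0

# Writing is left commented out because it needs a NetCDF backend installed:
# output_dataset.to_netcdf("example_output.nc")
# The same setting can also be passed at write time:
# output_dataset.to_netcdf("example_output.nc",
#                          encoding={"spi_03": {"_FillValue": -9999.0}})
```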

On Thursday, January 7, 2016 at 7:58:32 PM UTC-5, James Adams wrote:

> # loop over all lon/lat points
> # (perhaps this looping can be replaced by groupby() and/or apply() calls)
> for lon in range(precip_dataset.lon.size):
>     for lat in range(precip_dataset.lat.size):
>
>         # perform a computation on a lon/lat slice from the input dataset, returning an equivalent
>         # sized slice which is assigned to the corresponding slice of the output dataset
>         out_dataset.var_scale.data[:, lon, lat] = my_function(precip_dataset.precip_var.data[:, lon, lat])

Yes, in principle you should be able to use something like ds.groupby(['lon', 'lat']).apply(my_function) to do this. Unfortunately, we don't have multi-dimensional grouped operations working yet. See here for some discussion:

https://github.com/pydata/xarray/issues/324

For now, I think your approach is the right one, though you could also write something like ds.groupby('lon').apply(lambda x: x.groupby('lat').apply(my_function)) -- which is quite messy.
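One alternative to writing the double loop by hand, not proposed in this thread, is NumPy's apply_along_axis, which calls a 1-D function on every time series of the underlying array. The sketch below assumes the later package name xarray, a (time, lon, lat) dimension order as in the question, and a hypothetical my_function that returns a series of the same length:

```python
import numpy as np
import xarray as xr  # the package formerly published as xray


def my_function(series):
    # Hypothetical stand-in: remove the time mean from one grid point's series.
    return series - series.mean()


# In-memory stand-in for the precipitation variable (values are made up).
precip = xr.DataArray(
    np.arange(24, dtype=float).reshape(4, 3, 2),
    dims=("time", "lon", "lat"),
    name="prcp",
)

# Apply my_function along axis 0 (time) for every lon/lat point.
computed = np.apply_along_axis(my_function, 0, precip.values)

# Wrap the result in a DataArray with the same dimensions and coordinates.
result = precip.copy(data=computed)
```

This still runs a Python-level call per grid point, so it mainly tidies the code rather than speeding it up.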

Hope that helps!

Cheers,

Stephan
