Determine scale_factor/add_offset before to_netcdf with dask arrays

David Hoese

Oct 20, 2020, 6:36:44 PM
to xarray
Hi,

I'm working on a project where I'd like to take an xarray Dataset and save subsets of it to multiple NetCDF files. In the best case I already know a suitable scale_factor/add_offset to put in each variable's encoding, or I know a valid_min/valid_max (or valid_range) and can calculate the factor and offset from those. However, sometimes I don't know any of this, and to scale the data to a different data type (e.g. float64 -> uint16) I need to load it and do a .min()/.max(). The issue is that my Dataset's variables are backed by dask arrays, so computing the min/max would load the data once for the statistics and then load it all again for the to_netcdf/save_mfdataset call.
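
To make the goal concrete, here is a rough sketch of the factor/offset calculation I mean, assuming the usual NetCDF linear packing convention (unpacked = packed * scale_factor + add_offset), a Dataset `ds`, and a made-up variable name "t2m"; the `compute_scale_and_offset` helper is hypothetical, not something in xarray:

```python
import numpy as np

def compute_scale_and_offset(vmin, vmax, n_bits=16):
    # Linear packing: packed = (unpacked - add_offset) / scale_factor.
    # Reserve the highest code (2**n_bits - 1) for _FillValue.
    n_codes = 2 ** n_bits - 1
    scale_factor = (vmax - vmin) / (n_codes - 1)
    add_offset = vmin
    return scale_factor, add_offset

# Placeholder valid range for illustration.
scale, offset = compute_scale_and_offset(vmin=200.0, vmax=330.0)
ds["t2m"].encoding.update({
    "dtype": "uint16",
    "scale_factor": scale,
    "add_offset": offset,
    "_FillValue": np.uint16(2 ** 16 - 1),
})
ds.to_netcdf("subset.nc")
```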

I'm wondering if anyone has ideas, if something for this already exists, or if I'm completely missing a more optimal way to do it. I could use `.persist()`, but one of the reasons I'm saving subsets is that the data are usually relatively large, so I'd like to avoid loading everything into memory and keeping it there (a single-pass min/max sketch follows below). Another thought was adding something to xarray like `.encoding['scale_factor'] = 'auto'` that would do this calculation and attribute assignment when the data is about to be saved.
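
One partial mitigation, sketched under the same assumptions as above (dask-backed `ds["t2m"]` and the hypothetical `compute_scale_and_offset` helper): passing both lazy reductions to `dask.compute` evaluates them against one shared graph, so the source data is only read once for the statistics, though the write still reads it a second time:

```python
import dask

# Evaluate both reductions together; dask deduplicates the shared
# data-loading tasks, so the input is read once for min and max.
vmin, vmax = dask.compute(ds["t2m"].min(), ds["t2m"].max())

scale, offset = compute_scale_and_offset(float(vmin), float(vmax))
ds["t2m"].encoding.update({
    "dtype": "uint16",
    "scale_factor": scale,
    "add_offset": offset,
})
ds.to_netcdf("subset.nc")  # data is still re-read here for the write
```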

Any ideas?

Dave