Plans for an N-dimensional convolution operator for xarray DataArray?


julien....@gmail.com

May 9, 2016, 3:48:05 AM
to xarray

Hi all,

I am starting to think about the best approach for implementing spatial filtering methods for xarray DataArrays with dask.

I was wondering whether any of you had plans for implementing an N-dimensional convolution operator for xarray.
I was thinking of something close to scipy.signal.convolve or one of its equivalents.

I guess a straightforward approach is to use a numpy implementation and the map_blocks/map_overlap methods, right?
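For reference, the scipy function mentioned above already handles the N-dimensional case on plain in-memory arrays; a minimal sketch (the array and kernel here are made up for illustration):

```python
import numpy as np
from scipy.signal import convolve

# N-dimensional convolution of a plain numpy array with an
# averaging kernel; mode="same" keeps the input shape.
data = np.ones((4, 4))
kernel = np.ones((3, 3)) / 9.0
smoothed = convolve(data, kernel, mode="same")
```

The question in this thread is how to get the same behaviour on a chunked dask-backed DataArray.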

Thanks for your comments/feedback.

cheers.

--
J.

Ryan Abernathey

May 9, 2016, 9:27:00 AM
to xar...@googlegroups.com
Julien,

The big question is whether you want to convolve __across__ dask blocks or __within__ dask blocks.

If you are convolving within the blocks (e.g. you want to do smoothing in space and your dataset is chunked in time), it is easy: you use dask.map_blocks on the underlying xarray.Dataset.data objects. I have an example notebook for this:
I use the astropy convolution operators because they deal with missing data better. (Note: this application motivated a pull request to astropy to release the GIL for better parallel performance. This was merged in version 1.0.4.)
The one downside of this approach is that you have to work with the low-level dask objects. There has been talk about providing a higher-level dask wrapper for this functionality, but it has not been implemented: https://github.com/pydata/xarray/issues/585
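A minimal sketch of this within-block pattern, using a plain numpy running mean as a stand-in for the astropy convolution kernels (the function and array names here are illustrative, not from the thread):

```python
import numpy as np
import dask.array as da

def smooth_x(block):
    # 3-point running mean along the last (spatial) axis;
    # stands in for an astropy/scipy convolution kernel
    kernel = np.ones(3) / 3.0
    return np.apply_along_axis(np.convolve, -1, block, kernel, mode="same")

# chunked along the first (time) axis only, so each block
# contains complete spatial slices and map_blocks is safe
arr = da.from_array(np.arange(4 * 8, dtype=float).reshape(4, 8),
                    chunks=(1, 8))
smoothed = arr.map_blocks(smooth_x, dtype=float)
result = smoothed.compute()
```

Because the smoothing axis is not chunked, no inter-block communication is needed and each block can be filtered independently.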

If you want to convolve across dask blocks (e.g. data is chunked in all dimensions), that would have to be implemented in dask. It doesn't sound easy to me, but maybe Matt or Stephan could say otherwise.
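For kernels of bounded width, dask's overlap machinery covers this case: each chunk is extended by a halo of neighbouring values before the filter runs, then trimmed back. A sketch of the across-block pattern (the kernel and chunking here are illustrative):

```python
import numpy as np
import dask.array as da

kernel = np.ones(5) / 5.0
depth = len(kernel) // 2  # halo wide enough for the kernel

# array chunked along the convolution axis
arr = da.from_array(np.arange(100, dtype=float), chunks=25)

# map_overlap adds `depth` elements from neighbouring chunks to each
# block, applies the function, then trims the halo off the result
smoothed = arr.map_overlap(
    lambda x: np.convolve(x, kernel, mode="same"),
    depth=depth, boundary="reflect")
result = smoothed.compute()
```

With a halo as wide as the kernel's half-width, values at chunk boundaries come out identical to a single-array convolution.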

-Ryan


julien....@gmail.com

May 9, 2016, 9:58:14 AM
to xarray

Thanks, Ryan, for your reply and the references.

I have actually run some tests of convolution across dask blocks.

(Your notebook http://nbviewer.jupyter.org/gist/rabernat/3e2c3645d4666674cf2b was actually very useful.)

The preliminary results are available here:
https://github.com/lesommer/notebooks/blob/master/OptimizingPyClimLanczosFilter.ipynb

It seems that using map_overlap with a depth set from the width of the averaging window works OK.

But this is done at the dask level, and I was wondering how to do the same with xarray's dask integration.

Matthew Rocklin

May 9, 2016, 10:24:44 AM
to xarray
Right, relevant dask.array documentation for mapping with a slight block overlap is here:

Stephan Hoyer

May 9, 2016, 12:16:39 PM
to xar...@googlegroups.com
Wrapping dask array functions for xarray is usually pretty straightforward, e.g.,

data = orig_data_array.chunk().data
new_data = data.map_blocks(func)  # or da.map_blocks(func, data)
new_data_array = xr.DataArray(new_data, orig_data_array.coords)

If necessary, you can get axis numbers by calling the get_axis_num method.
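Filling in that outline as a runnable sketch (the DataArray, dimension names, and doubling function are illustrative placeholders for a real filter):

```python
import numpy as np
import xarray as xr

orig = xr.DataArray(np.arange(12, dtype=float).reshape(3, 4),
                    dims=("time", "x"))

# get_axis_num maps a dimension name to the numpy axis number,
# useful for passing an `axis` argument to the wrapped function
axis = orig.get_axis_num("x")

data = orig.chunk({"time": 1}).data          # underlying dask array
new_data = data.map_blocks(lambda b: b * 2)  # stand-in for a filter
new_da = xr.DataArray(new_data, orig.coords, dims=orig.dims)
result = new_da.compute()
```

Rewrapping the dask array with the original coords (and dims) recovers a labeled DataArray, so the block-wise computation stays lazy until `.compute()`.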

julien....@gmail.com

May 9, 2016, 6:39:12 PM
to xarray

Thank you all for your answers. I will proceed with this approach.
--
J.