Xarray.DataArray and np.ndarray subclasses


William Sheffler

Oct 27, 2017, 3:36:05 PM
to xarray
I have the impression xarray is meant to be a light wrapper around np.ndarray. Is the intention to also support subclasses of np.ndarray? Is this at least negotiable? Since numpy 1.13, __array_ufunc__ is back and ndarray subclasses could become more prevalent.

I have a biomolecular modeling API built on an np.ndarray subclass that dispatches ufunc behavior based on structured dtype (useful for molecular modeling: arrays of atoms, dipoles, etc.). In our C++ codebase we have deeply nested for-loops all over the place. In several cases, I've been impressed by how directly this nested-for-loop logic translates into very compact tensor operations with ndarray. An ndarray with "overloadable" operators via a subclass's __array_ufunc__ is a real winner for us.
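To make the idea concrete, here is a minimal sketch of the kind of subclass described above. It is not the actual API: the ATOM dtype, the AtomArray class, and the choice to make subtraction of two atom arrays return the difference of their positions are all assumptions made up for illustration. Only np.ndarray, __array_ufunc__, and the ufunc call protocol are real NumPy features.

```python
import numpy as np

# Hypothetical structured dtype for an atom: a 3-vector position plus a charge.
ATOM = np.dtype([("pos", "f8", (3,)), ("charge", "f8")])


def _unwrap(x):
    # View subclass instances as plain ndarrays so the inner ufunc call
    # does not re-enter __array_ufunc__ and recurse forever.
    return x.view(np.ndarray) if isinstance(x, AtomArray) else x


class AtomArray(np.ndarray):
    """Hypothetical ndarray subclass that dispatches ufuncs on dtype."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raw = tuple(_unwrap(x) for x in inputs)
        if "out" in kwargs:
            kwargs["out"] = tuple(_unwrap(o) for o in kwargs["out"])

        both_atoms = len(raw) == 2 and all(
            isinstance(x, np.ndarray) and x.dtype == ATOM for x in raw
        )
        if ufunc is np.subtract and method == "__call__" and both_atoms:
            # Assumed semantics: atom - atom means displacement between positions.
            return ufunc(raw[0]["pos"], raw[1]["pos"], **kwargs)

        # Everything else falls back to ordinary ndarray behavior.
        return getattr(ufunc, method)(*raw, **kwargs)


atoms = np.zeros(2, dtype=ATOM).view(AtomArray)
atoms["pos"] = [[0.0, 0.0, 0.0], [1.0, 2.0, 2.0]]

# All-pairs displacements via broadcasting instead of a nested for-loop.
diff = atoms[:, None] - atoms[None, :]
dist = np.linalg.norm(diff, axis=-1)
```

Here diff has shape (2, 2, 3) and dist holds the pairwise distances, which is the nested-loop-to-broadcast translation mentioned above.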

Before I release this API to a bunch of enthusiastic biochemists, I'd like something more friendly than raw ndarrays. Rather than force biochemists to learn their way around np.einsum, I'd like to have labeled dimensions and smart broadcasting so most common operations "just work." I believe xarray.DataArray is *perfect* for this. The Dataset features are useful for us too, but I'm most excited about smart broadcasting.
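As a rough illustration of the difference, the sketch below computes the same outer product twice: once with np.einsum and explicit index letters, and once with xarray.DataArray, which broadcasts by dimension name. The dimension names "atom" and "residue" and the data are made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical per-atom and per-residue values.
a = xr.DataArray(np.arange(3.0), dims=["atom"])
b = xr.DataArray(np.arange(4.0), dims=["residue"])

# xarray aligns on dimension names, so no index bookkeeping is needed.
prod = a * b

# The equivalent raw-ndarray version needs the indices spelled out.
same = np.einsum("i,j->ij", a.values, b.values)
```

The result prod carries the dimensions ("atom", "residue"), so later operations can refer to axes by name rather than by position.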

Can someone, pretty please, guide me on how to make xarray.DataArray play nice with numpy.ndarray subclasses? General comments also welcome, I am relatively new to the numpy/xarray/pandas world. 

Thanks!

Will Sheffler
Principal Software Engineer
Institute for Protein Design
University of Washington

Stephan Hoyer

Oct 29, 2017, 4:22:48 PM
to xar...@googlegroups.com
Hi Will,

Yes, supporting subclasses of np.ndarray and __array_ufunc__ would certainly be very nice to have for xarray. There are some other use cases (particularly for unit support) for which this would be highly valuable. There's at least one stalled PR working on this, but nothing merged yet.

Please don't hesitate to ask if you have any questions or need pointers around the codebase. This GitHub issue is probably the best place to start: https://github.com/pydata/xarray/issues/1617

One thing I'll note about structured dtypes is that while I know some people use them with xarray, they are currently not terribly well supported, so you are likely to run into some features where they don't work properly yet (e.g., serialization). That said, again, we are happy to accept pull requests here: https://github.com/pydata/xarray/issues/1626

Best,
Stephan
