I have the impression xarray is meant to be a light wrapper around np.ndarray. Is the intention to also support subclasses of np.ndarray? Is this at least negotiable? Since numpy 1.13, __array_ufunc__ is back and ndarray subclasses could become more prevalent.
I have a biomolecular modeling API built on an np.ndarray subclass that dispatches ufunc behavior based on structured dtype (useful for molecular modeling: arrays of atoms, dipoles, etc.). In our C++ codebase we have deeply nested for-loops all over the place. In several cases, I've been impressed by how directly this nested-for-loop logic translates into very compact tensor operations with ndarray. ndarray with "overloadable" operators via a subclass's __array_ufunc__ is a real winner for us.
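To make the idea concrete, here is a minimal sketch of the pattern, not our actual API. The `POINT` dtype and the `StructArray` class are hypothetical names made up for illustration, and the only dispatch rule shown (field-wise `np.add` on a structured dtype) is an assumption chosen to keep the example short:

```python
import numpy as np

# Hypothetical structured dtype; the field names are illustrative only.
POINT = np.dtype([("x", "f8"), ("y", "f8"), ("z", "f8")])


class StructArray(np.ndarray):
    """Illustrative ndarray subclass that dispatches ufuncs on dtype."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Strip the subclass from the inputs so the calls below do not
        # re-enter this method. (out= is not handled in this sketch.)
        raw = [i.view(np.ndarray) if isinstance(i, StructArray) else i
               for i in inputs]

        # Custom rule: plain numpy cannot add structured arrays, so
        # define POINT + POINT as field-wise addition.
        if (method == "__call__" and ufunc is np.add
                and all(isinstance(r, np.ndarray) and r.dtype == POINT
                        for r in raw)):
            a, b = raw
            out = np.empty(np.broadcast(a, b).shape, dtype=POINT)
            for name in POINT.names:
                out[name] = a[name] + b[name]
            return out.view(StructArray)

        # Everything else falls through to the normal ufunc machinery.
        result = getattr(ufunc, method)(*raw, **kwargs)
        if isinstance(result, np.ndarray):
            return result.view(StructArray)
        return result


a = np.zeros(2, dtype=POINT).view(StructArray)
a["x"] = [1.0, 2.0]
b = a.copy()
c = a + b  # routed through StructArray.__array_ufunc__
```

The `+` operator ends up in `__array_ufunc__` because ndarray's arithmetic operators are implemented in terms of ufuncs, which is what makes the operators "overloadable" from a subclass.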
Before I release this API to a bunch of enthusiastic biochemists, I'd like something more friendly than raw ndarrays. Rather than force biochemists to learn their way around np.einsum, I'd like to have labeled dimensions and smart broadcasting so most common operations "just work." I believe xarray.DataArray is *perfect* for this. The Dataset features are useful for us too, but I'm most excited about smart broadcasting.
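As a small illustration of what I mean by smart broadcasting, here is a sketch with plain float data. The dimension names `atom` and `frame` and the variable names are made up for this example, and it uses ordinary ndarrays rather than my subclass:

```python
import numpy as np
import xarray as xr

# Two 1-D arrays with different, named dimensions (names are hypothetical).
charges = xr.DataArray(np.array([1.0, -1.0, 0.5]), dims=("atom",))
weights = xr.DataArray(np.array([0.25, 0.75]), dims=("frame",))

# xarray broadcasts by dimension name, so no manual reshaping or
# subscript strings are needed to get the outer product.
weighted = charges * weights  # dims: ("atom", "frame")

# The same computation with raw ndarrays requires spelling out the indices.
expected = np.einsum("a,f->af", charges.values, weights.values)

# Reductions are also expressed by name rather than by axis number.
per_frame = weighted.sum(dim="atom")  # dims: ("frame",)
```

This is the kind of code I would like biochemists to be able to write without thinking about axis order.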
Can someone, pretty please, guide me on how to make xarray.DataArray play nice with numpy.ndarray subclasses? General comments are also welcome; I am relatively new to the numpy/xarray/pandas world.
Thanks!
Will Sheffler
Principal Software Engineer
Institute for Protein Design
University of Washington