Xarray for remote sensing data with irregular coordinates?


Vincent Noel

Nov 9, 2016, 8:27:06 AM
to xarray
hello list,

I have re-stumbled recently upon xarray following its renaming. Like most people I see on this list I work in atmospheric sciences, but I don't work with model output, I analyze remote sensing data, mostly from space. In the few hours I've been using xarray to process such data, I've been very impressed with:
- its ability to open and read data from files in netCDF4, HDF4 and HDF5 with very little handholding (especially HDF4, that's *very* rare in my experience)
- the way it lets you manipulate and process the data along named dimensions and coordinates in a clean, logical way
- the integration of dask, which lets me use the same code to process a single orbit of a satellite instrument (one 500 MB file) or a year of measurements (5400 such files)
- the way it can plot 2D arrays ignoring irrelevant data (with the "robust" keyword) and select an appropriate colormap

All these points are so relevant to my work it's not even funny. However, by delving deeper in the docs, reading the code examples and the API, it seemed to me that an unspoken assumption of xarray was that most coordinates were supposed to be gridded (with the possible exception of time).

Most examples use some kind of model output, e.g. from GCMs, reanalyses or WRF. It is natural to expect these data to be gridded. My use case is slightly different: remote sensing provides a value (radiance, temperature, reflectivity, whatever) measured at a given point in time and space (2D or 3D). But the coordinates are not fixed like in model output. Values at a given time might be provided on a lat-lon grid, and at the next time step on a different grid, not aligned with the first. Both grids can be irregularly spaced, and are never aligned with nice round coordinates. You never find the same coordinate twice. Analyzing this data globally always involves, at some point, reprojecting it onto a regular grid, i.e. aggregating measurements in larger bins. Once this is done, everything gets easier and xarray can shine, but achieving this step is currently tricky in xarray, due (I think) to the assumption of regular grids. I have not been able to find xarray code examples oriented that way.
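
To make the "aggregating measurements in larger bins" step concrete, here is a minimal sketch in plain NumPy (not xarray). It averages scattered measurements inside each cell of a regular lat-lon grid using `np.histogram2d`. The function name `bin_mean`, the sample coordinates and the bin edges are all made up for illustration; real swath data would also need handling for missing values and longitude wrap-around.

```python
import numpy as np

def bin_mean(lat, lon, values, lat_edges, lon_edges):
    """Average scattered measurements inside each lat-lon cell (hypothetical helper)."""
    bins = [lat_edges, lon_edges]
    # Sum of the values that fall in each cell.
    total, _, _ = np.histogram2d(lat, lon, bins=bins, weights=values)
    # Number of measurements that fall in each cell.
    count, _, _ = np.histogram2d(lat, lon, bins=bins)
    # Empty cells become NaN; silence the 0/0 warning while dividing.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Made-up irregular samples, standing in for one satellite overpass.
lat = np.array([10.2, 10.7, 11.5, 12.9])
lon = np.array([40.1, 40.4, 41.6, 41.2])
values = np.array([1.0, 3.0, 5.0, 7.0])

# Regular target grid: 1-degree cells (assumed for the example).
lat_edges = np.array([10.0, 11.0, 12.0, 13.0])
lon_edges = np.array([40.0, 41.0, 42.0])

grid = bin_mean(lat, lon, values, lat_edges, lon_edges)
print(grid)  # 3 x 2 array, NaN where no measurement landed
```

The resulting 2D array lives on fixed cell centers, so it could then be wrapped in an `xarray.DataArray` with `lat` and `lon` dimensions and handled like any gridded field.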

As a test, I have been trying to process a rather simple special case of this and could not figure out how to do it in xarray alone. I have asked for guidance on stackoverflow, but apparently it is not straightforward:

http://stackoverflow.com/questions/40465026/groupby-bins-on-two-variables/40477416?noredirect=1#comment68235389_40477416

Having trouble to do something I find conceptually simple suggests to me I might be trying to fit a square peg in a round hole. Maybe I'm not yet able to think within the framework. Beyond just finding a solution for that particular case, I'd like to know whether my use case is on the xarray radar. Are there currently contributors trying to make xarray suited for analyzing remote sensing data, or is the focus explicitly on processing gridded model output? In the latter case, is anyone aware of another library as close as possible to xarray, but biased towards remote sensing? xarray is *so close* to providing exactly what I need...

Anyway, sorry for the long message. Thanks!

Cheers,
V.

Fabien

Nov 9, 2016, 10:14:32 AM
to xarray
Hi Vincent,

On 11/09/2016 02:27 PM, Vincent Noel wrote:
> Having trouble to do something I find conceptually simple

I personally find that irregularly spaced satellite datasets are conceptually not simple at all. For example, the problem of time-varying coordinates (or even worse: time-varying dimensions!) has, afaik, no simple general solution. The model world has so many advantages ;-)

Regarding the SO post: as Maximilian says, grouping by more than one dimension is currently being worked on (https://github.com/pydata/xarray/pull/924).
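
Until grouping by more than one variable is available, one possible workaround is to collapse the two bin indices into a single integer label and group once on that label. The sketch below does this in plain NumPy with `np.digitize`, `np.ravel_multi_index` and `np.bincount`; it is not the approach taken in the pull request, and the function name `grouped_mean_2d` and the sample data are hypothetical.

```python
import numpy as np

def grouped_mean_2d(lat, lon, values, lat_edges, lon_edges):
    """Mean of `values` per (lat bin, lon bin), via one flat group label (hypothetical helper)."""
    n_lat = len(lat_edges) - 1
    n_lon = len(lon_edges) - 1
    # Bin index along each axis; -1 or n_* means the point is outside the grid.
    # Note: a point exactly on the last edge counts as outside here.
    i = np.digitize(lat, lat_edges) - 1
    j = np.digitize(lon, lon_edges) - 1
    inside = (i >= 0) & (i < n_lat) & (j >= 0) & (j < n_lon)
    # Collapse each (i, j) pair into one integer label per measurement.
    label = np.ravel_multi_index((i[inside], j[inside]), (n_lat, n_lon))
    # A single "group by label" pass: per-label sum and count.
    total = np.bincount(label, weights=values[inside], minlength=n_lat * n_lon)
    count = np.bincount(label, minlength=n_lat * n_lon)
    mean = np.full(n_lat * n_lon, np.nan)
    filled = count > 0
    mean[filled] = total[filled] / count[filled]
    return mean.reshape(n_lat, n_lon)

# Made-up samples; the last one lies outside the target grid and is dropped.
lat = np.array([10.2, 10.7, 11.5, 12.9, 50.0])
lon = np.array([40.1, 40.4, 41.6, 41.2, 40.5])
values = np.array([1.0, 3.0, 5.0, 7.0, 100.0])
lat_edges = np.array([10.0, 11.0, 12.0, 13.0])
lon_edges = np.array([40.0, 41.0, 42.0])

result = grouped_mean_2d(lat, lon, values, lat_edges, lon_edges)
print(result)
```

The same flat label could in principle be attached to a dataset as an extra variable and used for an ordinary one-variable group-by, with the result reshaped afterwards.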

Otherwise I find your question quite interesting, and I'm looking forward to reading the answers you'll get.

Fabien

Ryan Abernathey

Nov 9, 2016, 10:32:43 AM
to xar...@googlegroups.com
Vincent (+ Fabien + everyone else),

On one hand, it's hard to care about Python today... On the other, this discussion is a most welcome distraction from reality.

Your need for regridding is a general one for the earth sciences. Even people who work only with models often want to regrid their data. The design question is whether such features belong in xarray or in a standalone package. My view is that we need a standalone package.

This past weekend we held a workshop on xarray / dask in Ocean / Atmosphere / Climate science.

One thing we did was develop design documents for some packages that would build on xarray / dask and provide discipline-specific functionality. 
We have a draft design document for a regridding package here:
Your comments on this document would be most valuable.

The tentative name for this effort is "pangeo", and soon we will be reaching out to the community (including via this list) to seek input and collaborators. We have a general mission statement and vision drafted here:

Cheers,
Ryan





--
You received this message because you are subscribed to the Google Groups "xarray" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xarray+unsubscribe@googlegroups.com.
To post to this group, send email to xar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xarray/50a9e50e-2c68-409e-8117-3f4afbbe2aea%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Vincent Noel

Nov 9, 2016, 11:50:32 AM
to xarray
By "conceptually simple" I was referring to the case described in the SO post, which can be done in a few lines of general-purpose numpy (as in the provided example). I agree that trying to come up with a general-purpose approach to this problem is very difficult. I've tried several times and always ended up with code tied to the particular case study -- even when constraining the problem to a single instrument type :-D

The pangeo project looks very interesting. Nice name! I like the approach "on top of xarray", and FWIW the proposed "remap" methods are *exactly* what I was at first expecting to find somewhere within xarray. At first glance I'm not sure what more is needed in the high-level API (?), the devil will be in the details.

I'm glad I brought some distraction from the other news. I know I needed it. I fear climate science, at NASA and elsewhere, is gonna feel the blow.

cheers,
V.

Robin Wilson

Nov 9, 2016, 3:33:20 PM
to xar...@googlegroups.com
On 9 Nov 2016, at 15:32, Ryan Abernathey <ryan.ab...@gmail.com> wrote:

> One thing we did was develop design documents for some packages that would build on xarray / dask and provide discipline-specific functionality.
> We have a draft design document for a regridding package here:
> Your comments on this document would be most valuable.

I don't have much to say apart from "This sounds great!" - and I'd definitely be keen to contribute and test it later in the development process.

> The tentative name for this effort is "pangeo", and soon we will be reaching out to the community (including via this list) to seek input and collaborators. We have a general mission statement and vision drafted here:

This also sounds wonderful. I've started using xarray a lot in my work, and have gradually been introducing colleagues to its many benefits. At the moment we do most of our work that requires dealing with irregular co-ordinates and so on in other tools (traditional GIS packages, or Python scripts using GDAL warping functions), but once we've got standardised grids then xarray is wonderful. I've also written some very simple (and probably fairly buggy) utility functions for importing and exporting standard GIS raster formats (e.g. GeoTIFF, Erdas Imagine, ENVI and so on) to and from xarray. They're available at https://github.com/robintw/XArrayAndRasterio - they're not well tested or documented, but might be useful for someone (or potentially useful as part of this project).

Cheers,

Robin

Dr Robin Wilson
Research Fellow
University of Southampton, UK

Ryan Abernathey

Nov 9, 2016, 3:39:01 PM
to xar...@googlegroups.com
Robin,

Thanks a lot for your interest. We will send out a broader call for involvement to this mailing list in the near future, and I certainly hope you will respond.

In the meantime, it sounds like you would be very interested in the rasterio backend that is currently being developed within xarray:
Especially given your experience implementing your own rasterio conversion, your comments on this open pull request would be very valuable.

-Ryan

