Hi all,
Julien, sorry for not posting about the new thread in your original one! And sorry for the delay in this message overall -- I was traveling this past week.
Ryan, I fully agree with your comment about leveraging the unique capabilities of xarray in order to not just replicate what Iris and UV-CDAT already do. And it seems that the dask/out-of-core functionality is your candidate. This seems reasonable to me too, although admittedly I have only cursory experience with dask (the cluster at GFDL has, depending on the node, up to 512 GB RAM :)).
Would this mean that we would depend on dask from the outset? I ask because xarray has deliberately kept dask as an optional dependency, and dask does not yet seem fully mature (again, my knowledge is limited). I'm not necessarily opposed to this, just wanted to clarify.
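To make the "optional dependency" question concrete, here is a minimal sketch of one way a package could use dask when it is installed and fall back to plain Python otherwise. The names `dask_available` and `total` are hypothetical, made up for illustration; this is not how xarray itself is implemented, just the general pattern as I understand it.

```python
import importlib.util


def dask_available():
    """Return True if dask can be imported in this environment."""
    return importlib.util.find_spec("dask") is not None


def total(chunks):
    """Sum a list of chunks (each chunk is a list of numbers).

    Hypothetical helper: builds a lazy task graph if dask is installed,
    otherwise computes eagerly. Both paths return the same result.
    """
    if dask_available():
        import dask

        # One lazy partial sum per chunk, then a lazy sum of the partials.
        partials = [dask.delayed(sum)(chunk) for chunk in chunks]
        return dask.delayed(sum)(partials).compute()
    # Fallback: plain eager Python, no dask required.
    return sum(sum(chunk) for chunk in chunks)
```

With this pattern users without dask lose nothing; depending on dask from the outset would mean dropping the fallback branch.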
Looking at the three "scalable, out-of-core..." services Ryan lists, they seem largely independent of one another. So is what you're after an interface/data structure that facilitates performing each of them on, say, netCDF data (or xarray.Datasets more generally)?
Regarding units, which Ryan, Joe, and Julien all brought up: both astropy and Iris have units support. I'm not familiar with the internals for either case, but surely they can be leveraged. Would anybody experienced with either care to chime in? And is this orthogonal to the above dask-related issues?
My two cents for now. Thanks!
Best,
Spencer