Gridding strategy


Jody Klymak

Jan 26, 2017, 12:52:42 PM
to xarray
Hi all,

Not even sure if xarray is the right tool for this, but it sounds like many of you have thought about this problem...

I'm using the MITgcm and Ryan's very nice xmitgcm package to read "mds" files (basically flat binary files for each variable).

I have run the model on a regular x/y grid (Cartesian co-ordinates) and now wish to write initialization files from this run onto a stretched horizontal grid with more resolution in the center and less towards the boundaries.  The output should be in the flat binary file format (rather than netcdf, for instance).  

Any advice on doing the above is welcome.

My strategy was to read each field in and then interpolate in x and then in y (it's not going to matter too much that there will be some aliasing in the coarse regions) and then write the files.  The input files are 800 MB and the output 1.6 GB, so it's doable in memory. I was just going to run `np.interp` in y and then in x for each z level.
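
Roughly what I have in mind, per level (an untested sketch; the array and grid names are made up):

```
import numpy as np

def regrid_levels(T, x0, y0, x, y):
    # T: (nz, ny0, nx0) on the old grid; x0/y0 old coords, x/y new coords
    nz = T.shape[0]
    out = np.empty((nz, len(y), len(x)))
    for k in range(nz):
        # interpolate in y for every original x column...
        tmp = np.stack([np.interp(y, y0, T[k, :, i])
                        for i in range(len(x0))], axis=1)
        # ...then in x for every new y row
        out[k] = np.stack([np.interp(x, x0, tmp[j, :])
                           for j in range(len(y))], axis=0)
    return out
```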

Is there a more clever way to do the above?  Perhaps one that would scale up to memory footprints that are much larger?  I appreciate there is a regridding package being considered, but for now, am I missing anything obvious?

Thanks a lot,   Jody

Ryan Abernathey

Jan 26, 2017, 8:15:07 PM
to xar...@googlegroups.com
Jody,

Unfortunately, this holy-grail regridding package doesn't exist yet.

I have done similar things to what you are describing on out-of-core datasets. The trick I have used is to work directly on the dask layer.

You can access the underlying dask arrays in a chunked xarray dataset via the `dataset.variable.data` syntax.
Then you can apply some arbitrary function over the chunks using `map_blocks`.
(xarray `apply` will do something similar)

In your case, if you chunk your data by time and z, you can then apply the interpolation routine to each 2D slice.
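
Schematically, something like this (hypothetical names: `ds` is your chunked dataset, `interp_slice` stands in for whatever 2D routine you use, and `ny_new`/`nx_new` are the new grid sizes):

```
import numpy as np

d = ds['T'].data    # the dask array underneath the chunked xarray variable

def regrid_block(block):
    # each block arrives as a (1, 1, ny, nx) numpy array: one 2D slice
    out = np.empty((1, 1, ny_new, nx_new), dtype=block.dtype)
    out[0, 0] = interp_slice(block[0, 0])   # your 2D interpolation
    return out

new = d.map_blocks(regrid_block, dtype=d.dtype,
                   chunks=(1, 1, ny_new, nx_new))
```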

Another cool trick I discovered was to use a custom "store" for dask to write my data out to disk in a streaming fashion. Maybe you can modify this routine to work for you:
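
The idea is that `dask.array.store` hands each finished chunk to any object with a numpy-style `__setitem__`, so you can stream results to disk without ever holding the full array in memory. A minimal sketch of the pattern (the memmap-backed target here is just an illustration, not the linked routine):

```
import numpy as np
import dask.array as da

class NpyStore:
    # illustrative target: any object with __setitem__ will do
    def __init__(self, path, shape, dtype):
        self.mm = np.lib.format.open_memmap(path, mode='w+',
                                            shape=shape, dtype=dtype)
    def __setitem__(self, key, value):
        # dask passes each chunk's location as a tuple of slices
        self.mm[key] = value

x = da.random.random((1000, 1000), chunks=(100, 100))
da.store(x, NpyStore('out.npy', x.shape, x.dtype))
```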

Cheers,
Ryan


Klymak Jody

Jan 27, 2017, 1:46:10 PM
to xar...@googlegroups.com
Hi Ryan,

Thanks a lot, this seems to be on the right track.

I am still having a conceptual difficulty.

I can read the input dataset using xmitgcm/xarray w/ no difficulty.

I'm not clear on how to make a new xarray/dask DataArray with the correct new dimensions and a new variable Tnew without supplying a numpy array to the data_vars argument.

i.e. I want to make an `out` Dataset or DataArray, but I don't want to supply 'Tnew' with a numpy array that is nx,ny,nz in size; I want to generate that new array in chunks from my `input` array (which has different dimensions). (I haven't chunked `out` yet, but that is the eventual plan.)

```
import xmitgcm
import xarray as xr
input = xmitgcm.open_mdsdataset(dirname='./',prefix={'T'},iters=12600,read_grid=True,geometry='cartesian',endian='<',
                               chunks={'Z':1,'time':1})

# This fails… but is conceptually what I want to do:
out = xr.Dataset(data_vars={'Tnew': (['x','y','z'])},
                 coords={'x': x, 'y': y, 'z': input.Z})

```

Maybe I’m just chasing a unicorn here…

Thanks!   Jody

Jody Klymak

Jan 29, 2017, 11:44:55 PM
to xarray
OK, thanks for your help Ryan.  With some prompting from Stephan, I finally understood what `map_blocks` was supposed to do and why it was the solution to my problem.  The following works great (I'm not really sure why one of the blocks passed to `map_blocks` has size (1, 1, 1, 1), but all seems to work if I skip over that case).

Cheers,  Jody

```
import numpy as np
import scipy.interpolate
import xmitgcm
import xarray as xr

data = xmitgcm.open_mdsdataset(dirname='./', prefix={'T'}, iters=12600,
                               read_grid=True, geometry='cartesian',
                               endian='<', chunks={'Z': 1, 'time': 1})

def interpolateAtDepth(T, x0, y0, x, y):
    # each block is a (1, 1, ny0, nx0) slice: one time step at one depth
    if np.shape(T)[-1] > 1:
        fit = scipy.interpolate.RectBivariateSpline(x0, y0, T[0, 0, :, :].T)
        xout = fit(x, y).T.reshape((1, 1, ny, nx))
    else:
        # one block comes through with size (1, 1, 1, 1); just pass it back
        xout = np.ones((1, 1, 1, 1))
    return xout

# x, y, nx, ny are determined elsewhere, but set the new grid...
tm = data['T'].data.map_blocks(interpolateAtDepth,
                               data['XC'].values, data['YC'].values, x, y,
                               chunks=(1, 1, ny, nx))
```



Matthew Rocklin

Jan 30, 2017, 8:33:54 AM
to xarray
The block of size (1, 1, 1, 1) was dask.array trying to guess the output dtype of your function by feeding it a little piece of data.  You can avoid this by specifying the dtype explicitly.  See http://dask.pydata.org/en/latest/array-api.html#dask.array.core.map_blocks
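
For your example above, that would look something like this (the dtype here is a guess at what your interpolation returns; with it given up front, the probe call, and hence the `else` branch, is unnecessary):

```
tm = data['T'].data.map_blocks(interpolateAtDepth,
                               data['XC'].values, data['YC'].values, x, y,
                               dtype='float64',   # or whatever your output is
                               chunks=(1, 1, ny, nx))
```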


Jody Klymak

Jan 30, 2017, 12:06:40 PM
to xarray



OK, thanks!  Jody

Jody Klymak

Jan 30, 2017, 4:49:33 PM
to xarray


On Thursday, January 26, 2017 at 5:15:07 PM UTC-8, Ryan Abernathey wrote:

Another cool trick I discovered was to use a custom "store" for dask to write my data out to disk in a streaming fashion. Maybe you can modify this routine to work for you:

OK, I forked it to this version, which writes each chunk to the same file, with the position determined using `f.seek`.  It assumes chunks are along the first dimension:
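
In outline, it is something like this (a stripped-down sketch of the idea, not the actual fork):

```
import numpy as np
import dask.array as da

class FlatBinaryStore:
    # write each chunk into one flat binary file at the right byte offset;
    # assumes the array is chunked only along the first dimension
    def __init__(self, path, shape, dtype):
        self.shape, self.dtype = shape, np.dtype(dtype)
        self.f = open(path, 'wb')
    def __setitem__(self, key, value):
        # key is a tuple of slices; only key[0] varies between chunks
        slab = int(np.prod(self.shape[1:]))
        self.f.seek(key[0].start * slab * self.dtype.itemsize)
        self.f.write(np.ascontiguousarray(value, self.dtype).tobytes())

arr = da.zeros((10, 50, 60), chunks=(1, 50, 60))
store = FlatBinaryStore('T.init.data', arr.shape, '<f4')  # little-endian float32
da.store(arr, store, scheduler='synchronous')  # one writer: no concurrent seeks
store.f.close()
```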


Thanks for the pointers!

Cheers,   Jody
