Combining 2 datasets with different dimensions


Jiří Nádvorník

May 20, 2020, 10:46:41 AM
to xarray
Hi all,

first of all, thanks for this tool - it seems to have a lot of potential!

I'm trying to construct a data cube by combining images and spectra. In numpy I would combine arrays with these shapes:
(1489, 2048, 1, 1, 1, 1, 1)
(1, 1, 1, 4620, 1, 1, 1)

For explanation - the image dataset:

[image attachment: 2020-05-20_16-40-33.png]


The spectral dataset:


[image attachment: 2020-05-20_16-43-01.png]


I'm expecting the image dataset to grow to 4620 points along the wavelength dimension and to keep its size along every other dimension. But whichever combination method I use, it complains:

arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1489, 1}


What am I missing? The x axis actually seems labeled to me in the first place... and no matter which method I use, it always returns the same error.


Thanks for the help!


Cheers,


Jiri




Stephan Hoyer

May 20, 2020, 2:46:25 PM
to xarray
In xarray, you would typically work with arrays of size (1489, 2048) and (4620,). You don't need the dummy dimensions for alignment in arithmetic, because xarray's arithmetic aligns by dimension name instead of position.

If you really want fully expanded arrays, you could do this explicitly with the expand_dims() method. But in most cases you shouldn't need it.
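For example, with toy sizes and made-up dimension names (just a sketch of the idea, not your actual data):

import numpy as np
import xarray as xr

# Toy stand-ins: a small "image" and a small "spectrum"
img = xr.DataArray(np.zeros((3, 4)), dims=("x", "y"))
spec = xr.DataArray(np.zeros(5), dims=("wavelength",))

# Arithmetic broadcasts by dimension name, so no dummy size-1 axes are needed
cube = img + spec
print(dict(cube.sizes))  # {'x': 3, 'y': 4, 'wavelength': 5}

# If you do want explicit size-1 dummy dimensions, expand_dims can add them
print(spec.expand_dims(["x", "y"]).shape)  # (1, 1, 5)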


Jiří Nádvorník

May 20, 2020, 3:27:35 PM
to xar...@googlegroups.com
Sadly I can't think of a way to get rid of the resolution dummies, as I want to also construct different resolutions in the same cube and control the disk clustering based on these (outside of xarray).

The only ones I'm not sure about are ra and dec (the astronomical equivalents of longitude and latitude). I'm keeping those following the example from the docs with temperature and precipitation, where you mention that you need to keep these coordinates as separate dimensions because they are not rectangular and are not easily projected onto a 2D grid.

Anyway, if I need expand_dims(), how do I use it? Do I expand the first dataset's dimension by the size of the second and then concat?

Thanks for the help!

Cheers,

Jiri


On Wed, May 20, 2020 at 8:46 PM, Stephan Hoyer <sho...@gmail.com> wrote:

Stephan Hoyer

May 20, 2020, 3:31:53 PM
to xarray
On Wed, May 20, 2020 at 12:27 PM Jiří Nádvorník <nadvor...@gmail.com> wrote:
Sadly I can't think of a way to get rid of the resolution dummies, as I want to also construct different resolutions in the same cube and control the disk clustering based on these (outside of xarray).

The only ones I'm not sure about are ra and dec (the astronomical equivalents of longitude and latitude). I'm keeping those following the example from the docs with temperature and precipitation, where you mention that you need to keep these coordinates as separate dimensions because they are not rectangular and are not easily projected onto a 2D grid.

Anyway, if I need expand_dims(), how do I use it? Do I expand the first dataset's dimension by the size of the second and then concat?

Yes, that's exactly right. Something like this should work:
ds2.expand_dims(x=ds2.sizes['x'], y=ds2.sizes['y'])

Jiří Nádvorník

May 21, 2020, 3:48:40 AM
to xar...@googlegroups.com
Hi,

I tried the approach above, and for the call:
img_ds.expand_dims(x=spec_ds.sizes['x'], y=spec_ds.sizes['y'])

it returns:
ValueError: Dimension x already exists.

After reading the documentation of expand_dims, it seems that it's actually meant for adding *new* dimensions, not expanding existing ones... or am I wrong?
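For example, a minimal check of what I mean (with made-up names):

import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"))

# Adding a brand-new dimension works and broadcasts the data along it
print(dict(da.expand_dims(band=4).sizes))  # {'band': 4, 'x': 2, 'y': 3}

# But naming an existing dimension raises the error above
# da.expand_dims(x=5)  # ValueError: Dimension x already exists.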

BR,

Jiri

On Wed, May 20, 2020 at 9:31 PM, Stephan Hoyer <sho...@gmail.com> wrote:

Stephan Hoyer

May 21, 2020, 7:23:57 PM
to xarray
Hmm, you seem to be right about expand_dims; I remembered that incorrectly.

There may be more elegant solutions, but one simple approach to duplicating along a dimension would be to use indexing with a 1D array of zeros, e.g.,
img_ds.isel(x=np.zeros((spec_ds.sizes['x'],), dtype=int))

You could also use xarray.concat:
xarray.concat(spec_ds.sizes['x'] * [img_ds], dim='x')
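For example, on a toy dataset (names and sizes made up), both approaches give a duplicated dimension of the requested length:

import numpy as np
import xarray as xr

# Toy stand-in for img_ds, with a size-1 dimension to duplicate
img_ds = xr.Dataset({"flux": (("x", "y", "wavelength"), np.ones((3, 4, 1)))})
n = 5

# Indexing with an integer array of zeros repeats the single slice n times
dup_isel = img_ds.isel(wavelength=np.zeros(n, dtype=int))

# Concatenating n copies along the dimension gives the same length here
dup_concat = xr.concat(n * [img_ds], dim="wavelength")

print(dup_isel.sizes["wavelength"], dup_concat.sizes["wavelength"])  # 5 5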

Jiří Nádvorník

May 25, 2020, 5:04:26 AM
to xar...@googlegroups.com
OK, the concat is the "right" solution, but this is what I want to do, and maybe I misunderstood what xarray is capable of:
xarray.concat(spec_ds.sizes['wavelength'] * [img_ds], dim='wavelength')
MemoryError: Unable to allocate 52.5 GiB for an array with shape (1489, 2048, 1, 4620, 1, 1, 1) and data type float32

I don't want to create a dense array with 4620 spectral points for every image pixel (1489x2048); in fact, I have these spectral points for only one pixel.

So if I wish to have a coordinate space of (1489, 2048, 1, 4620, 1, 1, 1) but not a materialized numpy array of that shape beneath it, is that possible with xarray?

BR,

Jiri

On Fri, May 22, 2020 at 1:23 AM, Stephan Hoyer <sho...@gmail.com> wrote: