Converting xarray dataarray to numpy ndarray too slow


SANJIV

Apr 3, 2020, 12:49:26 AM
to xarray
Hi all,
     I have a newbie question about xarray. 

I am finding that converting an xarray dataarray to a numpy ndarray is painfully slow.

I have a temperature field stored as an xarray dataarray, whose dimensions are (1456,50,534), i.e., time x depth x linear index for location. The following commands in a Jupyter notebook take forever and I have to interrupt the kernel:

T.isel(ocean_time=10,s_rho=45,loc=300).load() 
T.isel(ocean_time=10,s_rho=45,loc=300).values

The exact values of the indices are not important, as the command is always slow and never finishes.

Has anybody encountered a similar issue?

Thanks,
Sanjiv

Chuan-Yuan Hsu

Apr 3, 2020, 11:58:51 AM
to xar...@googlegroups.com
Sanjiv, 

Where do you run your code (ada/lonestar5)? And how do you load the dataset?
Could you share more details? An array of 1456 x 50 x 534 does not seem too big to me.
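For scale, the arithmetic behind that remark can be sketched as follows. The dtype of T is not stated in the thread, so both float32 and float64 are shown as assumptions:

```python
from math import prod

# Shape of T from the original post: ocean_time x s_rho x loc
shape = (1456, 50, 534)
n_values = prod(shape)

# Assumption: the dtype is not given in the thread, so show both common float sizes
bytes_per_value = {"float32": 4, "float64": 8}
sizes_mb = {name: n_values * nbytes / 1e6 for name, nbytes in bytes_per_value.items()}

print(f"{n_values:,} values")
for name, mb in sizes_mb.items():
    print(f"{name}: {mb:,.1f} MB")
```

That comes to about 38.9 million values, roughly 155.5 MB as float32 or 311.0 MB as float64, which should fit in memory on a typical workstation.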

PS: Feel free to email me directly if you'd like to discuss this.

Thanks,
Chuan-Yuan Hsu

--
You received this message because you are subscribed to the Google Groups "xarray" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xarray+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xarray/7430f192-af6e-47f3-b15c-eed29e2878ab%40googlegroups.com.

Ryan Abernathey

Apr 3, 2020, 12:06:04 PM
to xar...@googlegroups.com
Sanjiv,

The answer to your question depends completely on how the dataarray was created.

Can you please share the full repr of the dataset (output of print(ds)), as well as the commands you used to load the data?

Best,
Ryan


SANJIV RAMACHANDRAN

Apr 3, 2020, 2:49:04 PM
to xar...@googlegroups.com
Hi all,
Thanks for the quick replies. I am attaching an HTML file that shows the essential steps in creating the dataset. The commands that fail to finish (too slow) are in the second-to-last cell. I should also clarify that, although I used those two commands to illustrate the problem, my broader goal is to figure out how to work seamlessly with xarray objects and functions (NumPy or otherwise) that accept only NumPy arrays as input. That is why I was calling load() or .values on the xarray objects.

Best,
Sanjiv

xarray_debug.html

Ryan Abernathey

Apr 3, 2020, 2:56:51 PM
to xar...@googlegroups.com
Sanjiv,

Based on your attached file, your performance is probably limited by filesystem I/O and related to your choice of chunks:
DS_T = xr.open_dataset(Tfile, chunks={'ocean_time': -1, 's_rho': -1, 'eta_rho': 25, 'xi_rho': 25})
If these chunks don't align well with the way the netCDF file actually stores the data on disk, you will get extremely poor performance, with your parallel reads leading to disk thrashing. This was discussed a bit here: https://github.com/pydata/xarray/issues/1440
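A quick sketch of how large each chunk in that first scheme is. It assumes DS_T has the same 1456 time steps and 50 vertical levels as T; in the chunks argument, -1 means a single chunk spanning the whole dimension. The dtype is again an assumption:

```python
from math import prod

# Assumption: DS_T has the same ocean_time and s_rho sizes as T
n_ocean_time, n_s_rho = 1456, 50

# chunks={'ocean_time': -1, 's_rho': -1, 'eta_rho': 25, 'xi_rho': 25}
# -1 means one chunk covering the full dimension
chunk_shape = (n_ocean_time, n_s_rho, 25, 25)
values_per_chunk = prod(chunk_shape)

# Bytes per value depend on the (unstated) dtype
for name, nbytes in {"float32": 4, "float64": 8}.items():
    print(f"{name}: {values_per_chunk * nbytes / 1e6:,.0f} MB per chunk")
```

Each chunk holds 45.5 million values (about 182 MB as float32 or 364 MB as float64), which is more than the entire 1456 x 50 x 534 array T.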

Do you know if your netCDF file uses chunking internally? Have you tried a different chunk scheme? What about
DS_T = xr.open_dataset(Tfile, chunks={'ocean_time': 1})
This will likely work a bit better, since it parallelizes over the most slowly varying dimension (ocean_time).
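The point about slowly varying dimensions can be illustrated with a simplified model. Assuming the temperature variable is stored contiguously in row-major (C) order with dimensions (ocean_time, s_rho, eta_rho, xi_rho), as an internally unchunked netCDF variable roughly is, the sketch below counts how many separate contiguous runs on disk one dask chunk covers. The eta_rho and xi_rho sizes are hypothetical placeholders (the thread never gives them), and real netCDF I/O (record-dimension interleaving, HDF5 internals, caching) is more complicated than this model:

```python
from math import prod


def contiguous_runs(shape, counts):
    """Return (number_of_runs, values_per_run) for a hyperslab of a
    row-major array with full `shape`, selecting `counts` values along
    each dimension. Start offsets do not change the result."""
    run = 1
    # Walk from the fastest-varying (last) dimension outward. A run keeps
    # growing only while the inner dimensions are read in full.
    for size, count in zip(reversed(shape), reversed(counts)):
        run *= count
        if count != size:
            break
    return prod(counts) // run, run


# Hypothetical grid size: eta_rho and xi_rho are not given in the thread
shape = (1456, 50, 100, 200)  # (ocean_time, s_rho, eta_rho, xi_rho)

# First scheme: full time and depth, 25 x 25 horizontal tiles
tiled = contiguous_runs(shape, (1456, 50, 25, 25))
# Suggested scheme: one time step per chunk, everything else in full
per_time = contiguous_runs(shape, (1, 50, 100, 200))

print("25x25 tiles: ", tiled)     # (1820000, 25)
print("ocean_time=1:", per_time)  # (1, 1000000)
```

Under these assumptions, each 25 x 25 tile is scattered across 1.82 million short runs of 25 values (for any xi_rho larger than 25), whereas a one-time-step chunk is a single contiguous run. That is one way to see why chunking along the slowest-varying dimension tends to suit unchunked files better.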

-Ryan


SANJIV RAMACHANDRAN

Apr 3, 2020, 4:59:49 PM
to xar...@googlegroups.com
Hi Ryan,
Thanks for the suggestion. I will check whether my netCDF file is chunked internally (I don't believe it is). I tried a few different chunk schemes, including the one you suggested. Either I get the same problem as before or, in some cases, execution fails even earlier in the code. I found this interesting link that might be related to what I am trying to do, and I will dig deeper into the documentation:


If I can just work with xarray objects, then I don't have to invoke load() and my original question becomes moot.

Sanjiv
