Profiling xarray - all the time is spent in thread wait


Jo Cha

Nov 18, 2016, 12:59:13 PM
to xarray
Hi,

I'm sub-setting some large-ish weather files - 1 file per year per variable - 6 variables, 5 years, so 30 files in netCDF4 with chunk compression. It seems exceedingly slow and I'm trying to figure out why (I thought it might be the compression). However, when I run the profiler (using PyCharm) it tells me 95% of the time is spent waiting for a thread lock. I'm not sure if this means there is a problem with the threading (e.g. a race condition) or if it's just that the tasks are running in threads and are hidden from the profiler. I would welcome any thoughts/insights/suggestions on how to proceed.

Cheers,
J.

Stephan Hoyer

Nov 18, 2016, 1:09:53 PM
to xarray
HDF5 handles chunk (de)compression and, unfortunately, is not thread safe, which means we cannot parallelize it with dask.array's default multi-threaded backend. (For safety, we have a thread lock around calls to HDF5, but that should not affect performance.)

This would be a good use case for dask.distributed. It's not entirely working yet, but see https://github.com/pydata/xarray/issues/798 for discussion.

--
You received this message because you are subscribed to the Google Groups "xarray" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xarray+unsubscribe@googlegroups.com.
To post to this group, send email to xar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xarray/94f47f0d-012d-49aa-80d7-86cc36ad4bb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthew Rocklin

Nov 18, 2016, 1:35:08 PM
to xarray
Profiling doesn't work well with multi-threading. I recommend profiling in single-threaded mode, which will give you much more information:

dask.set_options(get=dask.async.get_sync)
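
To illustrate why a profiler can report most of the time as thread waiting, here is a minimal standard-library sketch (it does not use dask or xarray). The function names busy_work, run_threaded, run_sync and profiled_function_names are made up for this illustration. When the work runs in a worker thread, the calling thread mostly sits in join(), and that wait is what the profiler records for it; whether the worker's own functions also show up depends on the profiler and the Python version. When the same work runs synchronously, it appears directly in the profile.

```python
import cProfile
import pstats
import threading


def busy_work(n=200_000):
    """CPU-bound stand-in for reading and decompressing one chunk."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threaded():
    """Run the work in a worker thread; the caller only waits in join()."""
    worker = threading.Thread(target=busy_work)
    worker.start()
    worker.join()


def run_sync():
    """Run the same work directly in the calling thread."""
    busy_work()


def profiled_function_names(func):
    """Profile func with cProfile and return the function names it recorded."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # Keys of stats.stats are (filename, line number, function name) tuples.
    return {name for (_filename, _lineno, name) in stats.stats}


threaded_names = profiled_function_names(run_threaded)
sync_names = profiled_function_names(run_sync)

# The caller's wait in Thread.join is recorded in the threaded case.
print("join seen when threaded:", "join" in threaded_names)
# Version-dependent: the worker's function may or may not be recorded.
print("busy_work seen when threaded:", "busy_work" in threaded_names)
# In the synchronous case the real work is always visible.
print("busy_work seen when synchronous:", "busy_work" in sync_names)
```

Note for later readers: the set_options call above matches the dask API at the time of this thread. In more recent dask releases the equivalent setting is, as far as I know, dask.config.set(scheduler="synchronous").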

Jo Cha

Nov 20, 2016, 7:16:38 AM
to xarray
Thanks, I will look at both of those options. I think I can split my overall task across multiple processes, which could offset the lack of parallelism in decompression.
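
A rough sketch of what that per-file multiprocessing could look like, using only the standard library. Everything named here is an assumption for illustration: the variable names, the years, the file naming scheme, and subset_one_file, which is a placeholder for the real work of opening one file, sub-setting it and writing the result. The idea is that each worker process has its own copy of the HDF5 library, so the per-process lock no longer serializes decompression across files.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

# Hypothetical inputs: the thread only states 6 variables x 5 years = 30 files.
VARIABLES = ["var1", "var2", "var3", "var4", "var5", "var6"]
YEARS = range(2011, 2016)


def file_for(variable, year):
    """Hypothetical naming scheme: one file per variable per year."""
    return f"{variable}_{year}.nc"


def build_tasks():
    """List every input file, one task per file."""
    return [file_for(v, y) for v, y in product(VARIABLES, YEARS)]


def subset_one_file(path):
    """Placeholder for the real per-file work (open, subset, write).

    Here it only returns the path and its length so the sketch runs
    without any data files.
    """
    return path, len(path)


def run_all(paths, max_workers=2):
    """Process each file in a separate worker process."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(subset_one_file, paths))


if __name__ == "__main__":
    results = run_all(build_tasks())
    print(len(results), "files processed")
```

The worker function has to be defined at module level so it can be sent to the worker processes, and the number of workers is something to tune against available cores and disk throughput.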

