Concurrent access to a netCDF file

Fabien

May 17, 2016, 11:33:27 AM

to xar...@googlegroups.com

Hi,

I am using the multiprocessing module to run an I/O task on several
cores. The task opens a netCDF file (climate data on a global lat/lon
grid), reads one grid point somewhere on the globe, does a bit of
processing, and stores the result in a separate file (one file per task,
so there is concurrent reading but no concurrent writing). The task runs
several thousand times, distributed over 36 cores (on an AWS machine).

Which of these two options is better?
1. The path to the input file is passed as an argument to the task, so
that each task opens, reads, and closes the file itself.
2. An xarray Dataset object is passed as an argument to the task, so
that the task only has to read from the Dataset.

I tested option 1 and it seems to work fine (the I/O is faster than the
data processing, and therefore the multiprocessing makes the whole
process much faster), but I didn't test option 2.
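A minimal sketch of option 1, assuming xarray with a netCDF backend installed. The task tuple layout, the coordinate names `lat`/`lon`, the helpers `process_point` and `run_all`, and the "subtract the mean" step are all hypothetical placeholders for the real data and processing:

```python
import multiprocessing

import xarray as xr


def process_point(task):
    """Option 1: each task opens the file by path, reads one grid point,
    closes the file, and writes its own output file."""
    # Hypothetical task layout: (input path, variable name, lat, lon, output path).
    path, var_name, lat, lon, out_path = task
    # The with-block closes the input file even if an error occurs.
    with xr.open_dataset(path) as ds:
        # Select the grid cell nearest to (lat, lon); .load() reads the values
        # into memory before the file is closed.
        point = ds[var_name].sel(lat=lat, lon=lon, method="nearest").load()
    # Placeholder processing step: anomaly with respect to the point's mean.
    result = point - point.mean()
    # One output file per task, so no two processes write to the same file.
    result.to_netcdf(out_path)
    return out_path


def run_all(tasks, processes=None):
    """Distribute tasks over a process pool (processes=None uses all CPUs)."""
    with multiprocessing.Pool(processes=processes) as pool:
        return list(pool.imap_unordered(process_point, tasks))
```

Call `run_all(tasks, processes=36)` from inside an `if __name__ == "__main__":` guard, with `tasks` a list of such tuples; the guard is required on platforms where worker processes are started by spawning a fresh interpreter. Passing only a path keeps each task's arguments small and picklable, at the cost of reopening the file in every task.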

On a side note: do I need to worry about closing netCDF files when using
xarray?

Thanks and cheers,

Fabien

Fabien

May 18, 2016, 3:06:51 PM

to xar...@googlegroups.com

On 05/17/2016 05:33 PM, Fabien wrote:
> On a side note: do I need to worry about closing netCDF files when using
> xarray?

Never mind: RTFD!

http://xarray.pydata.org/en/stable/generated/xarray.Dataset.close.html#xarray.Dataset.close
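The linked `Dataset.close()` releases the file behind a Dataset. A minimal sketch of calling it explicitly, assuming the same hypothetical `lat`/`lon` coordinate names; `read_point` is an illustrative helper, not part of xarray:

```python
import xarray as xr


def read_point(path, var_name, lat, lon):
    """Read the grid point nearest to (lat, lon), then close the file."""
    ds = xr.open_dataset(path)
    try:
        # .load() pulls the values into memory so the returned DataArray
        # does not depend on the file staying open.
        return ds[var_name].sel(lat=lat, lon=lon, method="nearest").load()
    finally:
        # Runs even if the selection fails, releasing the file handle.
        ds.close()
```

Equivalently, `with xr.open_dataset(path) as ds:` calls `close()` automatically when the block exits.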

Stephan Hoyer

May 18, 2016, 4:18:54 PM

to xar...@googlegroups.com

Using paths instead of datasets is fine, though you may notice a small amount of overhead from opening the file in each task. The reason not to pass dataset objects is that you may run out of file objects. This is also why you want to close datasets.
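Running out of file objects means hitting a cap on how many files a process can hold open at once. Assuming a Unix-like system, one relevant cap is the per-process open-file-descriptor limit, which the standard-library `resource` module can report. A minimal sketch (Unix-only; `open_file_limits` is a hypothetical helper name):

```python
import resource  # standard library, available on Unix-like systems only


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    # Either value may be resource.RLIM_INFINITY if no limit is set.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
```

The soft limit is the one a process actually runs into; closing each dataset once a task is done keeps the number of open files well below it.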

Fabien

May 18, 2016, 5:19:13 PM

to xar...@googlegroups.com

Thanks Stephan!

Cheers,

Fabien