Fabien
May 17, 2016, 11:33:27 AM
to xar...@googlegroups.com
Hi,
I am using the multiprocessing module to run an I/O task on several
cores. The task opens a netCDF file (climate data on a worldwide
lat/lon grid), reads one grid point somewhere on the globe, does a bit
of processing, and stores the result in a separate file (one file per
task, so there is concurrent access to the input but no concurrent
write). The task is run several thousand times, distributed over 36
cores (an AWS machine).
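To make the setup concrete, here is a rough sketch of what one task does
(process_point, the coordinate names and the processing step are
illustrative, not my actual code):

import xarray as xr

def process_point(ds, lon, lat, out_path):
    """Read one grid point from the global dataset, process it, and
    write the result to its own output file."""
    # Pick the time series at a single (lon, lat) grid point
    point = ds.sel(lon=lon, lat=lat, method='nearest')
    # Placeholder for the actual processing step
    result = point.mean(dim='time')
    # One output file per task, so there is no concurrent write
    result.to_netcdf(out_path)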
Out of these two options, which one is better (see the sketch below the list)?
1. The path to the file to read is given as an argument to the task, so
that the file is opened/read/closed by each task.
2. A Dataset object is given as an argument to the task, so that the
task just has to read out of the Dataset.
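In code, I picture the two options roughly like this (again just a
sketch: NC_PATH, POINTS and the task function names are made up, and it
assumes the process_point helper sketched above is defined in the same
file):

from multiprocessing import Pool

import xarray as xr

NC_PATH = 'climate.nc'                    # illustrative input file
POINTS = [(5.0, 45.0), (10.0, 50.0)]      # illustrative (lon, lat) pairs

# Option 1: pass the path; each task opens and closes the file itself
def task_from_path(args):
    lon, lat, out_path = args
    ds = xr.open_dataset(NC_PATH)
    try:
        process_point(ds, lon, lat, out_path)
    finally:
        ds.close()

# Option 2: the parent opens the Dataset once and passes the object along
def task_from_dataset(args):
    ds, lon, lat, out_path = args
    process_point(ds, lon, lat, out_path)

if __name__ == '__main__':
    pool = Pool(36)
    tasks = [(lon, lat, 'out_%d.nc' % i) for i, (lon, lat) in enumerate(POINTS)]
    pool.map(task_from_path, tasks)       # option 1
    # For option 2 the tuples would carry an open Dataset instead:
    # ds = xr.open_dataset(NC_PATH)
    # pool.map(task_from_dataset,
    #          [(ds, lon, lat, 'out_%d.nc' % i) for i, (lon, lat) in enumerate(POINTS)])
    pool.close()
    pool.join()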
I tested option 1 and it seems to work fine (the I/O is faster than the
data processing, so the multiprocessing makes the whole process much
faster), but I haven't tested option 2.
On a side note: do I need to worry about closing netCDF files when
using xarray?
Thanks and cheers,
Fabien