Hi NetCDF4-Python people
We are happily using NetCDF4-Python in our production system, which provides meteorological and oceanographic forecasts. Some of our operational processing is limited by both CPU and I/O (reading and writing NetCDF files), so I would like to overlap computation and I/O to reduce wall time. I am using the multiprocessing module to split the workload across multiple cores (we have 12 cores per node). As it is now, I let the worker processes read in data at will, but with a lock/semaphore so that only one process is reading at a time (with the multiprocessing module it is simply too expensive to have the master process read/write the data and pass it to the workers, since it has to be pickled for transmission). After that, the data are processed and the next chunk is read in. But I would really like to do this more efficiently, i.e. in less wall time, which also means I would like to read larger chunks while the CPU is busy.
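For reference, the pattern described above can be sketched roughly as follows. This is only an illustration, not our actual code: the netCDF4 read is replaced by a stand-in function, and the `fork` start context is assumed (Linux) so the shared lock is simply inherited by the workers.

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # fork: children inherit module globals, no pickling of the lock
read_lock = ctx.Lock()        # only one process may do I/O at a time

def read_and_process(i):
    # Stand-in for reading one chunk from a netCDF4.Dataset variable.
    with read_lock:
        data = list(range(i, i + 4))
    # Stand-in for the CPU-bound processing of that chunk.
    return sum(data)

def process_all(n_chunks, workers=2):
    with ctx.Pool(workers) as pool:
        return pool.map(read_and_process, range(n_chunks))
```

The point of the lock is that the processing part runs in parallel while the I/O part is serialized, so the disk is never hit by several readers at once.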
What I am thinking is that I could let each process create an "I/O thread" using the threading module, let that thread read data as fast as possible, and have the worker thread in that process poll it when it needs more data. Do you have any experience doing something similar? Or maybe I should just let the master process read in data (non-blocking) and distribute it using a fast transfer method like mpi4py? Any advice?
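A minimal sketch of that producer/consumer idea, using a bounded `queue.Queue` as the hand-off between the I/O thread and the worker (the `read_fn` callable is hypothetical; in practice it would wrap the netCDF4 reads, and only the I/O thread should touch the Dataset):

```python
import queue
import threading

def prefetch(read_fn, chunk_ids, maxsize=4):
    """Read chunks on a background thread; yield them to the consumer in order."""
    buf = queue.Queue(maxsize=maxsize)  # bounded, so the reader cannot run far ahead
    _DONE = object()                    # sentinel marking end of input

    def reader():
        for cid in chunk_ids:
            buf.put(read_fn(cid))       # blocks while the buffer is full
        buf.put(_DONE)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            break
        yield item

# Usage: the consumer processes one chunk while the thread reads ahead.
# for chunk in prefetch(read_chunk, range(100)):
#     process(chunk)
```

The `maxsize` bound doubles as the "read larger chunks while the CPU is busy" knob: a bigger buffer lets the I/O thread get further ahead at the cost of memory.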
Best regards,
Jesper Larsen