The existing code uses the multiprocessing module, and passes a function, along with data, to a slave process. Normally, passing functions to slave processes isn't a problem, because, with the multiprocessing module, each individual process imports whatever functions it needs from whatever module they reside in. But dynamically created functions aren't defined at the top level of any module. The obvious thing to do, since Python functions are first-class objects, is to pass the function to the process in the same way I pass data. But such functions aren't picklable (pickle stores a function only as a reference to its module and name), so they can't be passed that way. Although I encountered this problem with the multiprocessing module, I anticipate the same problem with mpi4py because of the following line in the documentation:
"MPI for Python can communicate any built-in or user-defined Python object taking advantage of the features provided by the pickle module."
In the case of the multiprocessing module, there's a workaround: a little kluge to insert the function into a module's namespace dynamically at runtime, so that the slave process can import it. But I really have no idea whether this would work with mpi4py.
I see that mpi4py has other ways, besides pickling, of communicating objects. But the single-segment buffer interface wouldn't apply to functions, and the user-defined MPI datatypes sound like they might exceed my skill level.
(The above is all to the best of my understanding, which may be imperfect: I'm new to all this.)
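For reference, the failure described above can be reproduced with the standard pickle module alone. This is a minimal sketch; make_scaler and try_pickle are hypothetical names used only for illustration, and the exact exception type and message vary between Python versions.

```python
import pickle


def make_scaler(factor):
    # The inner function is created at runtime and does not live at the
    # top level of any module, so pickle cannot store it by reference.
    def scale(x):
        return x * factor
    return scale


def try_pickle(obj):
    """Return None if obj pickles, otherwise a short error description."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    print("nested function:", try_pickle(make_scaler(3)))
    print("lambda:", try_pickle(lambda x: x + 1))
```

Both calls report an error, because pickle records only a module name and an attribute name for a function, and neither object can be looked up that way.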
So, the questions are:
1) Does the "unpickleable function" problem occur with mpi4py?
2) If so, does the above-described workaround work with mpi4py?
3) Or is there another, or better way to solve the problem in mpi4py?
If you are using multiprocessing on Linux (or any OS with a decent
fork() implementation), then the multiprocessing trick does work,
as long as you insert the function into the namespace BEFORE spawning
new processes (which on Linux is done behind the scenes with fork()).
However, I doubt this approach will work on Windows. A similar
problem does occur with mpi4py: for this to work, you have to
pre-register the functions in ALL processes in the run.
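A minimal sketch of the namespace trick, assuming a made-up holder module called dynamic_funcs and a hypothetical register() helper (neither comes from multiprocessing or mpi4py). Once the function is reachable as an attribute of an importable module, pickle stores it by name:

```python
import pickle
import sys
import types

# Hypothetical holder module for functions generated at runtime.
_registry = types.ModuleType("dynamic_funcs")
sys.modules["dynamic_funcs"] = _registry


def register(func, name):
    """Make func look like a top-level function of dynamic_funcs."""
    func.__module__ = "dynamic_funcs"
    func.__name__ = name
    func.__qualname__ = name
    setattr(_registry, name, func)
    return func


def make_scaler(factor):
    return lambda x: x * factor


triple = register(make_scaler(3), "triple")
payload = pickle.dumps(triple)    # stores only the module and attribute names
restored = pickle.loads(payload)  # looks the function up again by name
```

The payload contains no code, only names, so unpickling succeeds only in a process where dynamic_funcs.triple already exists: one forked after the registration, or one that ran the same registration itself. That is why the registration has to happen in all processes.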
> I see that mpy4py has other ways, besides pickling, of communicating objects. But the single-segment buffer interface wouldn't apply to functions, and the user-defined MPI datatypes sound like they might exceed my skill level.
>
Moreover, MPI datatypes would simply not work for communicating
functions (nor for any Python object that does not have a buffer to
export).
> (The above is all to the best of my understanding, which may be imperfect: I'm new to all this.)
>
Well, I think you actually got all of it :-).
> So, the questions are:
> 1) Does the "unpickleable function" problem occur with mpi4py?
Yes, of course.
> 2) If so, does the above-described workaround work with mpi4py?
Yes, as long as the functions are registered at all processes. But I
understand this is a big problem if these functions are generated at
runtime. We would need to figure out a way to communicate them,
perhaps some hack around communicating the code objects and
reconstructing the function on the receiving side. Never ever tried to
implement that, though; not sure if it is even possible.
> 3) Or is there another, or better way to solve the problem in mpi4py?
>
Not that I know of. But by using the 'marshal' module to serialize the
function's code object, perhaps you could implement a way to
serialize/deserialize functions (at least the ones you define/generate
in Python code).
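A rough sketch of the marshal idea, under some loud assumptions: dumps_function and loads_function are hypothetical helpers, closures and keyword-only defaults are not handled, and because marshal's format is specific to the Python version, sender and receiver must run the same interpreter version.

```python
import builtins
import marshal
import types


def dumps_function(func):
    """Serialize a plain Python function via its code object."""
    if func.__closure__:
        raise ValueError("this sketch does not handle closures")
    # marshal can serialize code objects, strings, and tuples of simple values.
    return marshal.dumps((func.__code__, func.__name__, func.__defaults__))


def loads_function(payload, namespace=None):
    """Rebuild a function from the bytes produced by dumps_function."""
    code, name, defaults = marshal.loads(payload)
    if namespace is None:
        # Globals the rebuilt function will see; only builtins by default.
        namespace = {"__builtins__": builtins}
    return types.FunctionType(code, namespace, name, defaults)


double_plus = lambda x, k=2: x * k + len("ab")
rebuilt = loads_function(dumps_function(double_plus))
```

The resulting bytes are ordinary data, so they can be sent like any other picklable object. Any module-level names the function uses must be supplied through namespace on the receiving side.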
--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
I have attached some code for pickling functions using a custom pickler and the marshal module. I haven't used it in a few years (2009?), as it was developed as part of my MS thesis. It worked at the time with, I believe, Python 2.6.
I used the "picklefunction.py" as part of a master/slave parallelism using MPI, broadcasting certain functions and their arguments to N spawned slaves and gathering the results back. The only catch was that the functions needed to be "pure" functions such that any function arguments modified within the function needed to be returned by the function.
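To illustrate the "pure function" catch without MPI: a remote callee only ever sees a copy of its arguments, so in-place changes never reach the caller. In this small sketch, remote_call is a hypothetical stand-in that imitates the copy with a pickle round trip; it is not part of the attached code.

```python
import pickle


def remote_call(func, *args):
    # Stand-in for shipping arguments to another process: the callee
    # receives a deserialized copy, never the caller's own objects.
    copied_args = pickle.loads(pickle.dumps(args))
    return func(*copied_args)


def append_in_place(values):
    values.append(99)  # modifies the copy only; the change is lost


def append_and_return(values):
    values.append(99)
    return values  # the modified data travels back as the result


data = [1, 2, 3]
remote_call(append_in_place, data)
lost = list(data)  # unchanged: the mutation happened on a copy
kept = remote_call(append_and_return, data)
```

Only the second style gets the modified list back to the caller.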
The attached zip has a master.py and a slave.py and the functionpickle.py sources. I don't remember which version of mpi4py I was using for this, sorry. The master.py will use MPI to spawn slaves, then there's a function decorator for decorating functions you wish to broadcast when called instead of calling locally within the master. You don't have to use it as a decorator -- I can easily see calling the custom dumps() directly with your own function. YMMV.
As another thought I haven't explored, does the ipython project do anything with pickling functions?
Please let me know if this was of any use to you.
__________________________________________________
Jeff Daily
Scientist
Computational Sciences and Mathematics Division
Data Intensive Scientific Computing Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA 99352 USA
Tel: 509-372-6548
Fax: 509-372-4720
jeff....@pnnl.gov
www.pnnl.gov
You can use mpi4py_map as follows:
from mpi4py_map import map
seq = [non_pickable_object1, non_pickable_object2]
print map(lambda x: x.calc(), seq)
Then call your program with mpirun, e.g.:
mpirun -n 4 python mpi_square.py
And you can basically pass any non-picklable object in the list to be
distributed. Note that this only works if all the arguments to map()
are the same in each process.
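One way to see why the arguments must be identical everywhere: if every process builds the same sequence by running the same script, each one can pick its own share by rank, and only the results need to travel. The sketch below simulates that idea in a single process. It is an assumption about the general approach, not a description of how mpi4py_map is actually implemented, and local_share and merge_shares are made-up names.

```python
def local_share(func, seq, rank, size):
    """Apply func to the items this rank owns (round-robin split)."""
    return {i: func(seq[i]) for i in range(rank, len(seq), size)}


def merge_shares(shares, length):
    """Combine per-rank results back into the original order."""
    merged = {}
    for share in shares:
        merged.update(share)
    return [merged[i] for i in range(length)]


size = 3  # pretend there are three processes
seq = [1, 2, 3, 4, 5, 6, 7]
# Under MPI each rank would compute only its own share; here we loop over ranks.
shares = [local_share(lambda x: x * x, seq, rank, size) for rank in range(size)]
results = merge_shares(shares, len(seq))
```

Neither the function nor the input objects are ever serialized in this scheme, only the per-rank results.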
You can find mpi4py_map here (written by me):
https://github.com/twiecki/mpi4py_map
Thomas
Thanks for the pointer! mpi4py lets users override the dump/load
routines, so we should be able to do:
import dill
from mpi4py import MPI
# _p_pickle is a private attribute; its name may differ between mpi4py versions.
MPI._p_pickle.dumps = dill.dumps
MPI._p_pickle.loads = dill.loads