unpickleable functions

643 views
Skip to first unread message

Scott Cooper

unread,
Apr 11, 2012, 1:06:27 PM4/11/12
to mpi...@googlegroups.com
I have Python code which I need to parallelize to run on a cluster, using MPI. I am trying to decide if mpi4py is the solution. It seems to be, but there is a catch.

The existing code uses the multiprocessing module, and passes a function, along with data, to a slave process. Normally, passing functions to slave processes isn't a problem, because, with the multiprocessing module, each individual process imports whatever functions it needs from whatever module they reside in. But dynamically created functions aren't defined at the top level of any module. The obvious thing to do, since Python functions are first-class objects, is to pass the function to the process in the same way I pass data. But functions aren't pickleable, so they can't be passed that way. Although I encountered this problem with the multiprocessing module, I anticipate the same problem with mpi4py because of the following line in the documentation:

"MPI for Python can communicate any built-in or used-defined Python object taking advantage of the features provided by the mod:pickle module."

In the case of the multiprocess module, there's a workaround: a little kluge to insert the function into a module's namespace dynamically at runtime, so that the slave process can import it. But I really have no idea whether this would work with mpi4py.

I see that mpy4py has other ways, besides pickling, of communicating objects. But the single-segment buffer interface wouldn't apply to functions, and the user-defined MPI datatypes sound like they might exceed my skill level.

(The above is all to the best of my understanding, which may be imperfect: I'm new to all this.)

So, the questions are:
1) Does the "unpickleable function" problem occur with mpi4py?
2) If so, does the above-described workaround work with mpi4py?
3) Or is there another, or better way to solve the problem in mpi4py?

Lisandro Dalcin

unread,
Apr 11, 2012, 1:45:41 PM4/11/12
to mpi...@googlegroups.com
On 11 April 2012 20:06, Scott Cooper <se...@adelphia.net> wrote:
>
> "MPI for Python can communicate any built-in or used-defined Python object taking advantage of the features provided by the mod:pickle module."
>
> In the case of the multiprocess module, there's a workaround:  a little kluge to insert the function into a module's namespace dynamically at runtime, so that the slave process can import it.  But I really have no idea whether this would work with mpi4py.
>

If you are using multiprocessing on Linux (actually, any OS with a
decent fork() implementation), then the multiprocessing trick do work
as long as you insert the function in the namespace BEFORE spawning
new processes (wich behind the scenes is done using fork() in Linux).
However, I doubt this approach will work in Windows. And a similar
problem do occur with mpi4py, for this to work, you have to
pre-register the functions at ALL processes in the run.

> I see that mpy4py has other ways, besides pickling, of communicating objects. But the single-segment buffer interface wouldn't apply to functions, and the user-defined MPI datatypes sound like they might exceed my skill level.
>

Moreover, MPI datatypes would simply not work for communicating
functions (nor any Python object that does not have a buffer to
export)

> (The above is all to the best of my understanding, which may be imperfect:  I'm new to all this.)
>

Well, I think you actually got all of it :-).

> So, the questions are:
> 1) Does the "unpickleable function" problem occur with mpi4py?

Yes, of course.

> 2) If so, does the above-described workaround work with mpi4py?

Yes, as long as the functions are registered at all processes. But I
understand this is a big problem if these functions are generated at
runtime. We would need to figure out a way to communicate them,
perhaps some hack around communicating the code objects and
reconstructing the function on the receiving side. Never ever tried to
implement that, though; not sure if it is even possible.

> 3) Or is there another, or better way to solve the problem in mpi4py?
>

Not that I know. But using the 'marshal' module to serialize the
function's code object, perhaps you could implement a way to
serialize/deserialize functions (at least the one you define/generate
in Python code)


--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

Daily, Jeff A

unread,
Apr 11, 2012, 3:51:09 PM4/11/12
to mpi...@googlegroups.com, Scott Cooper
> -----Original Message-----
> From: mpi...@googlegroups.com [mailto:mpi...@googlegroups.com] On
> Behalf Of Lisandro Dalcin
> Sent: Wednesday, April 11, 2012 10:46 AM
> To: mpi...@googlegroups.com
> Subject: Re: [mpi4py] unpickleable functions
>
> On 11 April 2012 20:06, Scott Cooper <se...@adelphia.net> wrote:
> >
> > "MPI for Python can communicate any built-in or used-defined Python
> object taking advantage of the features provided by the mod:pickle
> module."
> >
*snip*

> > I see that mpy4py has other ways, besides pickling, of communicating
> objects. But the single-segment buffer interface wouldn't apply to functions,
> and the user-defined MPI datatypes sound like they might exceed my skill
> level.
> >
>
> Moreover, MPI datatypes would simply not work for communicating
> functions (nor any Python object that does not have a buffer to
> export)
>
> > (The above is all to the best of my understanding, which may be
> > imperfect:  I'm new to all this.)
> >
>
> Well, I think you actually got all of it :-).
>
> > So, the questions are:
> > 1) Does the "unpickleable function" problem occur with mpi4py?
>
> Yes, of course.
>
> > 2) If so, does the above-described workaround work with mpi4py?
>
> Yes, as long as the functions are registered at all processes. But I understand
> this is a big problem if these functions are generated at runtime. We would
> need to figure out a way to communicate them, perhaps some hack around
> communicating the code objects and reconstructing the function on the
> receiving side. Never ever tried to implement that, though; not sure if it is
> even possible.
>
> > 3) Or is there another, or better way to solve the problem in mpi4py?
> >
>
> Not that I know. But using the 'marshal' module to serialize the function's
> code object, perhaps you could implement a way to serialize/deserialize
> functions (at least the one you define/generate in Python code)

I have attached some code for pickling functions using a custom pickler and the marshal module. I haven't used it in a few years (2009?) as it was developed as part of my MS thesis. It worked at the time using I believe Python 2.6.

I used the "picklefunction.py" as part of a master/slave parallelism using MPI, broadcasting certain functions and their arguments to N spawned slaves and gathering the results back. The only catch was that the functions needed to be "pure" functions such that any function arguments modified within the function needed to be returned by the function.

The attached zip has a master.py and a slave.py and the functionpickle.py sources. I don't remember which version of mpi4py I was using for this, sorry. The master.py will use MPI to spawn slaves, then there's a function decorator for decorating functions you wish to broadcast when called instead of calling locally within the master. You don't have to use it as a decorator -- I can easily see calling the custom dumps() directly with your own function. YMMV.

As another thought I haven't explored, does the ipython project do anything with pickling functions?

Please let me know if this was of any use to you.

__________________________________________________
Jeff Daily
Scientist
Computational Sciences and Mathematics Division
Data Intensive Scientific Computing Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA  99352 USA
Tel:  509-372-6548
Fax: 509-372-4720
jeff....@pnnl.gov
www.pnnl.gov


functionpickle.zip

Thomas Wiecki

unread,
Apr 11, 2012, 4:01:39 PM4/11/12
to mpi...@googlegroups.com
I had a similar problem. Not sure if this will be applicable to your
specific problem, but mpi4py_map gets around this by creating the same
objects in each process (so nothing has to be pickled).

You can use it as follows:

from mpi4py_map import map
seq = [non_pickable_object1, non_pickable_object2]
print map(lambda x: x.calc(), seq)

Then call your program with mpirun, e.g.:
mpi4run -n 4 mpi_square.py

And you can basically pass every non-pickable object in the list to be
distributed. Note that this only works if all the arguments to map()
are the same in each process.

You can find mpi4py_map here (written by me):
https://github.com/twiecki/mpi4py_map

Thomas

> --
> You received this message because you are subscribed to the Google Groups "mpi4py" group.
> To post to this group, send email to mpi...@googlegroups.com.
> To unsubscribe from this group, send email to mpi4py+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mpi4py?hl=en.
>

Michael McKerns

unread,
Apr 20, 2012, 12:04:28 PM4/20/12
to mpi4py
I also have an few map functions that are part of pyina,
providing totally encapsulated mpi4py map functions that
run from otherwise standard python code.
http://dev.danse.us/trac/pathos/wiki/pyina

====================================================
For example:
"""
use map to directly run with mpi on two nodes:
python times2.py
"""
from pyina.ez_map import ez_map2 as map

def times2(id):
return id * 2

nodes = 2; N = 4
x = range(N * nodes)
print("Input:\n %s" % x)
print("Running on %d cores..." % nodes)

y = map(times2, x, nnodes=nodes)
print("Output:\n %s" % y)
====================================================


I haven't kept up well with the recent changes from mpi4py,
so the maps could use some scrubbing and updating. I'll have
to take a closer look at what you've done as well, Thomas.

The relevant thing about pyina for this thread, however, is
that it uses the dill pickler, which serializes pretty much
all of the basic python types.
http://dev.danse.us/trac/pathos/wiki/dill

While it's not the best practice to rely on serialization,
it is a nice crutch when you are feeling lazy.

- Mike McKerns


On Apr 11, 4:01 pm, Thomas Wiecki <thomas.wie...@googlemail.com>
wrote:
> I had a similar problem. Not sure if this will be applicable to your
> specific problem, but mpi4py_map gets around this by creating the same
> objects in each process (so nothing has to be pickled).
>
> You can use it as follows:
>
> from mpi4py_map import map
> seq = [non_pickable_object1, non_pickable_object2]
> print map(lambda x: x.calc(), seq)
>
> Then call your program with mpirun, e.g.:
> mpi4run -n 4 mpi_square.py
>
> And you can basically pass every non-pickable object in the list to be
> distributed. Note that this only works if all the arguments to map()
> are the same in each process.
>
> You can find mpi4py_map here (written by me):https://github.com/twiecki/mpi4py_map
>
> Thomas
>
>
>
>
>
>
>
> On Wed, Apr 11, 2012 at 1:45 PM, Lisandro Dalcin <dalc...@gmail.com> wrote:

Lisandro Dalcin

unread,
Apr 20, 2012, 1:29:11 PM4/20/12
to mpi...@googlegroups.com
On 20 April 2012 19:04, Michael McKerns <mmck...@enthought.com> wrote:
>
> The relevant thing about pyina for this thread, however, is
> that it uses the dill pickler, which serializes pretty much
> all of the basic python types.
> http://dev.danse.us/trac/pathos/wiki/dill
>

Thanks for the pointer!. mpi4py lets users override the dump/load
routines, so we should be able to do:

from mpi4py import MPI
MPI._p_pickle.dumps = dill.dumps
MPI._p_pickle.loads = dill.loads

Manuel

unread,
Jun 7, 2012, 3:45:32 PM6/7/12
to mpi4py
I just wanted to say thank you to everyone that has contributed to
this thread! I happened upon it having come to the very same issues
(poignantly) described by Mr. Cooper.

Lisandro's solution worked perfectly for my needs!

Thank you, Everyone!

On Apr 20, 12:29 pm, Lisandro Dalcin <dalc...@gmail.com> wrote:
Reply all
Reply to author
Forward
0 new messages