Using Pool map with a method of a class and a list

Luca Cerone

Aug 6, 2013, 1:12:26 PM
Hi guys,
I would like to apply the Pool.map method to a member of a class.

Here is a small example that shows what I would like to do:

from multiprocessing import Pool

class A(object):
    def __init__(self,x):
        self.value = x
    def fun(self,x):
        return self.value**x


l = range(10)

p = Pool(4)

op = p.map(A.fun,l)

#using this with the normal map doesn't cause any problem

This fails with an error saying that methods can't be pickled.
(I assume it has something to do with the note in the documentation: "functionality within this package requires that the __main__ module be importable by the children.", which is obscure to me.)

I would like to understand two things: why does my code fail (and when should I expect this kind of failure), and what is a possible workaround?

Thanks a lot in advance to everybody for the help!

Cheers,
Luca

Chris Angelico

Aug 6, 2013, 1:38:54 PM
to pytho...@python.org
On Tue, Aug 6, 2013 at 6:12 PM, Luca Cerone <luca....@gmail.com> wrote:
> from multiprocessing import Pool
>
> class A(object):
>     def __init__(self,x):
>         self.value = x
>     def fun(self,x):
>         return self.value**x
>
>
> l = range(10)
>
> p = Pool(4)
>
> op = p.map(A.fun,l)

Do you ever instantiate any A() objects? You're attempting to call an
unbound method without passing it a 'self'.

You may find the results completely different in Python 2 vs Python 3,
and between bound and unbound methods. In Python 3, an unbound method
is simply a function. In both versions, a bound method carries its
first argument around, so it has to be something different. Play
around with it a bit.
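
A minimal sketch of the distinction described above, assuming Python 3 (where A.fun looked up on the class is a plain function). The names a and bound are illustrative only and do not come from the original post.

```python
class A(object):
    def __init__(self, x):
        self.value = x

    def fun(self, x):
        return self.value ** x


a = A(3)
bound = a.fun  # a bound method: it carries the instance along with it

# The bound method remembers both the instance and the underlying function.
assert bound.__self__ is a
assert bound.__func__ is A.__dict__["fun"]

# Calling through the class needs the instance passed explicitly as 'self'.
print(A.fun(a, 2))  # same computation as bound(2)
print(bound(2))
```

Because a bound method has to carry its instance, anything that needs to serialize the callable must also be able to serialize that instance.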

ChrisA

Luca Cerone

Aug 6, 2013, 3:42:31 PM
Hi Chris, thanks

> Do you ever instantiate any A() objects? You're attempting to call an
>
> unbound method without passing it a 'self'.

I have tried a lot of variations: instantiating the object, creating lambda functions that use the unbound version of fun (A.fun.__func__), etc.
I played around with it quite a bit before posting.

As far as I understand, the problem is that Pool pickles the function and copies it to the various worker processes.
But since methods cannot be pickled, this fails.

The example I posted won't run in Python 3.2 either (I am mostly interested in a solution for Python 2.7, sorry I forgot to mention that).

Thanks in any case for the help, hopefully there will be some other advice on the ML :)

Cheers,
Luca

Joshua Landau

Aug 7, 2013, 2:48:32 AM
to Luca Cerone, python-list
I think you might not understand what Chris said.

Currently this does *not* work with Python 2.7 as you suggested it would.

>>> op = map(A.fun,l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method fun() must be called with A instance as first argument (got int instance instead)

This, however, does:

>>> op = map(A(3).fun,l)
>>> op
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]


Chris might have also been confused because once you fix that it works
in Python 3.

You will find that
http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
explains the problem in more detail than I understand. I suggest
reading it and relaying further questions back to us. Or use Python 3
;).

Luca Cerone

Aug 7, 2013, 4:33:05 AM
Hi Joshua thanks!

> I think you might not understand what Chris said.
> Currently this does *not* work with Python 2.7 as you suggested it would.
> >>> op = map(A.fun,l)

Yeah, actually that wouldn't work even in Python 3, since the value attribute used by fun has not been set.
It was a mistake in my example, but it is not the source of the problem.

> This, however, does:
> >>> op = map(A(3).fun,l)
>
> >>> op
>
> [1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]
>
>

This works fine (and I knew that), but it is not what I want.

You are using the map() function that comes with Python. I want
to use the map() method of the Pool class (available in the multiprocessing module).

And there are differences between map() and Pool.map() apparently, so that if something works fine with map() it may not work with Pool.map() (as in my case).

To correct my example:

from multiprocessing import Pool

class A(object):
    def __init__(self,x):
        self.value = x
    def fun(self,x):
        return self.value**x

l = range(100)
p = Pool(4)
op = p.map(A(3).fun, l)

doesn't work in either Python 2.7 or 3.2 (by the way I can't use Python 3 for my application).

> You will find that
> http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
> explains the problem in more detail than I understand. I suggest
> reading it and relaying further questions back to us. Or use Python 3

:) Thanks, but of course I googled and found this link before posting. I don't understand much of the details either, that's why I posted here.

Anyway, thanks for the attempt :)

Luca

Joshua Landau

Aug 7, 2013, 5:47:06 AM
to Luca Cerone, python-list
On 7 August 2013 09:33, Luca Cerone <luca....@gmail.com> wrote:
> To correct my example:
>
> from multiprocessing import Pool
>
> class A(object):
>     def __init__(self,x):
>         self.value = x
>     def fun(self,x):
>         return self.value**x
>
> l = range(100)
> p = Pool(4)
> op = p.map(A(3).fun, l)
>
> doesn't work in either Python 2.7 or 3.2 (by the way I can't use Python 3 for my application).

Are you using Windows? Over here on 3.3 on Linux it does. Not on 2.7 though.

>> You will find that
>> http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
>> explains the problem in more detail than I understand. I suggest
>> reading it and relaying further questions back to us. Or use Python 3
>
> :) Thanks, but of course I googled and found this link before posting. I don't understand much of the details either, that's why I posted here.
>
> Anyway, thanks for the attempt :)

Reading there, the simplest method seems to be, in effect:

from multiprocessing import Pool
from functools import partial

class A(object):
    def __init__(self,x):
        self.value = x
    def fun(self,x):
        return self.value**x

def _getattr_proxy_partialable(instance, name, arg):
    return getattr(instance, name)(arg)

def getattr_proxy(instance, name):
    """
    A version of getattr that returns a proxy function that can
    be pickled. Only function calls will work on the proxy.
    """
    return partial(_getattr_proxy_partialable, instance, name)

l = range(100)
p = Pool(4)
op = p.map(getattr_proxy(A(3), "fun"), l)
print(op)

Luca Cerone

Aug 7, 2013, 6:10:51 AM
> > doesn't work in either Python 2.7 or 3.2 (by the way I can't use Python 3 for my application).
>
> Are you using Windows? Over here on 3.3 on Linux it does. Not on 2.7 though.

No, I am using Ubuntu (12.04, 64 bit). Maybe things changed from 3.2 to 3.3?

> from multiprocessing import Pool
> from functools import partial
>
> class A(object):
>     def __init__(self,x):
>         self.value = x
>     def fun(self,x):
>         return self.value**x
>
> def _getattr_proxy_partialable(instance, name, arg):
>     return getattr(instance, name)(arg)
>
> def getattr_proxy(instance, name):
>     """
>     A version of getattr that returns a proxy function that can
>     be pickled. Only function calls will work on the proxy.
>     """
>     return partial(_getattr_proxy_partialable, instance, name)
>
> l = range(100)
> p = Pool(4)
> op = p.map(getattr_proxy(A(3), "fun"), l)
> print(op)

I can't try it now, I'll let you know later if it works!
(Though just by reading I can't really understand what the code does).

Thanks for the help,
Luca

Joshua Landau

Aug 7, 2013, 7:53:40 AM
to Luca Cerone, python-list
On 7 August 2013 11:10, Luca Cerone <luca....@gmail.com> wrote:
> I can't try it now, I'll let you know later if it works!
> (Though just by reading I can't really understand what the code does).

Well,

>> from multiprocessing import Pool
>> from functools import partial
>>
>> class A(object):
>>     def __init__(self,x):
>>         self.value = x
>>     def fun(self,x):
>>         return self.value**x

This is all the same, as with

>> l = range(100)
>> p = Pool(4)

You then wanted to do:

> op = p.map(A(3).fun, l)

but bound methods can't be pickled, it seems.

However, A(3) *can* be pickled. So what we want is a function:

def proxy(arg):
    return A(3).fun(arg)

so we can write:

> op = p.map(proxy, l)

To generalise you might be tempted to write:

def generic_proxy(instance, name):
    def proxy(arg):
        # Equiv. of instance.name(arg)
        return getattr(instance, name)(arg)
    return proxy

but the inner function won't work as functions-in-functions can't be
pickled either.

So we use:

>> def _getattr_proxy_partialable(instance, name, arg):
>>     return getattr(instance, name)(arg)

which takes all of instance, name and arg. Of course we only want our
function to take arg, so we partial it:

>> def getattr_proxy(instance, name):
>>     """
>>     A version of getattr that returns a proxy function that can
>>     be pickled. Only function calls will work on the proxy.
>>     """
>>     return partial(_getattr_proxy_partialable, instance, name)

partial objects are picklable, btw.

>> op = p.map(getattr_proxy(A(3), "fun"), l)
>> print(op)

:)
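
The contrast drawn above between a partial object and a function defined inside another function can be checked directly with pickle. This is only a sketch, assuming Python 3; three_to_the and make_closure are hypothetical names introduced for the illustration, and the builtin pow stands in for the proxy function so that the example does not depend on any user-defined class.

```python
import pickle
from functools import partial

# partial(pow, 3) behaves like a function of one argument: x -> pow(3, x).
three_to_the = partial(pow, 3)
restored = pickle.loads(pickle.dumps(three_to_the))
print(restored(4))  # 3 ** 4


def make_closure(base):
    # A function defined inside another function, as in generic_proxy.
    def inner(exponent):
        return base ** exponent
    return inner


try:
    pickle.dumps(make_closure(3))
    closure_pickled = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type differs between Python versions.
    closure_pickled = False
print(closure_pickled)
```

The partial survives the round trip because it only stores a reference to a module-level callable plus its arguments, all of which pickle knows how to handle.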

Peter Otten

Aug 7, 2013, 10:46:09 AM
to pytho...@python.org

There is also the copy_reg module. Adapting

<http://mail.python.org/pipermail/python-list/2008-July/469164.html>

you get:

import copy_reg
import multiprocessing
import new

def make_instancemethod(inst, methodname):
    return getattr(inst, methodname)

def pickle_instancemethod(method):
    return make_instancemethod, (method.im_self, method.im_func.__name__)

copy_reg.pickle(
    new.instancemethod, pickle_instancemethod, make_instancemethod)

class A(object):
    def __init__(self, a):
        self.a = a
    def fun(self, b):
        return self.a**b

if __name__ == "__main__":
    items = range(10)
    pool = multiprocessing.Pool(4)
    print pool.map(A(3).fun, items)


Joshua Landau

Aug 7, 2013, 11:52:45 AM
to Peter Otten, python-list
On 7 August 2013 15:46, Peter Otten <__pet...@web.de> wrote:
> import copy_reg
> import multiprocessing
> import new

"new" is deprecated from 2.6+; use types.MethodType instead of
new.instancemethod.

> def make_instancemethod(inst, methodname):
>     return getattr(inst, methodname)

This is just getattr -- you can replace the two uses of
make_instancemethod with getattr and delete this ;).

> def pickle_instancemethod(method):
>     return make_instancemethod, (method.im_self, method.im_func.__name__)
>
> copy_reg.pickle(
>     new.instancemethod, pickle_instancemethod, make_instancemethod)
>
> class A(object):
>     def __init__(self, a):
>         self.a = a
>     def fun(self, b):
>         return self.a**b
>
> if __name__ == "__main__":
>     items = range(10)
>     pool = multiprocessing.Pool(4)
>     print pool.map(A(3).fun, items)

Well that was easy. The Stackoverflow link made that look *hard*. -1
to my hack, +1 to this.

You can do this in one statement:

import copy_reg
import types

copy_reg.pickle(
    types.MethodType,
    lambda method: (getattr, (method.im_self, method.im_func.__name__)),
    getattr
)
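
For readers on Python 3, where the module is named copyreg, the registration mechanism used above can be sketched on a toy type. This is not the bound-method recipe itself: Temperature and reduce_temperature are hypothetical names made up for the illustration, and the reducer deliberately rebuilds the object as a plain float so that the example needs nothing beyond builtins.

```python
import copyreg
import pickle


class Temperature(object):
    """Hypothetical toy type, used only to show how a reducer is registered."""

    def __init__(self, degrees):
        self.degrees = degrees


def reduce_temperature(temp):
    # A reducer returns (callable, args); unpickling calls callable(*args).
    # Here the callable is the builtin float, so the copy is a plain float.
    return float, (temp.degrees,)


# Tell pickle to use the reducer for every Temperature instance.
copyreg.pickle(Temperature, reduce_temperature)

restored = pickle.loads(pickle.dumps(Temperature(21.5)))
print(type(restored).__name__, restored)
```

The recipe in this thread has the same shape: the reducer returns getattr together with the instance and the method name, so unpickling simply looks the method up again on the restored instance.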

Peter Otten

Aug 7, 2013, 12:15:14 PM
to pytho...@python.org
Joshua Landau wrote:

> On 7 August 2013 15:46, Peter Otten <__pet...@web.de> wrote:

>> def make_instancemethod(inst, methodname):
>>     return getattr(inst, methodname)
>
> This is just getattr -- you can replace the two uses of
> make_instancemethod with getattr and delete this ;).

D'oh ;)

Luca Cerone

Aug 7, 2013, 6:26:37 PM
Thanks for the post.
I actually don't know exactly what can and can't be pickled,
nor what partialing a function means.
Could you link me to some resources?

I still can't understand all the details in your code :)

Joshua Landau

Aug 7, 2013, 6:49:47 PM
to Luca Cerone, python-list
On 7 August 2013 23:26, Luca Cerone <luca....@gmail.com> wrote:
> Thanks for the post.
> I actually don't know exactly what can and can't be pickled,

I just try it and see what works ;).

The general idea is that if it is module-level it can be pickled and
if it is defined inside of something else it cannot. It depends
though.
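
A rough way to see this rule of thumb in action, assuming Python 3: a module-level function is pickled as a reference to its module and name, so it can be looked up again on the other side, while a lambda has no importable name to refer to. json.dumps is used here only as a convenient module-level function from the standard library; payload and lambda_pickled are illustrative names.

```python
import json
import pickle

payload = pickle.dumps(json.dumps)

# The payload stores the module and function names, not the function's code.
assert b"json" in payload and b"dumps" in payload

# Unpickling imports the module and looks the name up again,
# so the result is the very same function object.
assert pickle.loads(payload) is json.dumps

try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    # The exact exception type differs between Python versions.
    lambda_pickled = False
print(lambda_pickled)
```

This is also why worker processes must be able to import the module that defines the function: they receive only the name and have to find the object themselves.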

> nor what partialing a function means.

"partial" takes a function and returns it with arguments "filled in":

from functools import partial

def add(a, b):
    return a + b

add5 = partial(add, 5)

print(add5(10)) # Prints 15 (5 + 10)

> Could you link me to some resources?

http://docs.python.org/2/library/functools.html#functools.partial


> I still can't understand all the details in your code :)

Never mind that, though, as Peter Otten's code (with my very minor
suggested modifications) is by far the cleaner method of the two and
is arguably more correct too.

Luca Cerone

Aug 7, 2013, 7:31:54 PM
Thanks for the help Peter!