PyEnchant and multiprocessing


Barthelemy

Jun 29, 2011, 11:37:55 AM
to pyenchant users
Hi,

I would like to know if anybody has encountered any issues while using
PyEnchant with the Python multiprocessing library, or whether such an
issue could even arise.

I ask because I recently hunted down an intermittent bug in my code
that seemed to come from PyEnchant.

This statement "d = enchant.Dict('en-US')" sometimes blocked (i.e.,
never returned) in one process. I was using
multiprocessing.Pool.imap_unordered.

It is probably my code that caused this issue, but by executing this
statement (and the corresponding work) in the main process instead of
in the pool, I no longer see this intermittent blocking.
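When a worker hangs like this, the loop consuming imap_unordered hangs with it. One way to turn the silent hang into a visible error is to fetch results with the timeout that the pool iterator's next() method accepts. This is a minimal sketch assuming Python 3.3+ (for "with Pool(...)"); work() is a hypothetical stand-in for the real per-item job.

```python
import multiprocessing


def work(n):
    """Hypothetical per-item job standing in for the real spell-checking work."""
    return n * n


def collect(items, timeout=30.0, processes=2):
    """Gather imap_unordered results, raising instead of blocking forever."""
    items = list(items)
    results = []
    with multiprocessing.Pool(processes) as pool:
        it = pool.imap_unordered(work, items)
        for _ in range(len(items)):
            # Raises multiprocessing.TimeoutError if no result arrives within
            # `timeout` seconds, e.g. because a worker is stuck.
            results.append(it.next(timeout))
    return sorted(results)
```

Leaving the with block on a multiprocessing.TimeoutError terminates the pool, stuck worker included. This only detects the deadlock; it does not remove its cause.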

Any idea what might have caused this?

Thanks,
Barthélémy

Ryan Kelly

Jun 29, 2011, 7:17:10 PM
to pyencha...@googlegroups.com
On Wed, 2011-06-29 at 08:37 -0700, Barthelemy wrote:
>
> I would like to know if anybody has encountered any issues while using
> PyEnchant with the Python multiprocessing library, or whether such an
> issue could even arise.

I've never encountered one personally, but I'm sure it's possible. I
don't do much multiprocessing stuff these days.

> I ask because I recently hunted down an intermittent bug in my code
> that seemed to come from PyEnchant.
>
> This statement "d = enchant.Dict('en-US')" sometimes blocked (i.e.,
> never returned) in one process. I was using
> multiprocessing.Pool.imap_unordered.
>
> It is probably my code that caused this issue, but by executing this
> statement (and the corresponding work) in the main process instead of
> in the pool, I no longer see this intermittent blocking.
>
> Any idea what might have caused this?

First, what platform are you running on and what version of python are
you using? I can think of a couple of reasons this might occur on a
unixy system (where multiprocessing uses os.fork) but would be at a
complete loss if you're on a windows system (where multiprocessing uses
popen+pickle).
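The distinction Ryan draws matters because with fork the children inherit the parent's memory, including any C pointers held by objects created before the pool, while with a fresh interpreter the children receive their arguments by pickling. On Python 3.4 and later this is exposed as the "start method". A small sketch of inspecting it and pinning one explicitly (Python 2.7, used in this thread, has no such API):

```python
import multiprocessing

# Start methods this platform supports, e.g. ['fork', 'spawn', 'forkserver']
# on Linux or ['spawn'] on Windows. The default depends on platform and
# Python version.
available = multiprocessing.get_all_start_methods()

# A context pins a start method for the pools and processes created from it
# without changing the global default. "spawn" exists on every platform.
spawn_ctx = multiprocessing.get_context("spawn")

print("available:", available)
print("spawn context uses:", spawn_ctx.get_start_method())
```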

Are you mixing threads and processes at all?

Is it possible to create the dictionary before forking any subprocesses,
and does this help? In other words, instead of this:


def whatever_function(words):
    d = enchant.Dict("en-US")
    for w in words:
        assert d.check(w)


Do it like this:

d = enchant.Dict("en-US")

def whatever_function(words):
    for w in words:
        assert d.check(w)


I'll put my thinking cap on and let you know if I can come up with any
more suggestions.


Ryan


--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ry...@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details


Ryan Kelly

Jun 30, 2011, 6:45:07 PM
to pyencha...@googlegroups.com

Barthelemy

Jul 1, 2011, 6:51:21 AM
to pyenchant users
Hi,

I wrote some basic code that reproduces the error:

https://gist.github.com/1058294

app.py shows two versions of the code that eventually block with
multiprocessing.

app2.py always finishes.

As you hypothesized, putting the dictionary outside the function
called by the multiprocessing pool worked.

I don't see my previous message in this thread, so in case it
vanished, here is the info you requested:

> First, what platform are you running on and what version of python are
> you using? I can think of a couple of reasons this might occur on a
> unixy system (where multiprocessing uses os.fork) but would be at a
> complete loss if you're on a windows system (where multiprocessing uses
> popen+pickle).

I'm using linux 64 bit, python 2.7.2

> Are you mixing threads and processes at all?

Nope, only multiprocessing.

Thanks for your help!
Barthélémy

Ryan Kelly

Jul 1, 2011, 10:57:06 PM
to pyencha...@googlegroups.com, Barthelemy
On Fri, 2011-07-01 at 03:51 -0700, Barthelemy wrote:
>
> I wrote some basic code that reproduces the error:
>
> https://gist.github.com/1058294
>
> app.py shows two versions of the code that eventually block with
> multiprocessing.
>
> app2.py always finishes.

Thanks, I was able to reproduce the problem using your scripts and
eventually came up with a partial fix.

> As you hypothesized, putting the dictionary outside the function
> called by the multiprocessing pool worked.

As it turns out, this worked for a completely different reason than what
I was thinking.

Part of the problem turned out to be caused by pickling a Dict object.
If you did this:

import pickle
from enchant import Dict

d1 = Dict()
d2 = pickle.loads(pickle.dumps(d1))

Then you would end up with two Dict objects sharing a single C-library
pointer. That's bad, because if you do this:

del d1

Then d2 suddenly has an invalid pointer.

By passing the dict object as part of the arguments to imap_unordered,
you were causing many instances of it to be pickled, sent over the wire,
and reconstructed in the child process.

I have customized the pickling process so that each unpickled copy
receives a fresh C-library pointer, and this seems to have helped a lot.
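Both the failure and this kind of fix can be sketched without PyEnchant. The sketch below assumes a hypothetical stand-in for the C library (lib_alloc/lib_free, not part of PyEnchant or Enchant) and two illustrative classes. It uses the standard __getstate__/__setstate__ pickling hooks, which is one way to give each unpickled copy a fresh handle, not necessarily how PyEnchant 1.6.6 does it. The check relies on CPython freeing an object as soon as its last reference goes away.

```python
import itertools
import pickle

# Hypothetical stand-in for the C library (not the PyEnchant/Enchant API):
# handles are integers, and LIVE holds the ones that have not been freed.
_handles = itertools.count(1)
LIVE = set()


def lib_alloc():
    handle = next(_handles)
    LIVE.add(handle)
    return handle


def lib_free(handle):
    LIVE.discard(handle)


class NaiveDict:
    """Default pickling copies the raw handle, so copies share it."""

    def __init__(self, tag="en_US"):
        self.tag = tag
        self._handle = lib_alloc()

    def __del__(self):
        lib_free(self._handle)


class SafeDict(NaiveDict):
    """Pickles only the tag; each unpickled copy allocates its own handle."""

    def __getstate__(self):
        return {"tag": self.tag}       # never serialize the handle

    def __setstate__(self, state):
        self.__init__(state["tag"])    # fresh handle, just like a new object


def survives_original_deletion(cls):
    """Pickle a copy, delete the original, report whether the copy's handle is live."""
    original = cls()
    copy = pickle.loads(pickle.dumps(original))
    del original                       # CPython frees it here (refcount hits 0)
    return copy._handle in LIVE


print(survives_original_deletion(NaiveDict))  # False: the copy's handle was freed
print(survives_original_deletion(SafeDict))   # True
```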

Attached is a preview of pyenchant 1.6.6, can you please try it out and
let me know if it helps.

Unfortunately, I am still able to produce deadlocks with your
"will_block" example, but not with the "will_block2" example. They
happen during the call to enchant_dict_describe.

Here's the kicker though: it only deadlocks under python2.7. Running
the same program under python2.6 works just fine.

Moreover, even if I recompile enchant with a dummy enchant_dict_describe
function that does absolutely nothing, I get the same result - a
deadlock on python2.7, working fine on python2.6.

So either something has changed in the ctypes API in python2.7, or this
is a newly-introduced bug in ctypes. I am still trying to work out the
details.

Cheers,

Ryan



--

pyenchant-1.6.6.tar.gz

Ryan Kelly

Jul 1, 2011, 11:07:35 PM
to pyencha...@googlegroups.com, Barthelemy
On Sat, 2011-07-02 at 12:57 +1000, Ryan Kelly wrote:
> On Fri, 2011-07-01 at 03:51 -0700, Barthelemy wrote:
> >
> > I wrote some basic code that reproduces the error:
> >
> > https://gist.github.com/1058294
> >
> > app.py shows two versions of the code that eventually block with
> > multiprocessing.
> >
> > app2.py always finishes.
>
>
> Unfortunately, I am still able to produce deadlocks with your
> "will_block" example, but not with the "will_block2" example. They
> happen during the call to enchant_dict_describe.
>
> Here's the kicker though: it only deadlocks under python2.7. Running
> the same program under python2.6 works just fine.
>
> Moreover, even if I recompile enchant with a dummy enchant_dict_describe
> function that does absolutely nothing, I get the same result - a
> deadlock on python2.7, working fine on python2.6.
>
> So either something has changed in the ctypes API in python2.7, or this
> is a newly-introduced bug in ctypes. I am still trying to work out the
> details.


I've just confirmed that the "will_block" script also runs fine on
python3.2, so it appears that only python2.7 has this problem.

Are you able to try it on a couple of different versions of python and
see if you get similar results?


Ryan


Barthelemy Dagenais

Jul 2, 2011, 6:37:25 AM
to Ryan Kelly, pyencha...@googlegroups.com
Hi Ryan,

Thank you for looking into this. The pointer problem looks very similar
to another problem I found recently while using django and
multiprocessing (you end up sharing the same socket across multiple
processes, which does not work... you have to explicitly tell django to
close the db and cache sockets so a fresh one is obtained by each process).

Here are the results of my tests with various python versions and your
new fix. 2.6/3.1 and 2.7/3.2 seem synchronized, which makes sense.

Python   pyenchant   will_block   will_block2
2.6.6    1.6.5       success      deadlock (really quickly)
2.6.6    1.6.6       success      success
2.7.2    1.6.5       deadlock     deadlock
2.7.2    1.6.6       deadlock     success
3.1.3    1.6.5       success      deadlock
3.1.3    1.6.6       success      success
3.2      1.6.5       deadlock     deadlock
3.2      1.6.6       deadlock     success

Three small notes:

1. I modified the gist to make it work with Python 3
(https://gist.github.com/1058294)

2. setup.py of pyenchant 1.6.6 did not work with Python 3. The problem
was at line 194 (except EnvironmentError, e). I fixed the file locally.

3. As a side note, do you know tox? A user of Py4J introduced me to this
nice library that automatically runs your test suite against multiple
versions of python. It's been very helpful!
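The "except EnvironmentError, e" form from note 2 is Python 2-only syntax and is a SyntaxError under Python 3. A sketch of the spelling that parses on Python 2.6+ and 3.x, using a hypothetical read_text helper rather than the actual setup.py code:

```python
def read_text(path):
    """Hypothetical helper: return a file's contents or an error message."""
    # "except X as e" works on Python 2.6+ and 3.x; "except X, e" does not
    # parse on Python 3. On 3.3+ EnvironmentError is an alias of OSError.
    try:
        with open(path) as fh:
            return fh.read()
    except EnvironmentError as e:
        return "could not read %s: %s" % (path, e.strerror)
```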

Barthélémy

Ryan Kelly

Jul 2, 2011, 6:48:29 AM
to Barthelemy Dagenais, pyencha...@googlegroups.com
On Sat, 2011-07-02 at 06:37 -0400, Barthelemy Dagenais wrote:
> Hi Ryan,
>
> Thank you for looking into this. The pointer problem looks very similar
> to another problem I found recently while using django and
> multiprocessing (you end up sharing the same socket across multiple
> processes, which does not work... you have to explicitly tell django to
> close the db and cache sockets so a fresh one is obtained by each process).
>
> Here are the results of my tests with various python versions and your
> new fix. 2.6/3.1 and 2.7/3.2 seem synchronized, which makes sense.

Ah, OK. I had not reproduced the deadlock on 3.2 but I will keep
trying. That does make more sense than a change specific to 2.7.

> 2. setup.py of pyenchant 1.6.6 did not work with Python 3. The problem
> was at line 194 (except EnvironmentError, e). I fixed the file locally.

Good catch, thanks. Fixed in trunk.

> 3. As a side note, do you know tox? A user of Py4J introduced me to this
> nice library that automatically runs your test suite against multiple
> versions of python. It's been very helpful!

Yes, it's an awesome tool. I have been slowly converting my projects
over to it but haven't gotten around to PyEnchant yet.


Cheers,

Ryan
