Deadlock when using restkit+lxml with gevent


Michael

Dec 14, 2011, 4:55:15 AM
to gevent: coroutine-based Python network library
Hi guys,

I could use some enlightenment here. I'm currently using a combination
of restkit, lxml and gevent for a bot-like tool. When I tried to
group some of the downloading and processing functionality
together, I got some strange deadlocks, which I stripped down to the
following test case. wrapper_working does its job as expected, but
wrapper_failing, which adds lxml functionality to the wrapper,
deadlocks. Any ideas why that goes wrong, and how I could get
wrapper_failing to work without taking the functionality out of the
wrapper, would be greatly appreciated.

Thanks!
Michael

from gevent import monkey, pool
monkey.patch_all()

import logging
import unittest
import restkit
from lxml import etree

class DSmassDLTest(unittest.TestCase):
    parser = etree.HTMLParser()
    test_urls = [
        "http://yahoo.fr",
        "http://google.com",
        "http://friendpaste.com",
        "http://benoitc.io",
        "http://couchdb.apache.org"]

    @staticmethod
    def wrapper_failing(args):
        ret = args[0](*args[1], **args[2])
        tree = etree.ElementTree(file=ret.body_stream(),
                                 parser=DSmassDLTest.parser)
        logging.info("Got: %s", tree.xpath('//title')[0].text.strip())

    @staticmethod
    def wrapper_working(args):
        return args[0](*args[1], **args[2])

    def runTest(self):
        wrapper = DSmassDLTest.wrapper_working

        funcs = (restkit.request,) * len(self.test_urls)
        args = tuple((url,) for url in self.test_urls)
        kwargs = (dict(follow_redirect=True),) * len(self.test_urls)
        data = zip(funcs, args, kwargs) * 3

        gp = pool.Group()
        dls = gp.imap(wrapper, data)

        if wrapper == DSmassDLTest.wrapper_working:
            for dl in dls:
                tree = etree.ElementTree(file=dl.body_stream(),
                                         parser=DSmassDLTest.parser)
                logging.info("Got: %s",
                             tree.xpath('//title')[0].text.strip())

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    suite = unittest.TestSuite()
    suite.addTest(DSmassDLTest())
    unittest.TextTestRunner(verbosity=2).run(suite)

Michael

Dec 14, 2011, 5:08:05 AM
to gevent: coroutine-based Python network library
What I forgot to mention: I'm using the current stable version, 0.13.6,
on a Linux system.

Benoit Chesneau

Dec 14, 2011, 5:42:54 AM
to gev...@googlegroups.com
On Wed, Dec 14, 2011 at 11:08 AM, Michael <nimro...@googlemail.com> wrote:
> What I forgot to mention: I'm using the current stable version 0.13.6
> on a linux system.

Any trace or something? Also, apparently you are not using a gevent
restkit pool, which is needed.

- benoit

Michael Löffler

Dec 14, 2011, 6:15:12 AM
to gev...@googlegroups.com
Hi Benoit,

thanks for your answer. It doesn't crash with a trace; the program just
freezes and is no longer responsive.

So you mean restkit should get a separate pool? Could you please be more
specific about what exactly would have to be changed?

In the full (not stripped-down) application I had used GeventManager,
but the result was the same, so I tried to reduce it even more,
as too many connections don't seem to be the issue here.

Stephen Edie

May 21, 2012, 2:05:17 PM
to gev...@googlegroups.com
Hi Michael,

I have a solution. The lxml parsers are not re-entrant; see the FAQ at http://lxml.de/1.3/FAQ.html#id1. You need to create a unique copy of the parser for each gevent coroutine:

def wrapper_failing(args):
    ret = args[0](*args[1], **args[2])
    tree = etree.ElementTree(file=ret.body_stream(),
                             parser=DSmassDLTest.parser.copy())
    logging.info("Got: %s", tree.xpath('//title')[0].text.strip())


Stephen
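
For readers who want to try the per-coroutine copy idea without a network connection, here is a minimal sketch that leaves out restkit and gevent entirely. The names BASE_PARSER and parse_title are hypothetical, and a BytesIO object stands in for the stream returned by body_stream() in the test case. It only illustrates that each call parses with its own copy of the parser; it does not reproduce the deadlock itself.

```python
from io import BytesIO
from lxml import etree

# One configured parser that is never used for parsing directly
# (hypothetical name, not from the original test case).
BASE_PARSER = etree.HTMLParser()

def parse_title(stream):
    """Parse a file-like object with a private parser copy; return the title text."""
    # copy() creates a new parser with the same configuration, so
    # concurrent callers never share one parser instance.
    parser = BASE_PARSER.copy()
    tree = etree.parse(stream, parser)
    titles = tree.xpath('//title')
    if not titles or titles[0].text is None:
        return None
    return titles[0].text.strip()

# BytesIO stands in for ret.body_stream() from the test case.
body = BytesIO(b"<html><head><title> Hello </title></head><body></body></html>")
print(parse_title(body))
```

Copying costs one parser object per call, which is presumably negligible next to the HTTP request it accompanies.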

Michael Löffler

May 22, 2012, 8:35:16 AM
to gev...@googlegroups.com
Hi Stephen,

thanks for your reply. In the meantime I had solved the problem by
feeding strings to the HTML parser instance instead of file-like objects.
With this solution, it also works with a single parser instance. As my
HTML files are rather small, I guess this is the faster solution.

Michael
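
The string-based workaround can be sketched roughly as follows, again without restkit or gevent so that it runs offline. The names PARSER and extract_title are hypothetical. A plausible reading of why it helps, not confirmed in the thread, is that the whole body is already in memory, so the parser never has to read from a socket (and thus never yields to another greenlet) in the middle of a parse.

```python
from lxml import etree

# Single shared parser instance, as in the original test case.
PARSER = etree.HTMLParser()

def extract_title(html_bytes, parser=PARSER):
    """Parse an already-downloaded document and return its <title> text."""
    # fromstring() parses an in-memory string; no file-like object
    # is handed to the parser.
    root = etree.fromstring(html_bytes, parser)
    titles = root.xpath('//title')
    if not titles or titles[0].text is None:
        return None
    return titles[0].text.strip()

# In the thread's wrapper the body would be read completely first,
# presumably via restkit's body_string() (assumption, untested here):
#     html = ret.body_string()
sample = b"<html><head><title> Example </title></head><body></body></html>"
print(extract_title(sample))
```

The trade-off is that each document is held in memory in full before parsing, which is fine for small pages like the ones described above.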