Monkey-patching pycurl

Alvaro

May 19, 2011, 7:11:42 AM
to gevent: coroutine-based Python network library

Hi there,

I was trying to use urlgrabber for downloading huge files (on the
order of terabytes) from remote machines. Urlgrabber has some nice
features like client-side throttling, byte ranges, etc., but it uses
pycurl and I have not found an easy way to integrate pycurl with
gevent. Do you know if there is any way I could monkey-patch pycurl so
I could use it with gevent?


Thanks in advance


Alvaro

CryptWizard

May 19, 2011, 7:21:42 AM
to gev...@googlegroups.com
If gevent's built-in monkey patching doesn't work, then pycurl
probably uses the C socket API directly.
You are better off using your own native Python socket code for the
huge file downloads.

Daniel Truemper

May 19, 2011, 7:29:00 AM
to gev...@googlegroups.com
PyCurl uses libcurl and the C socket API directly. You could also use
Tornado's AsyncHTTPClient:

https://github.com/facebook/tornado/blob/master/tornado/httpclient.py

Best
Daniel

Denis Bilenko

May 19, 2011, 9:00:58 AM
to gev...@googlegroups.com
On Thu, May 19, 2011 at 6:29 PM, Daniel Truemper
<true...@googlemail.com> wrote:
> PyCurl uses libcurl and the C socket API directly. You could also use
> Tornado's AsyncHTTPClient:
>
> https://github.com/facebook/tornado/blob/master/tornado/httpclient.py

Did you mean https://github.com/facebook/tornado/blob/master/tornado/curl_httpclient.py
?

This shows how to make pycurl use Tornado's event loop. It should not be
hard to modify it to run on gevent's loop instead; see
https://github.com/wil/gtornado

The next step is to wrap the resulting async API in a synchronous "green"
API, which is also not hard at all.

Then you can monkey-patch whatever library uses pycurl with the "green" pycurl.
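
A minimal sketch of that last monkey-patching step, using only the standard library. Everything here is a stand-in: `BlockingCurl`, `GreenCurl`, the `downloader` module, and `patch_attribute` are hypothetical names, not part of pycurl, gevent, or urlgrabber. It only shows that a library which did `import pycurl` looks the name up in its own module namespace at call time, so rebinding that one attribute is enough to redirect it.

```python
import types


def make_module(name, **attrs):
    """Build a throwaway module object (stand-in for a real imported module)."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    return module


class BlockingCurl:
    """Hypothetical stand-in for pycurl.Curl."""

    def perform(self):
        return "blocking"


class GreenCurl:
    """Hypothetical stand-in for a gevent-aware replacement with the same interface."""

    def perform(self):
        # A real implementation would yield to the gevent loop here.
        return "cooperative"


real_pycurl = make_module("pycurl", Curl=BlockingCurl)
green_pycurl = make_module("green_pycurl", Curl=GreenCurl)

# Stand-in for a library such as urlgrabber that ran `import pycurl` at import time.
downloader = make_module("downloader", pycurl=real_pycurl)
exec("def fetch():\n    return pycurl.Curl().perform()\n", downloader.__dict__)


def patch_attribute(module, name, replacement):
    """Rebind module.<name> to replacement and return the original object."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    return original
```

Calling `patch_attribute(downloader, "pycurl", green_pycurl)` makes later `downloader.fetch()` calls go through `GreenCurl`. Code that did `from pycurl import Curl` would hold its own reference and would need that name patched separately.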

Alvaro Saurin

May 19, 2011, 9:33:46 AM
to gev...@googlegroups.com
Curl has a mechanism for using an external event loop (using the 'multi' interface), so maybe it could be integrated with gevent. This is an example that integrates libcurl with libev:

http://curl.haxx.se/libcurl/c/evhiperfifo.html

The 'multi' interface is exported by PyCurl (http://pycurl.sourceforge.net/doc/curlmultiobject.html), but I don't know if it could be mixed with gevent in such a way that any use of pycurl could be transparently asynchronous...


Cheers



Alvaro




Denis Bilenko

May 27, 2011, 5:23:53 PM
to gev...@googlegroups.com
On Thu, May 19, 2011 at 8:33 PM, Alvaro Saurin <alvaro...@gmail.com> wrote:
> Curl has a mechanism for using an external event loop (using the 'multi'
> interface), so maybe it could be integrated with gevent. This is an example
> that integrates libcurl with libev:
> http://curl.haxx.se/libcurl/c/evhiperfifo.html
> The 'multi' interface is exported by PyCurl
> (http://pycurl.sourceforge.net/doc/curlmultiobject.html), but I don't know
> if it could be mixed with gevent in such a way that any use of pycurl could
> be transparently asynchronous...

I did it here: https://bitbucket.org/denis/gevent-curl/
The geventcurl.Curl() object wraps pycurl.Curl; however, its perform()
method actually uses a CurlMulti instance.

Here's an example:
https://bitbucket.org/denis/gevent-curl/src/d9aeccd324b8/example.py
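
For readers who want the general shape of a CurlMulti-driven perform(), here is a rough sketch. It is not taken from gevent-curl and may differ from how that project does it. It assumes `multi` behaves like a `pycurl.CurlMulti` (using its `perform()`, `fdset()`, `timeout()`, and `info_read()` methods) and that `wait_for_io` is a cooperative select-style function; `green_perform`, `wait_for_io`, and `max_wait` are hypothetical names.

```python
# Numeric value of pycurl.E_CALL_MULTI_PERFORM (libcurl's CURLM_CALL_MULTI_PERFORM).
CALL_MULTI_PERFORM = -1


def green_perform(multi, wait_for_io, max_wait=1.0):
    """Drive every handle attached to `multi` to completion.

    `wait_for_io(rlist, wlist, xlist, timeout)` is where the caller yields,
    so other greenlets can run while the transfer is idle.
    """
    running = 1
    while running:
        # Let libcurl do all the work it can do right now without blocking.
        while True:
            status, running = multi.perform()
            if status != CALL_MULTI_PERFORM:
                break
        if not running:
            break
        # Ask libcurl which sockets it cares about and how long it wants to wait.
        rlist, wlist, xlist = multi.fdset()
        timeout_ms = multi.timeout()
        if timeout_ms < 0:
            timeout = max_wait  # libcurl has no preference; cap the wait
        else:
            timeout = min(timeout_ms / 1000.0, max_wait)
        wait_for_io(rlist, wlist, xlist, timeout)

    # Collect the finished and failed transfers.
    finished, failed = [], []
    while True:
        queued, ok_list, err_list = multi.info_read()
        finished.extend(ok_list)
        failed.extend(err_list)
        if queued == 0:
            break
    return finished, failed
```

With the real libraries this would presumably be called as `green_perform(multi, gevent.select.select)` after `multi.add_handle(curl)`, assuming gevent's select accepts empty descriptor lists together with a timeout. A socket-callback design (`M_SOCKETFUNCTION` plus `socket_action`) avoids rebuilding the descriptor sets on every pass, at the cost of more bookkeeping.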

Unfortunately, my version of pycurl (7.19.0-3build1), which appears to
be the latest, seems to leak Python references (sys.gettotalrefcount()
keeps increasing) when the CurlMulti interface is used.

So if you are going to use pycurl, whether with Tornado or gevent,
you should look into fixing that first.
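
A small helper along these lines can check for that kind of growth. Note that `sys.gettotalrefcount()` only exists in debug builds of CPython, so this sketch falls back to counting gc-tracked objects, which is a coarser signal. `count_growth` and its parameters are hypothetical names; the function passed in would be whatever exercises the pycurl code under suspicion.

```python
import gc
import sys


def count_growth(func, iterations=100, warmup=10):
    """Call func repeatedly and return how much the reference/object count grew."""
    # Prefer the total refcount (debug builds only); otherwise count
    # the objects currently tracked by the garbage collector.
    counter = getattr(sys, "gettotalrefcount", None)
    if counter is None:
        def counter():
            return len(gc.get_objects())

    # Warm up first so one-time caches do not show up as a leak.
    for _ in range(warmup):
        func()
    gc.collect()
    before = counter()

    for _ in range(iterations):
        func()
    gc.collect()
    return counter() - before
```

A result that scales with `iterations` suggests a leak; a small constant is usually just noise.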
