Re: List of popular Python libraries with known gevent incompatibilities


Tarek Ziadé

Jul 23, 2012, 8:09:16 AM
to gev...@googlegroups.com
Hi André,

I have never seen such a page, but I would love to have one around! If you want to start it, I can provide a few entries myself (incompatible library / replacement).

Cheers,
Tarek

andre.l.caron

Jul 23, 2012, 8:25:55 AM
to gev...@googlegroups.com
Hi Tarek,

Maybe here would be a good place to start listing them. If we get enough, I might be able to set up a page (on the gevent docs, maybe?). More generally, is there a list of criteria somewhere that explains what to watch out for? For example, direct socket access in a PYD file is rather obvious, but anything that performs a lengthy CPU-bound operation can be problematic.

For the moment, I have (not necessarily tested to demonstrate incompatibilities):
  1. pySerial, at least on Windows (uses "ReadFile()" directly).
  2. MySQLdb (wraps MySQL C API)
  3. dateutil (reads from the Windows registry)
  4. PycURL (wraps libcurl C API)
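
One rough way to apply the "PYD file" criterion to a list of dependencies is to check which of them ship compiled extension files. The sketch below uses only the standard library; the helper name `compiled_files` is made up for illustration. It is a heuristic, not a verdict: a compiled module is only a problem if it does blocking I/O itself, and a pure-Python module can still block in other ways.

```python
import importlib.machinery
import importlib.util
import os

# Platform-specific suffixes of compiled modules (".so", ".pyd", ...).
SUFFIXES = tuple(importlib.machinery.EXTENSION_SUFFIXES)


def compiled_files(name):
    """List compiled extension files that belong to a top-level module or package."""
    try:
        # For top-level names this locates the module without importing it.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return []
    if spec is None:
        return []
    found = []
    if spec.origin and spec.origin.endswith(SUFFIXES):
        found.append(spec.origin)
    # For packages, also look for extension files inside the package directory.
    for location in spec.submodule_search_locations or []:
        for root, _dirs, files in os.walk(location):
            found.extend(os.path.join(root, f) for f in files if f.endswith(SUFFIXES))
    return sorted(found)


for dependency in ("json", "serial", "MySQLdb", "pycurl"):
    print(dependency, compiled_files(dependency))
```

Modules that are not installed simply yield an empty list, so the loop can be run against any list of suspect names.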

Tarek Ziadé

Jul 23, 2012, 8:43:17 AM
to gev...@googlegroups.com


On Monday, July 23, 2012 2:25:55 PM UTC+2, andre.l.caron wrote:
> Hi Tarek,
>
> Maybe here would be a good place to start listing them. If we get enough, I might be able to set up a page (on the gevent docs, maybe?).

That's invaluable information for people wanting to use gevent in their apps. I think it's worth a page, yeah.

> More generally, is there a list of criteria somewhere that explains what to watch out for? For example, direct socket access in a PYD file is rather obvious, but anything that performs a lengthy CPU-bound operation can be problematic.
>
> For the moment, I have (not necessarily tested to demonstrate incompatibilities):
>   1. pySerial, at least on Windows (uses "ReadFile()" directly).
>   2. MySQLdb (wraps MySQL C API)

=> use a pure Python driver like pymysql

>   3. dateutil (reads from the Windows registry)
>   4. PycURL (wraps libcurl C API)

- pyzmq: use gevent_zmq until the current master of pyzmq is released with the new pyzmq.green
- python-memcached (I have switched to pylibmc, which still blocks AFAIK but is fast enough ;) -- a good candidate is ultramemcached, but I have not tried it)
- I *think* py2app has issues.


 

Matthias Urlichs

Jul 23, 2012, 9:43:48 AM
to gev...@googlegroups.com
Hi,

andre.l.caron:
> For the moment, I have (not necessarily tested to demonstrate
> incompatibilities):
>
Workarounds welcome.

> 2. MySQLdb <http://sourceforge.net/projects/mysql-python/> (wraps MySQL
> C API)

The 'ultramysql' driver works well with gevent but doesn't implement the
Python DBI.

> 3. dateutil <http://labix.org/python-dateutil/> (reads from the Windows
> registry)

Workaround: use Linux. :-P


5. Basically anything that accesses the file system. NFS requests may
block indefinitely. OS threads aren't bothered by this; gevent is. :-(

--
-- Matthias Urlichs

Aleksandar Kordic

Jul 23, 2012, 10:12:52 AM
to gev...@googlegroups.com
On Mon, Jul 23, 2012 at 3:43 PM, Matthias Urlichs <matt...@urlichs.de> wrote:
> Hi,
>
> andre.l.caron:
> > For the moment, I have (not necessarily tested to demonstrate
> > incompatibilities):
> >
> Workarounds welcome.
>
> > 2. MySQLdb <http://sourceforge.net/projects/mysql-python/> (wraps MySQL
> > C API)
>
> The 'ultramysql' driver works well with gevent but doesn't implement the
> Python DBI.

https://github.com/hongqn/umysqldb works nicely as a Python DBI layer on top of ultramysql.

> > 3. dateutil <http://labix.org/python-dateutil/> (reads from the Windows
> > registry)
>
> Workaround: use Linux. :-P
>
> 5. Basically anything that accesses the file system. NFS requests may
> block indefinitely. OS threads aren't bothered by this; gevent is. :-(
>
> --
> -- Matthias Urlichs



--
Best Regards,
Alex

andre.l.caron

Jul 23, 2012, 10:42:35 AM
to gev...@googlegroups.com
> The 'ultramysql' driver works well with gevent but doesn't implement the
> Python DBI.

Thanks, I hadn't seen this. However, just like ultramemcached, this performs I/O in C++-land. I have trouble figuring out why you recommend this for use in gevent. Plus, we're already considering pymysql as an alternative to MySQLdb.

> Workaround: use Linux. :-P

Already applied this workaround, ran into bug "Customer won't switch to Linux" :-)

> 5. Basically anything that accesses the file system. NFS requests may
> block indefinitely. OS threads aren't bothered by this; gevent is. :-(

Do you mean "anything that accesses the file system while bypassing gevent" or is this a problem even after monkey patching?

André

andre.l.caron

Jul 23, 2012, 10:46:30 AM
to gev...@googlegroups.com
Hi Tarek,

> - python-memcached (I have switched to pylibmc, which still blocks AFAIK but is fast enough ;) -- a good candidate is ultramemcached, but I have not tried it)

That's odd. Of all memcached clients for Python, this seems to be the only one that's written entirely in Python. pylibmc wraps libmemcached and ultramemcached performs I/O in C++-land. What problems did you encounter with python-memcached in a gevent application?

Regards
André

Markus Thurlin

Jul 23, 2012, 10:55:44 AM
to gev...@googlegroups.com
ultramemcache and ultramysql are both written specifically for gevent

Aleksandar Kordic

Jul 23, 2012, 11:01:19 AM
to gev...@googlegroups.com
On Mon, Jul 23, 2012 at 4:42 PM, andre.l.caron <andre....@gmail.com> wrote:
> > The 'ultramysql' driver works well with gevent but doesn't implement the
> > Python DBI.
>
> Thanks, I hadn't seen this. However, just like ultramemcached, this performs I/O in C++-land. I have trouble figuring out why you recommend this for use in gevent. Plus, we're already considering pymysql as an alternative to MySQLdb.

If you look at https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c (line 100), you can see that the library uses the Python C API to import, and later use, the socket module, so you need to monkey-patch the socket module to use it with gevent.
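
For readers unfamiliar with the mechanism, here is a minimal sketch of what "monkey-patch the socket module" means in practice. It does not involve ultramysql at all; it only shows that, once `gevent.monkey.patch_socket()` has run, anything that looks up `socket.socket` (from Python or through the C API) gets gevent's cooperative class. Patching as early as possible is a common precaution, since code that copied a reference to the original class before the patch keeps that original.

```python
from gevent import monkey

# Patch before importing anything that might grab socket.socket for itself.
monkey.patch_socket()

import socket

import gevent.socket

# After patching, the standard module exposes gevent's cooperative socket class.
patched = socket.socket is gevent.socket.socket
print("socket patched:", patched)
```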

andre.l.caron

Jul 23, 2012, 11:02:53 AM
to gev...@googlegroups.com

On Monday, July 23, 2012 10:55:44 AM UTC-4, Markus Thurlin wrote:
> ultramemcache and ultramysql are both written specifically for gevent

AH!  I just took another look and found io_cpython.c which uses Python's I/O facilities!  Now I get it :-)

Thanks!
André 

Matthias Urlichs

Jul 23, 2012, 3:51:06 PM
to gev...@googlegroups.com
Hi,

andre.l.caron:
> > 5. Basically anything that accesses the file system. NFS requests may
> > block indefinitely. OS threads aren't bothered by this; gevent is. :-(
>
> Do you mean "anything that accesses the file system while bypassing gevent"
> or is this a problem even after monkey patching?
>
Umm, gevent does NOT monkeypatch stuff like file.open or file.read or
file.flush or os.link or os.unlink or os.path.abspath or … all of which
access the file system, which might block arbitrarily long.

The best workaround for this that I've found (so far) is to use RPyC,
which has a legacy mode that basically exports a complete remote
namespace. Create a socketpair, fork, run a server in the child and
use its file() and os.* instead of Python's standard file class
and os module. It's reasonably transparent; the only problem is that
you need to modify any library code yourself – you can't monkeypatch
file().

--
-- Matthias Urlichs

andre.l.caron

Jul 24, 2012, 8:52:49 AM
to gev...@googlegroups.com


On Monday, July 23, 2012 3:51:06 PM UTC-4, smurf wrote:
> Umm, gevent does NOT monkeypatch stuff like file.open or file.read or
> file.flush or os.link or os.unlink or os.path.abspath or … all of which
> access the file system, which might block arbitrarily long.
>
> The best workaround for this that I've found (so far) is to use RPyC,
> which has a legacy mode that basically exports a complete remote
> namespace. Create a socketpair, fork, run a server in the child and
> use its file() and os.* instead of Python's standard file class
> and os module. It's reasonably transparent; the only problem is that
> you need to modify any library code yourself – you can't monkeypatch
> file().

Wouldn't it be easier to write a gevent.fs package that uses AsyncResult and runs the FS operations in a real background thread (pool)? Note that I'm not suggesting that the thread pool be exposed in gevent's public API; I can certainly foresee the abuse it would get. However, with a little bit of magic, gevent could offer in-process FS operations that don't block all greenlets. (This is getting off topic, though. If the idea is interesting, we'd be better off continuing this in another thread.)

Anyways, it might not be a deal killer (we don't do much file I/O), but it's certainly good to know that all regular FS operations block all greenlets, regardless of monkey patching.

André

Matthias Urlichs

Jul 24, 2012, 9:08:26 AM
to gev...@googlegroups.com
Hi,

andre.l.caron:
> Wouldn't it be easier to write a gevent.fs package that uses
> AsyncResult and run the FS operations in a real background
> thread (pool)?

The point is that third-party libraries use these calls, so you need to
monkeypatch __builtins__ with something that behaves like the original,
but still delegates the real work to some other thread behind the scenes.

Yes, a threaded solution might be simpler. OTOH, I don't want gevent to
block because some 3rd-party library didn't correctly release the GIL
across a call to C, or does some lengthy calculation, or other stuff along
these lines which you usually don't notice…

> Anyways, it might not be a deal killer (we don't do much file I/O), but
> it's certainly good to know that all regular FS operations block all
> greenlets, regardless of monkey patching.
>
True.

--
-- Matthias Urlichs

vitaly

Jul 25, 2012, 3:33:31 AM
to gev...@googlegroups.com
On Monday, July 23, 2012 7:42:35 AM UTC-7, andre.l.caron wrote:
> > The 'ultramysql' driver works well with gevent but doesn't implement the
> > Python DBI.
>
> Thanks, I hadn't seen this. However, just like the ultramemcached, this performs I/O in C++-land. I have trouble figuring out why you recommend this for use in gevent. Plus, we're already considering pymysql as an alternative to mysqldb.

I experienced sporadic pymysql failures with the combination of pymysql and DBUtils.PooledDB  on MacOS and Linux when using gevent socket monkey-patch in an app that had two greenlets making SQL queries.  The failures would manifest themselves as None being returned (instead of a list) from cursor.fetchall().  The problem went away after I turned off gevent monkey-patching of socket, and with it went the ability to execute concurrent SQL queries. 

andre.l.caron

Jul 25, 2012, 11:56:02 AM
to gev...@googlegroups.com
On Wednesday, July 25, 2012 3:33:31 AM UTC-4, vitaly wrote:
> I experienced sporadic pymysql failures with the combination of pymysql and DBUtils.PooledDB on MacOS and Linux when using gevent socket monkey-patch in an app that had two greenlets making SQL queries. The failures would manifest themselves as None being returned (instead of a list) from cursor.fetchall(). The problem went away after I turned off gevent monkey-patching of socket, and with it went the ability to execute concurrent SQL queries.

Have you come up with a theory on why this failure occurred? I can easily see two greenlets making concurrent queries interleaving reads and writes on the same socket if they share a database connection object. For example, one greenlet sends a query, then another greenlet sends a query, and then the results are picked up out of order by the greenlets (the second greenlet is scheduled first and gets the first query's result). Remember that any monkey-patched socket I/O operation can result in a context switch.

Does the problem go away if you use a DB connection per greenlet? I don't recommend this for production, but if it does solve your problem, perhaps it would be a good idea to create a small pool of DB-bound greenlets that perform SQL queries on greenlet-local connections. In this scheme, regular greenlets make SQL queries through one of the greenlets in the pool while blocking on an AsyncResult or something.
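
The "pool of DB-bound greenlets" idea could look roughly like the sketch below. Everything here is hypothetical scaffolding: `FakeConnection` stands in for a real driver connection, and `db_worker` and `query` are invented names. Each worker owns its connection, so a query and its reply never interleave with another greenlet's traffic on the same connection.

```python
import gevent
from gevent.event import AsyncResult
from gevent.queue import Queue


class FakeConnection:
    """Placeholder for a real DB connection (assumption, not a real driver)."""

    def execute(self, sql):
        gevent.sleep(0.01)  # stands in for socket I/O that yields to other greenlets
        return "rows for " + sql


def db_worker(requests, connect):
    conn = connect()  # this connection is never shared with another greenlet
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut the worker down
            break
        sql, result = item
        try:
            result.set(conn.execute(sql))
        except Exception as exc:
            result.set_exception(exc)


def query(requests, sql):
    result = AsyncResult()
    requests.put((sql, result))
    return result.get()  # blocks only the calling greenlet


requests = Queue()
workers = [gevent.spawn(db_worker, requests, FakeConnection) for _ in range(2)]
callers = [gevent.spawn(query, requests, "SELECT %d" % i) for i in range(5)]
gevent.joinall(callers)
for _ in workers:
    requests.put(None)
gevent.joinall(workers)
answers = [caller.value for caller in callers]
print(answers)
```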

I'm particularly interested in this use case because I'm still wrapping my head around concurrency issues in gevent-based applications. It's not like they disappear just because scheduling is cooperative instead of preemptive. Sharing resources across greenlet boundaries still requires synchronization, albeit in a different way (context switching occurs at predictable locations).

Regards,
André

Denis Bilenko

Jul 25, 2012, 11:59:43 AM
to gev...@googlegroups.com
See this for an example of how to share database connections across
multiple greenlets in a safe way:
https://bitbucket.org/denis/gevent/src/tip/examples/psycopg2_pool.py#cl-1
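
The general shape of such a pool, as a rough sketch only (this is not the code from the linked example, and `ConnectionPool` is an invented name): free connections sit in a `gevent.queue.Queue`, and a greenlet that needs one waits cooperatively until another greenlet hands it back. The placeholder "connections" here are plain `object()` instances.

```python
import contextlib

import gevent
from gevent.queue import Queue


class ConnectionPool:
    """Hands each connection to at most one greenlet at a time."""

    def __init__(self, connect, size):
        self._free = Queue()
        for _ in range(size):
            self._free.put(connect())

    @contextlib.contextmanager
    def connection(self):
        conn = self._free.get()  # waits cooperatively if no connection is free
        try:
            yield conn
        finally:
            self._free.put(conn)  # always return the connection to the pool


def worker(pool, in_use, overlaps):
    with pool.connection() as conn:
        if conn in in_use:  # would mean two greenlets hold the same connection
            overlaps.append(conn)
        in_use.add(conn)
        gevent.sleep(0.01)  # stand-in for a query that yields to other greenlets
        in_use.discard(conn)


pool = ConnectionPool(connect=object, size=2)  # object() is a placeholder connection
in_use, overlaps = set(), []
gevent.joinall([gevent.spawn(worker, pool, in_use, overlaps) for _ in range(6)])
print("connections used by two greenlets at once:", len(overlaps))
```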

vitaly

Jul 25, 2012, 2:21:30 PM
to gevent: coroutine-based Python network library
You can expect to encounter all the same problems with greenlets as
you would with threads, such as deadlocks and other race-conditions.
This is just a natural consequence of context switches, even
cooperative ones. In my gevent-based modules, I code my greenlets as
I would code threads, but use gevent primitives for synchronization
(e.g., a gevent semaphore versus posix mutex); this way, when code
changes are made by me or others that make additional calls to
functions that may result in context switch in places where they
didn't happen before, the code still continues to work properly.
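
To make this concrete, here is a small simulation under stated assumptions: `FakeConnection` is an invented stand-in whose replies come back in the order the queries were sent, and `gevent.sleep` stands in for socket I/O where a switch can happen. Without a guard, the two greenlets pick up each other's replies; with a `gevent.lock.BoundedSemaphore` held across the send/receive pair, each gets its own.

```python
import collections
import contextlib

import gevent
from gevent.lock import BoundedSemaphore


class FakeConnection:
    """Invented stand-in: replies arrive in the order queries were sent."""

    def __init__(self):
        self._sent = collections.deque()

    def send(self, sql):
        self._sent.append(sql)

    def recv(self):
        return "rows for " + self._sent.popleft()


def run_query(conn, sql, delay, guard):
    with guard:  # critical section: send and receive must not be interleaved
        conn.send(sql)
        gevent.sleep(delay)  # a context switch can happen here
        return sql, conn.recv()


def demo(guard):
    conn = FakeConnection()
    jobs = [
        gevent.spawn(run_query, conn, "SELECT 1", 0.05, guard),
        gevent.spawn(run_query, conn, "SELECT 2", 0.01, guard),
    ]
    gevent.joinall(jobs)
    return [job.value for job in jobs]


unguarded = demo(contextlib.nullcontext())  # no synchronization at all
guarded = demo(BoundedSemaphore(1))         # one greenlet at a time on the connection
print("unguarded:", unguarded)
print("guarded:  ", guarded)
```

Whether anything like this happens inside DBUtils.PooledDB or pymysql is not established here; the simulation only shows the class of bug.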

In the failure case that I described, I used the Open-Source
DBUtils.PooledDB to provide/manage a mysql database connection pool,
so that I wouldn't have to implement the same thing myself. It's a
pretty nifty utility that has the capability to automatically
reconnect broken connections, do connection pings, integrates well
with SQL transaction support, etc. I am told that it works pretty
well in true multi-threaded apps.

I spent some time trying to debug the problem, but not enough to nail
it down. However, it made me realize that monkey.patch_all(),
monkey.patch_socket(), etc. can cause undesirable side-effects/
failures in other unsuspecting 3rd party packages that the same app
also needs to use. My theory is that somewhere in DBUtils.PooledDB or
pymysql (my intuition leans towards PooledDB) there is code that
doesn't expect a context switch to take place within a single thread,
and so updates some shared data structures and performs blocking
operations (that may cause a gevent context switch when monkey-
patched) in an order that may result in some sort of data structure
integrity problem during concurrency situations. I tried a couple of
work-arounds in my investigation: 1. create a new db connection for
each SQL transaction (not using PooledDB): this made the failure go
away, preserved concurrency, but had an undesirable impact on
performance. 2. not use gevent monkey-patching of the socket module:
this made the sporadic failure go away, but also disabled concurrency
during the execution of SQL. I settled on work-around #2 as that was
the practical solution for my app at the time.

vitaly

Jul 25, 2012, 2:28:19 PM
to gevent: coroutine-based Python network library
On Jul 25, 8:59 am, Denis Bilenko <denis.bile...@gmail.com> wrote:
> multiple greenlets in a safe way: https://bitbucket.org/denis/gevent/src/tip/examples/psycopg2_pool.py#...

Thank you for the link. My gevent-based app shared a common DAO
module with non-gevent-aware apps. That shared module used
DBUtils.PooledDB + pymysql under the covers, and it was not practical
to replace DBUtils.PooledDB with a gevent-friendly version at the time.

Denis Bilenko

Jul 26, 2012, 4:28:50 AM
to gev...@googlegroups.com
On Mon, Jul 23, 2012 at 4:25 PM, andre.l.caron <andre....@gmail.com> wrote:
> dateutil (reads from the Windows registry)

Does reading from the local Windows registry make it incompatible? This
is not even network communication.

> PycURL (wraps libcurl C API)

I integrated PycURL's multi interface with gevent some time ago:
https://bitbucket.org/denis/gevent-curl/src/d9aeccd324b8/example.py
I found that pycurl's multi interface leaked references at the time,
though, even without gevent.

andre.l.caron

Jul 26, 2012, 7:46:28 AM
to gev...@googlegroups.com
On Thursday, July 26, 2012 4:28:50 AM UTC-4, Denis Bilenko wrote:
> On Mon, Jul 23, 2012 at 4:25 PM, andre.l.caron <andre....@gmail.com> wrote:
> > dateutil (reads from the Windows registry)
>
> Does reading from the local Windows registry make it incompatible? This
> is not even network communication.

I'm not sure, that's mostly why I started this thread in the first place.  I came up with a list of suspect packages from all our dependencies.  AFAIK, reading from the Windows registry may hit the disk, so I think it's risky to do so anywhere but at startup.

Besides, as Tarek mentioned earlier, it may be acceptable for your application to use "blocking" functions if it suits you (e.g. if they don't block too long on average).

André

Benoit Chesneau

Jul 26, 2012, 8:25:36 AM
to gev...@googlegroups.com
Another possibility would be binding libeio [1]. It would allow
non-blocking I/O on the file system, just as libev does for sockets. It is
also from the same author/group.

Or maybe use libuv [2], the library used by Node.js, which wraps
libev and libeio and also uses libares.


- benoît
[1] http://software.schmorp.de/pkg/libeio.html
[2] https://github.com/joyent/libuv

Denis Bilenko

Jul 26, 2012, 9:02:56 AM
to gev...@googlegroups.com
On Thu, Jul 26, 2012 at 4:25 PM, Benoit Chesneau <bche...@gmail.com> wrote:
> Other possibility would be binding libeio [1]. It would allows to have
> non blocking io on the fs just like libev with the sockets. Also this
> is the same author/group.

How is it better than using gevent.threadpool?

Benoit Chesneau

Jul 26, 2012, 10:31:20 AM
to gev...@googlegroups.com
For what? It may be better to share the same event loop instead
of having a thread per file I/O operation, IMO.

- benoît

Denis Bilenko

Jul 26, 2012, 11:02:17 AM
to gev...@googlegroups.com
libeio uses a thread pool for most operations, doesn't it?

You already can make any file operation non-blocking:
>>> threadpool.apply(os.unlink, (filename, )) # won't block the event loop

So I don't see a reason for wrapping async unlink from libeio which
would also run unlink() in a [private] thread pool. It could be that a
threadpool implemented in C has less overhead. However, making
wrappers for all those eio functions to be compatible with Python's os
module does not seem worth the trouble.

For those who want async OS operations in gevent, it's now really easy
to make them yourself:

import os
from gevent.hub import get_hub

def mkdir(*args):
    return get_hub().threadpool.apply(os.mkdir, args)

libeio is good for node.js because node.js is built from scratch -
Python already has huge stdlib that does all these things.
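
A small runnable sketch of the effect described above, under the assumption that a plain `time.sleep` is a fair stand-in for a file-system call stuck on a slow disk or NFS mount (`slow_fs_call` and `ticker` are invented names). While the blocking call runs in the hub's thread pool, a second greenlet keeps ticking; calling `slow_fs_call(0.3)` directly instead would freeze it for the whole 0.3 seconds.

```python
import time

import gevent
from gevent.hub import get_hub


def slow_fs_call(seconds):
    # Stand-in for a blocking file-system call.
    time.sleep(seconds)  # a real blocking sleep: time is not monkey-patched here
    return "done"


def ticker(ticks, count):
    for _ in range(count):
        gevent.sleep(0.01)
        ticks.append(time.time())


ticks = []
background = gevent.spawn(ticker, ticks, 5)

# Runs in a real OS thread; only the calling greenlet waits for the result.
result = get_hub().threadpool.apply(slow_fs_call, (0.3,))
finished = time.time()

background.join()
print(result, "with", sum(1 for t in ticks if t < finished), "ticks while waiting")
```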

Equand

Jul 26, 2012, 5:01:21 PM
to gev...@googlegroups.com

For MySQL, try gevent-mysql.

Benoit Chesneau

Jul 26, 2012, 5:07:33 PM
to gev...@googlegroups.com
On Thu, Jul 26, 2012 at 5:02 PM, Denis Bilenko <denis....@gmail.com> wrote:
> On Thu, Jul 26, 2012 at 6:31 PM, Benoit Chesneau <bche...@gmail.com> wrote:
>> On Thu, Jul 26, 2012 at 3:02 PM, Denis Bilenko <denis....@gmail.com> wrote:
>>> On Thu, Jul 26, 2012 at 4:25 PM, Benoit Chesneau <bche...@gmail.com> wrote:
>>>> Other possibility would be binding libeio [1]. It would allows to have
>>>> non blocking io on the fs just like libev with the sockets. Also this
>>>> is the same author/group.
>>>
>>> How is it better than using gevent.threadpool?
>>
>> for what? it may be better to share the same loop of events instead
>> of having a thread / file io operations imo
>
> libeio uses threadpool for most operations, isn't it?

Indeed. I didn't read it in detail.

> You already can make any file operation non-blocking:
> >>> threadpool.apply(os.unlink, (filename, )) # won't block the event loop
>
> So I don't see a reason for wrapping async unlink from libeio which
> would also run unlink() in a [private] thread pool. It could be that a
> threadpool implemented in C has less overhead. However, making
> wrappers for all those eio functions to be compatible with Python's os
> module does not seem worth the trouble.
>
> For those who want async OS operations in gevent, it's now really easy
> to make them yourself:
>
> def mkdir(*args):
>     return get_hub().threadpool.apply(os.mkdir, args)
>
> libeio is good for node.js because node.js is built from scratch -
> Python already has huge stdlib that does all these things.

I guess having a proper API in gevent would help as well :) But that's
already a nice add-on to gevent. Hopefully a stable release will happen
soon.

- benoît

André Cruz

Jul 31, 2012, 3:36:33 PM
to gev...@googlegroups.com

On Thursday, July 26, 2012 9:28:50 AM UTC+1, Denis Bilenko wrote:
> > PycURL (wraps libcurl C API)
>
> I integrated PycURL's multi interface with gevent some time ago:
> https://bitbucket.org/denis/gevent-curl/src/d9aeccd324b8/example.py
> I found that pycurl's multi interface leaked references at the time,
> though, even without gevent.

I'm evaluating the use of PycURL and, since I already use gevent, I will also look at the gevent integration. Can you tell me a quick way to check whether the current versions of the libraries still leak references?

Thanks,
André

vitaly

Sep 16, 2012, 12:52:28 PM
to gev...@googlegroups.com
When using pymysql + DBUtils.PooledDB with gevent.monkey.patch_socket(), and making queries from two greenlets, cursor.fetchall() sporadically returns None instead of a sequence.  cursor.fetchall() is supposed to always return a sequence (possibly an empty sequence), but never None.

vitaly

Sep 16, 2012, 12:55:24 PM
to gev...@googlegroups.com
On Sunday, September 16, 2012 9:52:28 AM UTC-7, vitaly wrote:
> When using pymysql + DBUtils.PooledDB with gevent.monkey.patch_socket(), and making queries from two greenlets, cursor.fetchall() sporadically returns None instead of a sequence. cursor.fetchall() is supposed to always return a sequence (possibly an empty sequence), but never None.

Also, same problem when using gevent.monkey.patch_all() instead of gevent.monkey.patch_socket().